diff mbox series

[RFC,05/16] RISC-V: KVM: Implement VCPU interrupts and requests handling

Message ID 20190729115544.17895-6-anup.patel@wdc.com (mailing list archive)
State New, archived
Headers show
Series KVM RISC-V Support | expand

Commit Message

Anup Patel July 29, 2019, 11:56 a.m. UTC
This patch implements VCPU interrupts and requests which are both
asynchronous events.

The VCPU interrupts can be set/unset using KVM_INTERRUPT ioctl from
user-space. In future, the in-kernel IRQCHIP emulation will use
kvm_riscv_vcpu_set_interrupt() and kvm_riscv_vcpu_unset_interrupt()
functions to set/unset VCPU interrupts.

Important VCPU requests implemented by this patch are:
KVM_REQ_IRQ_PENDING - set whenever some VCPU interrupt pending
KVM_REQ_SLEEP       - set whenever VCPU itself goes to sleep state
KVM_REQ_VCPU_RESET  - set whenever VCPU reset is requested

The WFI trap-n-emulate (added later) will use KVM_REQ_SLEEP request
and kvm_riscv_vcpu_has_interrupt() function.

The KVM_REQ_VCPU_RESET request will be used by SBI emulation (added
later) to power-up a VCPU in power-off state. The user-space can use
the GET_MPSTATE/SET_MPSTATE ioctls to get/set power state of a VCPU.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
---
 arch/riscv/include/asm/kvm_host.h |  13 +++
 arch/riscv/include/uapi/asm/kvm.h |   3 +
 arch/riscv/kvm/vcpu.c             | 174 +++++++++++++++++++++++++++---
 3 files changed, 177 insertions(+), 13 deletions(-)

Comments

Paolo Bonzini July 30, 2019, 11:17 a.m. UTC | #1
First, something that is not clear to me: how do you deal with a guest
writing 1 to VSIP.SSIP?  I think that could lead to lost interrupts if
you have the following sequence

1) guest writes 1 to VSIP.SSIP

2) guest leaves VS-mode

3) host syncs VSIP

4) user mode triggers interrupt

5) host reenters guest

6) host moves irqs_pending to VSIP and clears VSIP.SSIP in the process

Perhaps irqs_pending needs to be split in two fields, irqs_pending and
irqs_pending_mask, and then you can do this:

/*
 * irqs_pending and irqs_pending_mask have multiple-producer/single-
 * consumer semantics; therefore bits can be set in the mask without
 * a lock, but clearing the bits requires vcpu_lock.  Furthermore,
 * consumers should never write to irqs_pending, and should not
 * use bits of irqs_pending that weren't 1 in the mask.
 */

int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
{
	...
	set_bit(irq, &vcpu->arch.irqs_pending);
	smp_mb__before_atomic();
	set_bit(irq, &vcpu->arch.irqs_pending_mask);
	kvm_vcpu_kick(vcpu);
}

int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
{
	...
	clear_bit(irq, &vcpu->arch.irqs_pending);
	smp_mb__before_atomic();
	set_bit(irq, &vcpu->arch.irqs_pending_mask);
}

static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
{
	...
	WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
}

and kvm_riscv_vcpu_flush_interrupts can leave aside VSIP bits that
aren't in vcpu->arch.irqs_pending_mask:

	if (atomic_read(&vcpu->arch.irqs_pending_mask)) {
		u32 mask, val;

		mask = xchg_acquire(&vcpu->arch.irqs_pending_mask, 0);
		val = READ_ONCE(vcpu->arch.irqs_pending) & mask;

		vcpu->arch.guest_csr.vsip &= ~mask;
		vcpu->arch.guest_csr.vsip |= val;
		csr_write(CSR_VSIP, vsip);
	}

Also, the getter of CSR_VSIP should call
kvm_riscv_vcpu_flush_interrupts, while the setter should clear
irqs_pending_mask.

On 29/07/19 13:56, Anup Patel wrote:
> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> +	kvm_vcpu_kick(vcpu);

The request is not needed as long as kvm_riscv_vcpu_flush_interrupts is
called *after* smp_store_mb(vcpu->mode, IN_GUEST_MODE) in
kvm_arch_vcpu_ioctl_run.  This is the "request-less vCPU kick" pattern
in Documentation/virtual/kvm/vcpu-requests.rst.  The smp_store_mb then
orders the write of IN_GUEST_MODE before the read of irqs_pending (or
irqs_pending_mask in my proposal above); in the producers, there is a
dual memory barrier in kvm_vcpu_exiting_guest_mode(), ordering the write
of irqs_pending(_mask) before the read of vcpu->mode.

Similar to other VS* CSRs, I'd rather have a ONE_REG interface for VSIE
and VSIP from the beginning as well.  Note that the VSIP setter would
clear irqs_pending_mask, while the getter would call
kvm_riscv_vcpu_flush_interrupts before reading.  It's up to userspace to
ensure that no interrupt injections happen between the calls to the
getter and the setter.

Paolo

> +		csr_write(CSR_VSIP, vcpu->arch.irqs_pending);
> +		vcpu->arch.guest_csr.vsip = vcpu->arch.irqs_pending;
> +	}
Anup Patel July 30, 2019, noon UTC | #2
On Tue, Jul 30, 2019 at 4:47 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> First, something that is not clear to me: how do you deal with a guest
> writing 1 to VSIP.SSIP?  I think that could lead to lost interrupts if
> you have the following sequence
>
> 1) guest writes 1 to VSIP.SSIP
>
> 2) guest leaves VS-mode
>
> 3) host syncs VSIP
>
> 4) user mode triggers interrupt
>
> 5) host reenters guest
>
> 6) host moves irqs_pending to VSIP and clears VSIP.SSIP in the process

This reasoning also apply to M-mode firmware (OpenSBI) providing timer
and IPI services to HS-mode software. We had some discussion around
it in a different context.
(Refer, https://github.com/riscv/opensbi/issues/128)

The thing is SIP CSR is supposed to be read-only for any S-mode SW. This
means HS-mode/VS-mode SW modifications to SIP CSR should have no
effect.

For HS-mode, only certain bits are writable from M-mode such as SSIP
and in-future even this will go away when we have specialized HW to
trigger S-mode IPIs without going through M-mode firmware.

For VS-mode, only HS-mode controls the pending bits writes to VSIP CSR.

If above is honored correctly by HW then the use-case you mentioned above
is not possible because Guest writing 1 to SIP.SSIP will be ignored.

It is possible that we have buggy HW which does allow Guest write to SIP
CSR bits then our current approach is to just overwrite VSIP whenver it
is different from irq_pending bits before entering Guest.

Do you still an issue here?

Regards,
Anup

>
> Perhaps irqs_pending needs to be split in two fields, irqs_pending and
> irqs_pending_mask, and then you can do this:
>
> /*
>  * irqs_pending and irqs_pending_mask have multiple-producer/single-
>  * consumer semantics; therefore bits can be set in the mask without
>  * a lock, but clearing the bits requires vcpu_lock.  Furthermore,
>  * consumers should never write to irqs_pending, and should not
>  * use bits of irqs_pending that weren't 1 in the mask.
>  */
>
> int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
> {
>         ...
>         set_bit(irq, &vcpu->arch.irqs_pending);
>         smp_mb__before_atomic();
>         set_bit(irq, &vcpu->arch.irqs_pending_mask);
>         kvm_vcpu_kick(vcpu);
> }
>
> int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
> {
>         ...
>         clear_bit(irq, &vcpu->arch.irqs_pending);
>         smp_mb__before_atomic();
>         set_bit(irq, &vcpu->arch.irqs_pending_mask);
> }
>
> static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
> {
>         ...
>         WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
> }
>
> and kvm_riscv_vcpu_flush_interrupts can leave aside VSIP bits that
> aren't in vcpu->arch.irqs_pending_mask:
>
>         if (atomic_read(&vcpu->arch.irqs_pending_mask)) {
>                 u32 mask, val;
>
>                 mask = xchg_acquire(&vcpu->arch.irqs_pending_mask, 0);
>                 val = READ_ONCE(vcpu->arch.irqs_pending) & mask;
>
>                 vcpu->arch.guest_csr.vsip &= ~mask;
>                 vcpu->arch.guest_csr.vsip |= val;
>                 csr_write(CSR_VSIP, vsip);
>         }
>
> Also, the getter of CSR_VSIP should call
> kvm_riscv_vcpu_flush_interrupts, while the setter should clear
> irqs_pending_mask.
>
> On 29/07/19 13:56, Anup Patel wrote:
> > +     kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > +     kvm_vcpu_kick(vcpu);
>
> The request is not needed as long as kvm_riscv_vcpu_flush_interrupts is
> called *after* smp_store_mb(vcpu->mode, IN_GUEST_MODE) in
> kvm_arch_vcpu_ioctl_run.  This is the "request-less vCPU kick" pattern
> in Documentation/virtual/kvm/vcpu-requests.rst.  The smp_store_mb then
> orders the write of IN_GUEST_MODE before the read of irqs_pending (or
> irqs_pending_mask in my proposal above); in the producers, there is a
> dual memory barrier in kvm_vcpu_exiting_guest_mode(), ordering the write
> of irqs_pending(_mask) before the read of vcpu->mode.
>
> Similar to other VS* CSRs, I'd rather have a ONE_REG interface for VSIE
> and VSIP from the beginning as well.  Note that the VSIP setter would
> clear irqs_pending_mask, while the getter would call
> kvm_riscv_vcpu_flush_interrupts before reading.  It's up to userspace to
> ensure that no interrupt injections happen between the calls to the
> getter and the setter.
>
> Paolo
>
> > +             csr_write(CSR_VSIP, vcpu->arch.irqs_pending);
> > +             vcpu->arch.guest_csr.vsip = vcpu->arch.irqs_pending;
> > +     }
>
Paolo Bonzini July 30, 2019, 12:12 p.m. UTC | #3
On 30/07/19 14:00, Anup Patel wrote:
> On Tue, Jul 30, 2019 at 4:47 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> First, something that is not clear to me: how do you deal with a guest
>> writing 1 to VSIP.SSIP?  I think that could lead to lost interrupts if
>> you have the following sequence
>>
>> 1) guest writes 1 to VSIP.SSIP
>>
>> 2) guest leaves VS-mode
>>
>> 3) host syncs VSIP
>>
>> 4) user mode triggers interrupt
>>
>> 5) host reenters guest
>>
>> 6) host moves irqs_pending to VSIP and clears VSIP.SSIP in the process
> 
> This reasoning also apply to M-mode firmware (OpenSBI) providing timer
> and IPI services to HS-mode software. We had some discussion around
> it in a different context.
> (Refer, https://github.com/riscv/opensbi/issues/128)
> 
> The thing is SIP CSR is supposed to be read-only for any S-mode SW. This
> means HS-mode/VS-mode SW modifications to SIP CSR should have no
> effect.

Is it?  The privileged specification says

  Interprocessor interrupts are sent to other harts by implementation-
  specific means, which will ultimately cause the SSIP bit to be set in
  the recipient hart’s sip register.

  All bits besides SSIP in the sip register are read-only.

Meaning that sending an IPI to self by writing 1 to sip.SSIP is
well-defined.  The same should be true of vsip.SSIP while in VS mode.

> Do you still an issue here?

Do you see any issues in the pseudocode I sent?  It gets away with the
spinlock and request so it may be a good idea anyway. :)

Paolo

> Regards,
> Anup
> 
>>
>> Perhaps irqs_pending needs to be split in two fields, irqs_pending and
>> irqs_pending_mask, and then you can do this:
>>
>> /*
>>  * irqs_pending and irqs_pending_mask have multiple-producer/single-
>>  * consumer semantics; therefore bits can be set in the mask without
>>  * a lock, but clearing the bits requires vcpu_lock.  Furthermore,
>>  * consumers should never write to irqs_pending, and should not
>>  * use bits of irqs_pending that weren't 1 in the mask.
>>  */
>>
>> int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
>> {
>>         ...
>>         set_bit(irq, &vcpu->arch.irqs_pending);
>>         smp_mb__before_atomic();
>>         set_bit(irq, &vcpu->arch.irqs_pending_mask);
>>         kvm_vcpu_kick(vcpu);
>> }
>>
>> int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
>> {
>>         ...
>>         clear_bit(irq, &vcpu->arch.irqs_pending);
>>         smp_mb__before_atomic();
>>         set_bit(irq, &vcpu->arch.irqs_pending_mask);
>> }
>>
>> static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
>> {
>>         ...
>>         WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
>> }
>>
>> and kvm_riscv_vcpu_flush_interrupts can leave aside VSIP bits that
>> aren't in vcpu->arch.irqs_pending_mask:
>>
>>         if (atomic_read(&vcpu->arch.irqs_pending_mask)) {
>>                 u32 mask, val;
>>
>>                 mask = xchg_acquire(&vcpu->arch.irqs_pending_mask, 0);
>>                 val = READ_ONCE(vcpu->arch.irqs_pending) & mask;
>>
>>                 vcpu->arch.guest_csr.vsip &= ~mask;
>>                 vcpu->arch.guest_csr.vsip |= val;
>>                 csr_write(CSR_VSIP, vsip);
>>         }
>>
>> Also, the getter of CSR_VSIP should call
>> kvm_riscv_vcpu_flush_interrupts, while the setter should clear
>> irqs_pending_mask.
>>
>> On 29/07/19 13:56, Anup Patel wrote:
>>> +     kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>>> +     kvm_vcpu_kick(vcpu);
>>
>> The request is not needed as long as kvm_riscv_vcpu_flush_interrupts is
>> called *after* smp_store_mb(vcpu->mode, IN_GUEST_MODE) in
>> kvm_arch_vcpu_ioctl_run.  This is the "request-less vCPU kick" pattern
>> in Documentation/virtual/kvm/vcpu-requests.rst.  The smp_store_mb then
>> orders the write of IN_GUEST_MODE before the read of irqs_pending (or
>> irqs_pending_mask in my proposal above); in the producers, there is a
>> dual memory barrier in kvm_vcpu_exiting_guest_mode(), ordering the write
>> of irqs_pending(_mask) before the read of vcpu->mode.
>>
>> Similar to other VS* CSRs, I'd rather have a ONE_REG interface for VSIE
>> and VSIP from the beginning as well.  Note that the VSIP setter would
>> clear irqs_pending_mask, while the getter would call
>> kvm_riscv_vcpu_flush_interrupts before reading.  It's up to userspace to
>> ensure that no interrupt injections happen between the calls to the
>> getter and the setter.
>>
>> Paolo
>>
>>> +             csr_write(CSR_VSIP, vcpu->arch.irqs_pending);
>>> +             vcpu->arch.guest_csr.vsip = vcpu->arch.irqs_pending;
>>> +     }
>>
Anup Patel July 30, 2019, 12:45 p.m. UTC | #4
On Tue, Jul 30, 2019 at 5:42 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 30/07/19 14:00, Anup Patel wrote:
> > On Tue, Jul 30, 2019 at 4:47 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>
> >> First, something that is not clear to me: how do you deal with a guest
> >> writing 1 to VSIP.SSIP?  I think that could lead to lost interrupts if
> >> you have the following sequence
> >>
> >> 1) guest writes 1 to VSIP.SSIP
> >>
> >> 2) guest leaves VS-mode
> >>
> >> 3) host syncs VSIP
> >>
> >> 4) user mode triggers interrupt
> >>
> >> 5) host reenters guest
> >>
> >> 6) host moves irqs_pending to VSIP and clears VSIP.SSIP in the process
> >
> > This reasoning also apply to M-mode firmware (OpenSBI) providing timer
> > and IPI services to HS-mode software. We had some discussion around
> > it in a different context.
> > (Refer, https://github.com/riscv/opensbi/issues/128)
> >
> > The thing is SIP CSR is supposed to be read-only for any S-mode SW. This
> > means HS-mode/VS-mode SW modifications to SIP CSR should have no
> > effect.
>
> Is it?  The privileged specification says
>
>   Interprocessor interrupts are sent to other harts by implementation-
>   specific means, which will ultimately cause the SSIP bit to be set in
>   the recipient hart’s sip register.

To further explain my rationale ...

Here's some text from RISC-V spec regarding MIP CSR:
"The mip register is an MXLEN-bit read/write register containing information
on pending interrupts, while mie is the corresponding MXLEN-bit read/write
register containing interrupt enable bits. Only the bits corresponding to
lower-privilege software interrupts (USIP, SSIP), timer interrupts (UTIP, STIP),
and external interrupts (UEIP, SEIP) in mip are writable through this CSR
address; the remaining bits are read-only."

Here's some text from RISC-V spec regarding SIP CSR:
"software interrupt-pending (SSIP) bit in the sip register. A pending
supervisor-level software interrupt can be cleared by writing 0 to the SSIP bit
in sip. Supervisor-level software interrupts are disabled when the SSIE bit in
the sie register is clear."

Without RISC-V hypervisor extension, the SIP is essentially a restricted
view of MIP CSR. Also as-per above, S-mode SW can only write 0 to SSIP
bit in SIP CSR whereas it can only be set by M-mode SW or some HW
mechanism (such as S-mode CLINT).

There was quite a bit of discussion in last RISC-V Zurich Workshop about
avoiding SBI calls for injecting IPIs. The best suggestion so far is to
eventually have RISC-V systems with separate CLINT HW for M-mode
and S-mode. The S-mode SW can use S-mode CLINT to trigger IPIs to
other CPUs and it will use SBI calls for IPIs only when S-mode CLINT is
not available.

>
>   All bits besides SSIP in the sip register are read-only.
>
> Meaning that sending an IPI to self by writing 1 to sip.SSIP is
> well-defined.  The same should be true of vsip.SSIP while in VS mode.
>
> > Do you still an issue here?
>
> Do you see any issues in the pseudocode I sent?  It gets away with the
> spinlock and request so it may be a good idea anyway. :)

Yes, I am evaluating your psedocode right now. I definitely want to
remove the irq_pending_lock if possible. I will try to in-corporate your
suggestion in v2 series.

Regards,
Anup
Paolo Bonzini July 30, 2019, 1:18 p.m. UTC | #5
On 30/07/19 14:45, Anup Patel wrote:
> Here's some text from RISC-V spec regarding SIP CSR:
> "software interrupt-pending (SSIP) bit in the sip register. A pending
> supervisor-level software interrupt can be cleared by writing 0 to the SSIP bit
> in sip. Supervisor-level software interrupts are disabled when the SSIE bit in
> the sie register is clear."
> 
> Without RISC-V hypervisor extension, the SIP is essentially a restricted
> view of MIP CSR. Also as-per above, S-mode SW can only write 0 to SSIP
> bit in SIP CSR whereas it can only be set by M-mode SW or some HW
> mechanism (such as S-mode CLINT).

But that's not what the spec says.  It just says (just before the
sentence you quoted):

   A supervisor-level software interrupt is triggered on the current
   hart by writing 1 to its supervisor software interrupt-pending (SSIP)
   bit in the sip register.

and it's not written anywhere that S-mode SW cannot write 1.  In fact
that text is even under sip, not under mip, so IMO there's no doubt that
S-mode SW _can_ write 1, and the hypervisor must operate accordingly.

In fact I'm sure that if Windows were ever ported to RISC-V, it would be
very happy to use that feature.  On x86, Intel even accelerated it
specifically for Microsoft. :)

Paolo
Anup Patel July 30, 2019, 1:35 p.m. UTC | #6
On Tue, Jul 30, 2019 at 6:48 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 30/07/19 14:45, Anup Patel wrote:
> > Here's some text from RISC-V spec regarding SIP CSR:
> > "software interrupt-pending (SSIP) bit in the sip register. A pending
> > supervisor-level software interrupt can be cleared by writing 0 to the SSIP bit
> > in sip. Supervisor-level software interrupts are disabled when the SSIE bit in
> > the sie register is clear."
> >
> > Without RISC-V hypervisor extension, the SIP is essentially a restricted
> > view of MIP CSR. Also as-per above, S-mode SW can only write 0 to SSIP
> > bit in SIP CSR whereas it can only be set by M-mode SW or some HW
> > mechanism (such as S-mode CLINT).
>
> But that's not what the spec says.  It just says (just before the
> sentence you quoted):
>
>    A supervisor-level software interrupt is triggered on the current
>    hart by writing 1 to its supervisor software interrupt-pending (SSIP)
>    bit in the sip register.

Unfortunately, this statement does not state who is allowed to write 1
in SIP.SSIP bit.

I quoted MIP CSR documentation to highlight the fact that only M-mode
SW can set SSIP bit.

In fact, I had same understanding as you have regarding SSIP bit
until we had MSIP issue in OpenSBI.
(https://github.com/riscv/opensbi/issues/128)

>
> and it's not written anywhere that S-mode SW cannot write 1.  In fact
> that text is even under sip, not under mip, so IMO there's no doubt that
> S-mode SW _can_ write 1, and the hypervisor must operate accordingly.

Without hypervisor support, SIP CSR is nothing but a restricted view of
MIP CSR thats why MIP CSR documentation applies here.

I think this discussion deserves a Github issue on RISC-V ISA manual.

If my interpretation is incorrect then it would be really strange that
HART in S-mode SW can inject IPI to itself by writing 1 to SIP.SSIP bit.

>
> In fact I'm sure that if Windows were ever ported to RISC-V, it would be
> very happy to use that feature.  On x86, Intel even accelerated it
> specifically for Microsoft. :)

That would be indeed very strange usage.  :)

Regards,
Anup
Paolo Bonzini July 30, 2019, 2:08 p.m. UTC | #7
On 30/07/19 15:35, Anup Patel wrote:
> On Tue, Jul 30, 2019 at 6:48 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 30/07/19 14:45, Anup Patel wrote:
>>> Here's some text from RISC-V spec regarding SIP CSR:
>>> "software interrupt-pending (SSIP) bit in the sip register. A pending
>>> supervisor-level software interrupt can be cleared by writing 0 to the SSIP bit
>>> in sip. Supervisor-level software interrupts are disabled when the SSIE bit in
>>> the sie register is clear."
>>>
>>> Without RISC-V hypervisor extension, the SIP is essentially a restricted
>>> view of MIP CSR. Also as-per above, S-mode SW can only write 0 to SSIP
>>> bit in SIP CSR whereas it can only be set by M-mode SW or some HW
>>> mechanism (such as S-mode CLINT).
>>
>> But that's not what the spec says.  It just says (just before the
>> sentence you quoted):
>>
>>    A supervisor-level software interrupt is triggered on the current
>>    hart by writing 1 to its supervisor software interrupt-pending (SSIP)
>>    bit in the sip register.
> 
> Unfortunately, this statement does not state who is allowed to write 1
> in SIP.SSIP bit.

If it doesn't state who is allowed to write 1, whoever has access to sip
can.

> I quoted MIP CSR documentation to highlight the fact that only M-mode
> SW can set SSIP bit.
> 
> In fact, I had same understanding as you have regarding SSIP bit
> until we had MSIP issue in OpenSBI.
> (https://github.com/riscv/opensbi/issues/128)
>
>> and it's not written anywhere that S-mode SW cannot write 1.  In fact
>> that text is even under sip, not under mip, so IMO there's no doubt that
>> S-mode SW _can_ write 1, and the hypervisor must operate accordingly.
> 
> Without hypervisor support, SIP CSR is nothing but a restricted view of
> MIP CSR thats why MIP CSR documentation applies here.

But the privileged spec says mip.MSIP is read-only, it cannot be cleared
(as in the above OpenSBI issue).  So mip.MSIP and sip.SSIP are already
different in that respect, and I don't see how the spec says that S-mode
SW cannot set sip.SSIP.

(As an aside, why would M-mode even bother using sip and not mip to
write 1 to SSIP?).

> I think this discussion deserves a Github issue on RISC-V ISA manual.

Perhaps, but I think it makes more sense this way.  The question remains
of why M-mode is not allowed to write to MSIP/MEIP/MTIP.  My guess is
that then MSIP/MEIP/MTIP are simply a read-only view of an external pin,
so it simplifies hardware a tiny bit by forcing acks to go through the
MMIO registers.

> If my interpretation is incorrect then it would be really strange that
> HART in S-mode SW can inject IPI to itself by writing 1 to SIP.SSIP bit.

Well, it can be useful, for example Windows does it when interrupt
handlers want to schedule some work to happen out of interrupt context.
 Going through SBI would be unpleasant if it causes an HS-mode trap.

Paolo

>>
>> In fact I'm sure that if Windows were ever ported to RISC-V, it would be
>> very happy to use that feature.  On x86, Intel even accelerated it
>> specifically for Microsoft. :)
> 
> That would be indeed very strange usage.  :)
> 
> Regards,
> Anup
>
Anup Patel Aug. 2, 2019, 3:59 a.m. UTC | #8
On Tue, Jul 30, 2019 at 7:38 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 30/07/19 15:35, Anup Patel wrote:
> > On Tue, Jul 30, 2019 at 6:48 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>
> >> On 30/07/19 14:45, Anup Patel wrote:
> >>> Here's some text from RISC-V spec regarding SIP CSR:
> >>> "software interrupt-pending (SSIP) bit in the sip register. A pending
> >>> supervisor-level software interrupt can be cleared by writing 0 to the SSIP bit
> >>> in sip. Supervisor-level software interrupts are disabled when the SSIE bit in
> >>> the sie register is clear."
> >>>
> >>> Without RISC-V hypervisor extension, the SIP is essentially a restricted
> >>> view of MIP CSR. Also as-per above, S-mode SW can only write 0 to SSIP
> >>> bit in SIP CSR whereas it can only be set by M-mode SW or some HW
> >>> mechanism (such as S-mode CLINT).
> >>
> >> But that's not what the spec says.  It just says (just before the
> >> sentence you quoted):
> >>
> >>    A supervisor-level software interrupt is triggered on the current
> >>    hart by writing 1 to its supervisor software interrupt-pending (SSIP)
> >>    bit in the sip register.
> >
> > Unfortunately, this statement does not state who is allowed to write 1
> > in SIP.SSIP bit.
>
> If it doesn't state who is allowed to write 1, whoever has access to sip
> can.
>
> > I quoted MIP CSR documentation to highlight the fact that only M-mode
> > SW can set SSIP bit.
> >
> > In fact, I had same understanding as you have regarding SSIP bit
> > until we had MSIP issue in OpenSBI.
> > (https://github.com/riscv/opensbi/issues/128)
> >
> >> and it's not written anywhere that S-mode SW cannot write 1.  In fact
> >> that text is even under sip, not under mip, so IMO there's no doubt that
> >> S-mode SW _can_ write 1, and the hypervisor must operate accordingly.
> >
> > Without hypervisor support, SIP CSR is nothing but a restricted view of
> > MIP CSR thats why MIP CSR documentation applies here.
>
> But the privileged spec says mip.MSIP is read-only, it cannot be cleared
> (as in the above OpenSBI issue).  So mip.MSIP and sip.SSIP are already
> different in that respect, and I don't see how the spec says that S-mode
> SW cannot set sip.SSIP.
>
> (As an aside, why would M-mode even bother using sip and not mip to
> write 1 to SSIP?).
>
> > I think this discussion deserves a Github issue on RISC-V ISA manual.
>
> Perhaps, but I think it makes more sense this way.  The question remains
> of why M-mode is not allowed to write to MSIP/MEIP/MTIP.  My guess is
> that then MSIP/MEIP/MTIP are simply a read-only view of an external pin,
> so it simplifies hardware a tiny bit by forcing acks to go through the
> MMIO registers.
>
> > If my interpretation is incorrect then it would be really strange that
> > HART in S-mode SW can inject IPI to itself by writing 1 to SIP.SSIP bit.
>
> Well, it can be useful, for example Windows does it when interrupt
> handlers want to schedule some work to happen out of interrupt context.
>  Going through SBI would be unpleasant if it causes an HS-mode trap.

Another way of artificially injecting interrupt would be using interrupt
controller, where Windows can just write to some pending register of
interrupt controller.

I have raised a new Github issue on GitHub for clarity on this. You can
add your comments to this issue as well.
https://github.com/riscv/riscv-isa-manual/issues/425

Also, I have raised a proposal to support mechanism for external entity
(such as PLICv2 with virtualization support) to inject virtual interrupts.
https://github.com/riscv/riscv-isa-manual/issues/429

Regards,
Anup
diff mbox series

Patch

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 244eabe62710..aa89f1922da1 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -125,6 +125,13 @@  struct kvm_vcpu_arch {
 	/* CPU CSR context upon Guest VCPU reset */
 	struct kvm_vcpu_csr guest_reset_csr;
 
+	/* VCPU interrupts */
+	raw_spinlock_t irqs_lock;
+	unsigned long irqs_pending;
+
+	/* VCPU power-off state */
+	bool power_off;
+
 	/* Don't run the VCPU (blocked) */
 	bool pause;
 };
@@ -146,6 +153,12 @@  int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 
 static inline void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch) {}
 
+int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
+int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
+bool kvm_riscv_vcpu_has_interrupt(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
+
 void kvm_riscv_halt_guest(struct kvm *kvm);
 void kvm_riscv_resume_guest(struct kvm *kvm);
 
diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index d15875818b6e..6dbc056d58ba 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -18,6 +18,9 @@ 
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 
+#define KVM_INTERRUPT_SET	-1U
+#define KVM_INTERRUPT_UNSET	-2U
+
 /* for KVM_GET_REGS and KVM_SET_REGS */
 struct kvm_regs {
 };
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 1ae806f28c0e..c6f57caa95f0 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -42,6 +42,7 @@  struct kvm_stats_debugfs_item debugfs_entries[] = {
 
 static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
 {
+	unsigned long f;
 	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
 	struct kvm_vcpu_csr *reset_csr = &vcpu->arch.guest_reset_csr;
 	struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
@@ -50,6 +51,10 @@  static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
 	memcpy(csr, reset_csr, sizeof(*csr));
 
 	memcpy(cntx, reset_cntx, sizeof(*cntx));
+
+	raw_spin_lock_irqsave(&vcpu->arch.irqs_lock, f);
+	vcpu->arch.irqs_pending = 0;
+	raw_spin_unlock_irqrestore(&vcpu->arch.irqs_lock, f);
 }
 
 struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
@@ -103,6 +108,9 @@  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 	cntx->hstatus |= HSTATUS_SP2P;
 	cntx->hstatus |= HSTATUS_SPV;
 
+	/* Setup VCPU irqs lock */
+	raw_spin_lock_init(&vcpu->arch.irqs_lock);
+
 	/* Setup reset state of HEDELEG and HIDELEG CSRs */
 	csr = &vcpu->arch.guest_reset_csr;
 	csr->hedeleg = 0;
@@ -131,8 +139,15 @@  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 
 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 {
-	/* TODO: */
-	return 0;
+	int ret;
+	unsigned long f, irqs;
+
+	raw_spin_lock_irqsave(&vcpu->arch.irqs_lock, f);
+	irqs = vcpu->arch.irqs_pending & vcpu->arch.guest_csr.vsie;
+	ret = (irqs & (1UL << IRQ_S_TIMER)) ? 1 : 0;
+	raw_spin_unlock_irqrestore(&vcpu->arch.irqs_lock, f);
+
+	return ret;
 }
 
 void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
@@ -145,20 +160,18 @@  void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
 
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 {
-	/* TODO: */
-	return 0;
+	return (kvm_riscv_vcpu_has_interrupt(vcpu) &&
+		!vcpu->arch.power_off && !vcpu->arch.pause);
 }
 
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
-	/* TODO: */
-	return 0;
+	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
 }
 
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
 {
-	/* TODO: */
-	return false;
+	return (vcpu->arch.guest_context.sstatus & SR_SPP) ? true : false;
 }
 
 bool kvm_arch_has_vcpu_debugfs(void)
@@ -179,7 +192,21 @@  vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
 long kvm_arch_vcpu_async_ioctl(struct file *filp,
 			       unsigned int ioctl, unsigned long arg)
 {
-	/* TODO; */
+	struct kvm_vcpu *vcpu = filp->private_data;
+	void __user *argp = (void __user *)arg;
+
+	if (ioctl == KVM_INTERRUPT) {
+		struct kvm_interrupt irq;
+
+		if (copy_from_user(&irq, argp, sizeof(irq)))
+			return -EFAULT;
+
+		if (irq.irq == KVM_INTERRUPT_SET)
+			return kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_S_EXT);
+		else
+			return kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_S_EXT);
+	}
+
 	return -ENOIOCTLCMD;
 }
 
@@ -228,18 +255,113 @@  int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 	return -EINVAL;
 }
 
+static void kvm_riscv_vcpu_flush_interrupts(struct kvm_vcpu *vcpu)
+{
+	unsigned long f;
+
+	raw_spin_lock_irqsave(&vcpu->arch.irqs_lock, f);
+	if (vcpu->arch.irqs_pending ^ vcpu->arch.guest_csr.vsip) {
+		csr_write(CSR_VSIP, vcpu->arch.irqs_pending);
+		vcpu->arch.guest_csr.vsip = vcpu->arch.irqs_pending;
+	}
+	raw_spin_unlock_irqrestore(&vcpu->arch.irqs_lock, f);
+}
+
+static void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.guest_csr.vsip = csr_read(CSR_VSIP);
+	vcpu->arch.guest_csr.vsie = csr_read(CSR_VSIE);
+}
+
+int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
+{
+	unsigned long f;
+
+	if (irq != IRQ_S_SOFT &&
+	    irq != IRQ_S_TIMER &&
+	    irq != IRQ_S_EXT)
+		return -EINVAL;
+
+	raw_spin_lock_irqsave(&vcpu->arch.irqs_lock, f);
+	vcpu->arch.irqs_pending |= (1UL << irq);
+	raw_spin_unlock_irqrestore(&vcpu->arch.irqs_lock, f);
+
+	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
+	kvm_vcpu_kick(vcpu);
+
+	return 0;
+}
+
+int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
+{
+	unsigned long f;
+
+	if (irq != IRQ_S_SOFT &&
+	    irq != IRQ_S_TIMER &&
+	    irq != IRQ_S_EXT)
+		return -EINVAL;
+
+	raw_spin_lock_irqsave(&vcpu->arch.irqs_lock, f);
+	vcpu->arch.irqs_pending &= ~(1UL << irq);
+	raw_spin_unlock_irqrestore(&vcpu->arch.irqs_lock, f);
+
+	return 0;
+}
+
+bool kvm_riscv_vcpu_has_interrupt(struct kvm_vcpu *vcpu)
+{
+	bool ret = false;
+	unsigned long f;
+
+	raw_spin_lock_irqsave(&vcpu->arch.irqs_lock, f);
+	if (vcpu->arch.irqs_pending & vcpu->arch.guest_csr.vsie)
+		ret = true;
+	raw_spin_unlock_irqrestore(&vcpu->arch.irqs_lock, f);
+
+	return ret;
+}
+
+void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.power_off = true;
+	kvm_make_request(KVM_REQ_SLEEP, vcpu);
+	kvm_vcpu_kick(vcpu);
+}
+
+void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.power_off = false;
+	kvm_vcpu_wake_up(vcpu);
+}
+
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
-	/* TODO: */
+	if (vcpu->arch.power_off)
+		mp_state->mp_state = KVM_MP_STATE_STOPPED;
+	else
+		mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
+
 	return 0;
 }
 
 int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
-	/* TODO: */
-	return 0;
+	int ret = 0;
+
+	switch (mp_state->mp_state) {
+	case KVM_MP_STATE_RUNNABLE:
+		vcpu->arch.power_off = false;
+		break;
+	case KVM_MP_STATE_STOPPED:
+		kvm_riscv_vcpu_power_off(vcpu);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
 }
 
 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
@@ -263,8 +385,25 @@  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 
 static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
 {
+	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
+
 	if (kvm_request_pending(vcpu)) {
-		/* TODO: */
+		if (kvm_check_request(KVM_REQ_SLEEP, vcpu)) {
+			swait_event_interruptible_exclusive(*wq,
+						((!vcpu->arch.power_off) &&
+						(!vcpu->arch.pause)));
+
+			if (vcpu->arch.power_off || vcpu->arch.pause) {
+				/*
+				 * Awaken to handle a signal, request to
+				 * sleep again later.
+				 */
+				kvm_make_request(KVM_REQ_SLEEP, vcpu);
+			}
+		}
+
+		if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu))
+			kvm_riscv_reset_vcpu(vcpu);
 
 		/*
 		 * Clear IRQ_PENDING requests that were made to guarantee
@@ -317,6 +456,12 @@  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			run->exit_reason = KVM_EXIT_INTR;
 		}
 
+		/*
+		 * We might have got VCPU interrupts updated asynchronously
+		 * so update it in HW.
+		 */
+		kvm_riscv_vcpu_flush_interrupts(vcpu);
+
 		/*
 		 * Ensure we set mode to IN_GUEST_MODE after we disable
 		 * interrupts and before the final VCPU requests check.
@@ -347,6 +492,9 @@  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		scause = csr_read(CSR_SCAUSE);
 		stval = csr_read(CSR_STVAL);
 
+		/* Syncup interrupts state with HW */
+		kvm_riscv_vcpu_sync_interrupts(vcpu);
+
 		/*
 		 * We may have taken a host interrupt in VS/VU-mode (i.e.
 		 * while executing the guest). This interrupt is still