diff mbox series

[RFC,12/39] KVM: x86/xen: store virq when assigning evtchn

Message ID 20190220201609.28290-13-joao.m.martins@oracle.com (mailing list archive)
State New, archived
Headers show
Series x86/KVM: Xen HVM guest support | expand

Commit Message

Joao Martins Feb. 20, 2019, 8:15 p.m. UTC
Enable virq offload to the hypervisor. The primary user for this is
the timer virq.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/xen.c              | 23 ++++++++++++++++++++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

Comments

Joao Martins Feb. 10, 2022, 12:17 p.m. UTC | #1
On 2/8/22 16:17, Woodhouse, David wrote:
> On Wed, 2019-02-20 at 20:15 +0000, Joao Martins wrote:
>> Enable virq offload to the hypervisor. The primary user for this is
>> the timer virq.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> 
> ...
> 
>> @@ -636,8 +654,11 @@ static int kvm_xen_eventfd_assign(struct kvm *kvm, struct idr *port_to_evt,
>>  			GFP_KERNEL);
>>  	mutex_unlock(port_lock);
>>  
>> -	if (ret >= 0)
>> +	if (ret >= 0) {
>> +		if (evtchnfd->type == XEN_EVTCHN_TYPE_VIRQ)
>> +			kvm_xen_set_virq(kvm, evtchnfd);
>>  		return 0;
>> +	}
>>  
>>  	if (ret == -ENOSPC)
>>  		ret = -EEXIST;
>>
> 
> So, I've spent a while vacillating about how best we should do this.
> 
> Since event channels are bidirectional, we essentially have *two*
> number spaces for them.
> 
> We have the inbound events, in the KVM IRQ routing table (which 5.17
> already supports for delivering PIRQs, based on my mangling of your
> earlier patches).
> 
/me nods

> And then we have the *outbound* events, which the guest can invoke with
> the EVTCHNOP_send hypercall. Those are either:
>  • IPI, raising the same port# on the guest
>  • Interdomain looped back to a different port# on the guest
>  • Interdomain triggering an eventfd.
> 
/me nods

I am forgetting why you one do this on Xen:

* Interdomain looped back to a different port# on the guest

> In the last case, that eventfd can be set up with IRQFD for direct
> event channel delivery to a different KVM/Xen guest.
> 
> I've used your implemention, with an idr for the outbound port# space
> intercepting EVTCHNOP_send for known ports and only letting userspace
> see the hypercall if it's for a port# the kernel doesn't know. Looks a
> bit like
> https://git.infradead.org/users/dwmw2/linux.git/commitdiff/b4fbc49218a
> 
> 
> But I *don't* want to do the VIRQ part shown above, "spotting" the VIRQ
> in that outbound port# space and squirreling the information away into
> the kvm_vcpu for when we need to deliver a timer event.
> 
> The VIRQ isn't part of the *outbound* port# space; it isn't a port to
> which a Xen guest can use EVTCHNOP_send to send an event. 

But it is still an event channel which port is unique regardless of port
type/space hence (...)

> If anything,
> it would be part of the *inbound* port# space, in the KVM IRQ routing
> table. So perhaps we could have a similar snippet in
> kvm_xen_setup_evtchn() which spots a VIRQ and says "aha, now I know
> where to deliver timer events for this vCPU".
> 
(...) The thinking at the time was mainly simplicity so our way of saying
'offload the evtchn to KVM' was through the machinery that offloads the outbound
part (using your terminology). I don't think even using XEN_EVENTFD as proposed
here that that one could send an VIRQ via EVTCHNOP_send (I could be wrong as
it has been a long time).

Regardless, I think you have a good point to split the semantics and (...)

> But... the IRQ routing table isn't really set up for that, and doesn't
> have explicit *deletion*. The kvm_xen_setup_evtchn() function might get
> called to translate into an updated table which is subsequently
> *abandoned*, and it would never know. I suppose we could stash the GSI#
> and then when we want to deliver it we look up that GSI# in the current
> table and see if it's *stale* but that's getting nasty.
> 
> I suppose that's not insurmountable, but the other problem with
> inferring it from *either* the inbound or outbound port# tables is that
> the vCPU might not even *exist* at the time the table is set up (after
> live update or live migration, as the vCPU threads all go off and do
> their thing and *eventually* create their vCPUs, while the machine
> itself is being restored on the main VMM thread.)
> 
> So I think I'm going to make the timer VIRQ (port#, priority) into an
> explicit KVM_XEN_VCPU_ATTR_TYPE. 

(...) thus this makes sense. Do you particularly care about
VIRQ_DEBUG?

> Along with the *actual* timer expiry,
> which we need to extract/restore for LU/LM too, don't we?
> /me nods

I haven't thought that one well for Live Update / Live Migration, but
I wonder if wouldn't be better to be instead a general 'xen state'
attr type should you need more than just pending timers expiry. Albeit
considering that the VMM has everything it needs (?), perhaps for Xen PV
timer look to be the oddball missing, and we donºt need to go that extent.
David Woodhouse Feb. 10, 2022, 3:23 p.m. UTC | #2
On Thu, 2022-02-10 at 12:17 +0000, Joao Martins wrote:
> On 2/8/22 16:17, Woodhouse, David wrote:
> > And then we have the *outbound* events, which the guest can invoke with
> > the EVTCHNOP_send hypercall. Those are either:
> >  • IPI, raising the same port# on the guest
> >  • Interdomain looped back to a different port# on the guest
> >  • Interdomain triggering an eventfd.
> > 
> 
> /me nods
> 
> I am forgetting why you one do this on Xen:
> 
> * Interdomain looped back to a different port# on the guest

It's one of the few things we had to fix up when we started running PV
guests in the 'shim' under KVM. I don't know that it actually sends
loopback events via the true Xen (or KVM) host, but it does at least
register them so that the port# is 'reserved' and the host won't
allocate that port for anything else. It does it at least for the
console port.

For the inbound vs. outbound thing.... I did ponder a really simple API
design in which outbound ports are *only* ever associated with an
eventfd, and for IPIs the VMM would be expected to bind those as IRQFD
to an inbound event on the same port#.

You pointed out that it was quite inefficient, but... we already have
complex hacks to bypass the eventfd for posted interrupts when the
source and destination "match", and perhaps we could do something
similar to allow EVTCHNOP_send to deliver directly to a local port#
without having to go through all the eventfd code?

But the implementation of that would end up being awful, *and* the
userspace API isn't even that nice despite being "simple", because it
would force userspace to allocate a whole bunch of eventfds and use
space in the IRQ routing table for them. So it didn't seem worth it.
Let's just let userspace tell us explicitly the vcpu/port/prio instead
of having to jump through hoops to magically work it out from matching
eventfds.

> > In the last case, that eventfd can be set up with IRQFD for direct
> > event channel delivery to a different KVM/Xen guest.
> > 
> > I've used your implemention, with an idr for the outbound port# space
> > intercepting EVTCHNOP_send for known ports and only letting userspace
> > see the hypercall if it's for a port# the kernel doesn't know. Looks a
> > bit like
> > https://git.infradead.org/users/dwmw2/linux.git/commitdiff/b4fbc49218a
> > 
> > 
> > 
> > But I *don't* want to do the VIRQ part shown above, "spotting" the VIRQ
> > in that outbound port# space and squirreling the information away into
> > the kvm_vcpu for when we need to deliver a timer event.
> > 
> > The VIRQ isn't part of the *outbound* port# space; it isn't a port to
> > which a Xen guest can use EVTCHNOP_send to send an event.
> 
> But it is still an event channel which port is unique regardless of port
> type/space hence (...)
> 
> > If anything,
> > it would be part of the *inbound* port# space, in the KVM IRQ routing
> > table. So perhaps we could have a similar snippet in
> > kvm_xen_setup_evtchn() which spots a VIRQ and says "aha, now I know
> > where to deliver timer events for this vCPU".
> > 
> 
> (...) The thinking at the time was mainly simplicity so our way of saying
> 'offload the evtchn to KVM' was through the machinery that offloads the outbound
> part (using your terminology). I don't think even using XEN_EVENTFD as proposed
> here that that one could send an VIRQ via EVTCHNOP_send (I could be wrong as
> it has been a long time).

I confess I didn't test it but it *looked* like you could, while true
Xen wouldn't permit that.

> Regardless, I think you have a good point to split the semantics and (...)

> > 
> > So I think I'm going to make the timer VIRQ (port#, priority) into an
> > explicit KVM_XEN_VCPU_ATTR_TYPE.
> 
> (...) thus this makes sense. Do you particularly care about
> VIRQ_DEBUG?


Not really. Especially not as something to accelerate in KVM.

Our environment doesn't have any way to deliver that to guests,
although we *do* have an API call to deliver "diagnostic interrupt"
which maps to an NMI, and we *have* occasionally hacked the VMM to
deliver VIRQ_DEBUG to Xen guests instead of that NMI. Mostly back when
I was busy being confused about ->vcpu_id vs. ->vcpu_idx vs. the
Xen/ACPI CPU# and where the hell my interrupts were going.

> > Along with the *actual* timer expiry,
> > which we need to extract/restore for LU/LM too, don't we?
> > /me nods
> 
> I haven't thought that one well for Live Update / Live Migration, but
> I wonder if wouldn't be better to be instead a general 'xen state'
> attr type should you need more than just pending timers expiry. Albeit
> considering that the VMM has everything it needs (?), perhaps for Xen PV
> timer look to be the oddball missing, and we donºt need to go that extent.

Yeah, the VMM knows most of this stuff already, as it *told* the kernel
in the first place. Userspace is still responsible for all the setup
and admin, and the kernel just handles the bare minimum of the fast
path.

So on live update/migrate we only really need to read out the runstate
data from the kernel... and now the current timer expiry. 

On the *resume* side it's still lots of syscalls, and perhaps in the
end we might decide we want to do a KVM_XEN_HVM_SET_ATTR_MULTI which
takes an array of them? But I think that's a premature optimisation for
now.
diff mbox series

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f31fcaf8fa7c..92b76127eb43 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -550,6 +550,8 @@  struct kvm_vcpu_xen {
 	gpa_t steal_time_addr;
 	struct vcpu_runstate_info *steal_time;
 	struct kvm_xen_callback cb;
+#define KVM_XEN_NR_VIRQS 24
+	unsigned int virq_to_port[KVM_XEN_NR_VIRQS];
 };
 
 struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 1fbdfa7c4356..42c1fe01600d 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -101,6 +101,20 @@  static void kvm_xen_evtchnfd_upcall(struct kvm_vcpu *vcpu, struct evtchnfd *e)
 	kvm_xen_do_upcall(vcpu->kvm, e->vcpu, vx->cb.via, vx->cb.vector, 0);
 }
 
+void kvm_xen_set_virq(struct kvm *kvm, struct evtchnfd *evt)
+{
+	int virq = evt->virq.type;
+	struct kvm_vcpu_xen *vcpu_xen;
+	struct kvm_vcpu *vcpu;
+
+	vcpu = kvm_get_vcpu(kvm, evt->vcpu);
+	if (!vcpu)
+		return;
+
+	vcpu_xen = vcpu_to_xen_vcpu(vcpu);
+	vcpu_xen->virq_to_port[virq] = evt->port;
+}
+
 int kvm_xen_set_evtchn(struct kvm_kernel_irq_routing_entry *e,
 		   struct kvm *kvm, int irq_source_id, int level,
 		   bool line_status)
@@ -620,6 +634,10 @@  static int kvm_xen_eventfd_assign(struct kvm *kvm, struct idr *port_to_evt,
 			return PTR_ERR(eventfd);
 	}
 
+	if (args->type == XEN_EVTCHN_TYPE_VIRQ &&
+	    args->virq.type >= KVM_XEN_NR_VIRQS)
+		return -EINVAL;
+
 	evtchnfd =  kzalloc(sizeof(struct evtchnfd), GFP_KERNEL);
 	if (!evtchnfd)
 		return -ENOMEM;
@@ -636,8 +654,11 @@  static int kvm_xen_eventfd_assign(struct kvm *kvm, struct idr *port_to_evt,
 			GFP_KERNEL);
 	mutex_unlock(port_lock);
 
-	if (ret >= 0)
+	if (ret >= 0) {
+		if (evtchnfd->type == XEN_EVTCHN_TYPE_VIRQ)
+			kvm_xen_set_virq(kvm, evtchnfd);
 		return 0;
+	}
 
 	if (ret == -ENOSPC)
 		ret = -EEXIST;