Message ID | 20090505132437.19891.42922.stgit@dev.haskins.net (mailing list archive) |
---|---|
State | New, archived |
On Tue, 2009-05-05 at 09:24 -0400, Gregory Haskins wrote:
>
> *) PIO is more direct than MMIO, but it poses other problems such as:
>       a) can have a small limited address space (x86 is 2^16)
>       b) is a narrow-band interface (one 8, 16, 32, 64 bit word at a time)
>       c) not available on all archs (PCI mentions ppc as problematic) and
>          is therefore recommended to avoid.

Side note: I don't know what PCI has to do with this, and "problematic"
isn't the word I would use. ;) As far as I know, x86 is the only
still-alive architecture that implements instructions for a separate IO
space (not even ia64 does).
Gregory Haskins wrote:
> We add a generic hypercall() mechanism for use by IO code which is
> compatible with a variety of hypervisors, but which prefers to use
> hypercalls over other types of hypervisor traps for performance and/or
> feature reasons.
>
> For instance, consider an emulated PCI device in KVM. Today we can chose
> to do IO over MMIO or PIO infrastructure, but they each have their own
> distinct disadvantages:
>
> *) MMIO causes a page-fault, which must be decoded by the hypervisor and is
>    therefore fairly expensive.
>
> *) PIO is more direct than MMIO, but it poses other problems such as:
>       a) can have a small limited address space (x86 is 2^16)
>       b) is a narrow-band interface (one 8, 16, 32, 64 bit word at a time)
>       c) not available on all archs (PCI mentions ppc as problematic) and
>          is therefore recommended to avoid.
>
> Hypercalls, on the other hand, offer a direct access path like PIOs, yet
> do not suffer the same drawbacks such as a limited address space or a
> narrow-band interface. Hypercalls are much more friendly to software
> to software interaction since we can pack multiple registers in a way
> that is natural and simple for software to utilize.
>

No way. Hypercalls are just about awful because they cannot be
implemented sanely with VT/SVM as Intel/AMD couldn't agree on a common
instruction for it. This means you either need a hypercall page, which
I'm pretty sure makes transparent migration impossible, or you need to
do hypercall patching which is going to throw off attestation.

If anything, I'd argue that we shouldn't use hypercalls for anything in
KVM because it will break run-time attestation.

Hypercalls cannot pass any data either. We pass data with hypercalls by
relying on external state (like register state). It's just as easy to do
this with PIO. VMware does this with vmport, for instance. However, in
general, you do not want to pass that much data with a notification.
It's better to rely on some external state (like a ring queue) and have
the "hypercall" act simply as a notification mechanism.

Regards,

Anthony Liguori
Anthony Liguori wrote:
> Gregory Haskins wrote:
>> We add a generic hypercall() mechanism for use by IO code which is
>> compatible with a variety of hypervisors, but which prefers to use
>> hypercalls over other types of hypervisor traps for performance and/or
>> feature reasons.
>>
>> For instance, consider an emulated PCI device in KVM. Today we can
>> chose to do IO over MMIO or PIO infrastructure, but they each have
>> their own distinct disadvantages:
>>
>> *) MMIO causes a page-fault, which must be decoded by the hypervisor
>>    and is therefore fairly expensive.
>>
>> *) PIO is more direct than MMIO, but it poses other problems such as:
>>       a) can have a small limited address space (x86 is 2^16)
>>       b) is a narrow-band interface (one 8, 16, 32, 64 bit word at a time)
>>       c) not available on all archs (PCI mentions ppc as problematic) and
>>          is therefore recommended to avoid.
>>
>> Hypercalls, on the other hand, offer a direct access path like PIOs, yet
>> do not suffer the same drawbacks such as a limited address space or a
>> narrow-band interface. Hypercalls are much more friendly to software
>> to software interaction since we can pack multiple registers in a way
>> that is natural and simple for software to utilize.
>>
>
> No way. Hypercalls are just about awful because they cannot be
> implemented sanely with VT/SVM as Intel/AMD couldn't agree on a common
> instruction for it. This means you either need a hypercall page,
> which I'm pretty sure makes transparent migration impossible, or you
> need to do hypercall patching which is going to throw off attestation.

This is irrelevant since KVM already does this patching today and we
therefore have to deal with this fact. This new work rides on the
existing infrastructure, so it is of negligible cost to add new vectors,
at least w.r.t. your concerns above.

>
> If anything, I'd argue that we shouldn't use hypercalls for anything
> in KVM because it will break run-time attestation.

The cat's already out of the bag on that one. Besides, even if the
hypercall infrastructure does break attestation (which I don't think has
been proven as fact), not everyone cares about migration. At some point,
someone may want to make a decision to trade performance for the ability
to migrate (or vice versa). I am ok with that.

>
> Hypercalls cannot pass any data either. We pass data with hypercalls
> by relying on external state (like register state).

This is a silly argument. The CALL or SYSENTER instructions do not pass
data either. Rather, they branch the IP and/or execution context,
predicated on the notion that the new IP/context can still access the
relevant machine state prior to the call. VMCALL-type instructions are
just one more logical extension of that same construct.

PIO, on the other hand, is different. The architectural assumption is
that the target endpoint does not have access to the machine state.
Sure, we can "cheat" in virtualization by knowing that it really will
have access, but then we are still constrained by all the things I have
already mentioned that are disadvantageous to PIOs, plus....

> It's just as easy to do this with PIO.

...you are glossing over the fact that we already have the
infrastructure to do proper register setup in kvm_hypercallX(). We would
need to come up with a similar (arch-specific) "pio_callX()" as well. Oh
wait, no we wouldn't, since PIOs apparently only work on x86. ;)

OK, so we would need to come up with these pio_calls for x86, and no
other arch can use the infrastructure (but wait, PPC can use PCI too, so
how does that work? It must be either MMIO emulation or it's not
supported? That puts us back to square one).

In addition, we can have at most 8k unique vectors on x86_64
(2^16 addresses / 8 bytes per port). Even this is a gross overestimation
because it assumes the entire address space is available for pio_call,
which it isn't, and it assumes all allocations are in the precise width
of the address space (8 bytes), which they won't be.

It doesn't exactly sound like a winner to me.

> VMware does this with vmport, for instance. However, in general, you
> do not want to pass that much data with a notification. It's better
> to rely on some external state (like a ring queue) and have the
> "hypercall" act simply as a notification mechanism.

Disagree completely. Things like shared-memory rings provide a really
nice asynchronous mechanism, and for the majority of your IO and for the
fast path that's probably exactly what we should do. However, there are
plenty of patterns that fit better with a synchronous model. Case in
point: see the VIRTIO_VBUS_FUNC_GET_FEATURES, SET_STATUS, and RESET
functions in the virtio-vbus driver I posted.

And as an aside, the ring notification rides on the same fundamental
synchronous transport. So sure, you could argue that you could also make
an extra "call" ring, place your request in the ring, and use a
flush/block primitive to wait for that item to execute. But that sounds
a bit complicated when all I want is the ability to invoke simple
synchronous calls. Therefore, let's just get this "right" and build the
primitives to express both patterns easily.

In the end, I don't specifically care if the "call()" vehicle is a PIO
or a hypercall per se, as long as we can meet the following:

a) it is feasible at least anywhere PCI works
b) it provides a robust and uniform interface so drivers do not need to
   care about the implementation
c) it has equivalent performance (e.g. if PPC maps PIO as MMIO, that is
   no good)

Regards,
-Greg
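[Editor's sketch: the two interaction patterns being contrasted in this exchange might look roughly like the following in a guest driver. The struct, vector layout, and function codes are invented for illustration and do not come from the posted patches.]

#include <linux/hypercall.h>

/* Hypothetical per-device state; the vector would be assigned by the host. */
struct mydev {
	unsigned long hc_vector;
};

/* Hypothetical function codes multiplexed over the device's vector. */
#define MYDEV_FUNC_GET_FEATURES	1
#define MYDEV_FUNC_TX_KICK	2

/* Synchronous pattern: ask the host a question and get the answer back. */
static long mydev_get_features(struct mydev *dev)
{
	return hypercall1(dev->hc_vector, MYDEV_FUNC_GET_FEATURES);
}

/* Notification pattern: the data lives in a shared ring; the call is a kick. */
static void mydev_kick_tx(struct mydev *dev)
{
	hypercall1(dev->hc_vector, MYDEV_FUNC_TX_KICK);
}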
On Wednesday 06 May 2009, Gregory Haskins wrote:
> OK, so we would need to come up with these pio_calls for x86, and no
> other arch can use the infrastructure (but wait, PPC can use PCI too,
> so how does that work? It must be either MMIO emulation or it's not
> supported? That puts us back to square one).

PowerPC already has an abstraction for PIO and MMIO because certain
broken hardware chips don't do what they should; see
arch/powerpc/platforms/cell/io-workarounds.c for the only current user.
If you need to, you could do the same on x86 (or generalize the code),
but please don't add another level of indirection on top of this.

	Arnd <><
diff --git a/include/linux/hypercall.h b/include/linux/hypercall.h
new file mode 100644
index 0000000..c8a1492
--- /dev/null
+++ b/include/linux/hypercall.h
@@ -0,0 +1,83 @@
+/*
+ * Copyright 2009 Novell. All Rights Reserved.
+ *
+ * Author:
+ *      Gregory Haskins <ghaskins@novell.com>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#ifndef _LINUX_HYPERCALL_H
+#define _LINUX_HYPERCALL_H
+
+#ifdef CONFIG_HAVE_HYPERCALL
+
+long hypercall(unsigned long nr, unsigned long *args, size_t count);
+
+#else
+
+static inline long
+hypercall(unsigned long nr, unsigned long *args, size_t count)
+{
+	return -EINVAL;
+}
+
+#endif /* CONFIG_HAVE_HYPERCALL */
+
+#define hypercall0(nr) hypercall(nr, NULL, 0)
+#define hypercall1(nr, a1)                                           \
+	({                                                           \
+		unsigned long __args[] = { a1, };                    \
+		long __ret;                                          \
+		__ret = hypercall(nr, __args, ARRAY_SIZE(__args));   \
+		__ret;                                               \
+	})
+#define hypercall2(nr, a1, a2)                                       \
+	({                                                           \
+		unsigned long __args[] = { a1, a2, };                \
+		long __ret;                                          \
+		__ret = hypercall(nr, __args, ARRAY_SIZE(__args));   \
+		__ret;                                               \
+	})
+#define hypercall3(nr, a1, a2, a3)                                   \
+	({                                                           \
+		unsigned long __args[] = { a1, a2, a3, };            \
+		long __ret;                                          \
+		__ret = hypercall(nr, __args, ARRAY_SIZE(__args));   \
+		__ret;                                               \
+	})
+#define hypercall4(nr, a1, a2, a3, a4)                               \
+	({                                                           \
+		unsigned long __args[] = { a1, a2, a3, a4, };        \
+		long __ret;                                          \
+		__ret = hypercall(nr, __args, ARRAY_SIZE(__args));   \
+		__ret;                                               \
+	})
+#define hypercall5(nr, a1, a2, a3, a4, a5)                           \
+	({                                                           \
+		unsigned long __args[] = { a1, a2, a3, a4, a5, };    \
+		long __ret;                                          \
+		__ret = hypercall(nr, __args, ARRAY_SIZE(__args));   \
+		__ret;                                               \
+	})
+#define hypercall6(nr, a1, a2, a3, a4, a5, a6)                       \
+	({                                                           \
+		unsigned long __args[] = { a1, a2, a3, a4, a5, a6, };\
+		long __ret;                                          \
+		__ret = hypercall(nr, __args, ARRAY_SIZE(__args));   \
+		__ret;                                               \
+	})
+
+
+#endif /* _LINUX_HYPERCALL_H */
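[Editor's sketch: a caller in a guest driver might use these wrappers roughly as shown below. The vector value and the function names are hypothetical; nothing in this patch assigns or defines them, and the -EINVAL fallback applies only when CONFIG_HAVE_HYPERCALL is not set.]

#include <linux/errno.h>
#include <linux/hypercall.h>

/* Hypothetical vector negotiated with the hypervisor out of band. */
#define MYDEV_HC_VECTOR		42

static int mydev_set_status(unsigned long status)
{
	/* Packs the argument and traps to the hypervisor (or returns
	 * -EINVAL if no hypercall support is compiled in). */
	long ret = hypercall1(MYDEV_HC_VECTOR, status);

	return ret < 0 ? ret : 0;
}

/* Multiple arguments are simply packed in order; how they map onto
 * registers is left to the arch/hypervisor implementation. */
static long mydev_setup_ring(unsigned long gpa, unsigned long len,
			     unsigned long flags)
{
	return hypercall3(MYDEV_HC_VECTOR, gpa, len, flags);
}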
We add a generic hypercall() mechanism for use by IO code which is
compatible with a variety of hypervisors, but which prefers to use
hypercalls over other types of hypervisor traps for performance and/or
feature reasons.

For instance, consider an emulated PCI device in KVM. Today we can choose
to do IO over MMIO or PIO infrastructure, but they each have their own
distinct disadvantages:

*) MMIO causes a page-fault, which must be decoded by the hypervisor and
   is therefore fairly expensive.

*) PIO is more direct than MMIO, but it poses other problems such as:
      a) it has a small, limited address space (2^16 on x86)
      b) it is a narrow-band interface (one 8, 16, 32, or 64 bit word at a time)
      c) it is not available on all archs (PCI mentions ppc as problematic)
         and is therefore recommended to avoid.

Hypercalls, on the other hand, offer a direct access path like PIOs, yet
do not suffer the same drawbacks such as a limited address space or a
narrow-band interface. Hypercalls are much more friendly to
software-to-software interaction since we can pack multiple registers in
a way that is natural and simple for software to utilize.

The problem with hypercalls today is that there is no generic support.
There are various hypervisor-specific implementations (for instance, see
kvm_hypercall0() in arch/x86/include/asm/kvm_para.h). This makes it
difficult to implement a device that is hypervisor-agnostic, since it
would need to know not only the hypercall ABI but also which
platform-specific function call it should make.

If we can convey a dynamic binding to a specific hypercall vector in a
generic way (outside the scope of this patch series), then an IO driver
could utilize that dynamic binding to communicate without requiring
hypervisor-specific knowledge.

Therefore, we implement a system-wide hypercall() interface based on a
variable-length list of unsigned longs (representing the registers to
pack) and expect that the various arch/hypervisor implementations can
fill in the details, if supported. This is expected to be done as part of
the pv_ops infrastructure, which is the natural hook point for
hypervisor-specific code, although the generic hypercall() interface does
not require the implementation to use pv_ops.

Example use case:
------------------

Consider a PCI device "X". It can already advertise MMIO/PIO regions via
its BAR infrastructure. With this new model it could also advertise a
hypercall vector in its device-specific upper configuration space. (The
allocation and assignment of this vector on the backend is beyond the
scope of this series.) The guest-side driver for device "X" would sense
(via something like a feature bit) whether the hypercall is available and
valid, read the value with a configuration cycle, and proceed to ignore
the BARs in favor of the hypercall() interface.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

 include/linux/hypercall.h |   83 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 83 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/hypercall.h
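[Editor's sketch: to make the "Example use case" above concrete, a guest-side probe for device "X" might look roughly like this. The configuration-space offsets, the feature bit, and the x_dev structure are invented for illustration and are not defined by this series.]

#include <linux/pci.h>
#include <linux/hypercall.h>

/* Hypothetical layout of device "X"'s device-specific config space. */
#define X_CFG_FEATURES		0x40	/* feature bits */
#define X_CFG_HC_VECTOR		0x44	/* host-assigned hypercall vector */
#define X_FEATURE_HYPERCALL	(1 << 0)

struct x_dev {
	bool		use_hypercall;
	unsigned long	hc_vector;
};

static void x_probe_transport(struct pci_dev *pdev, struct x_dev *xdev)
{
	u32 features, vector;

	pci_read_config_dword(pdev, X_CFG_FEATURES, &features);

	if (features & X_FEATURE_HYPERCALL) {
		/* Prefer the hypercall channel and ignore the BARs. */
		pci_read_config_dword(pdev, X_CFG_HC_VECTOR, &vector);
		xdev->hc_vector = vector;
		xdev->use_hypercall = true;
	} else {
		/* Fall back to the regions advertised via the BARs. */
		xdev->use_hypercall = false;
	}
}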