diff mbox

KVM: introduce "xinterface" API for external interaction with guests

Message ID 20090716151945.29318.10882.stgit@dev.haskins.net (mailing list archive)
State New, archived
Headers show

Commit Message

Gregory Haskins July 16, 2009, 3:19 p.m. UTC
What: xinterface is a mechanism that allows kernel modules external to
the kvm.ko proper to interface with a running guest.  It accomplishes
this by creating an abstracted interface which does not expose any
private details of the guest or its related KVM structures, and provides
a mechanism to find and bind to this interface at run-time.  This
binding mechanism uses a userspace friendly token "u64 vmid" as a
handle.  This vmid acts similar to a file-descriptor in the sense that
it can be extracted from a guest, passed to an end-point of interest,
and finally, converted back to a vtable pointer using a stable interface.

Why: There are various subsystems that would like to interact with a KVM
guest which are ideally suited to exist outside the domain of the kvm.ko
core logic. For instance, external pci-passthrough, virtual-bus, and
virtio-net modules are currently under development.  In order for these
modules to successfully interact with the guest, they need, at the very
least, various interfaces for signaling IO events, pointer translation,
and possibly memory mapping.

The signaling case is covered by the recent introduction of the
irqfd/ioeventfd mechanisms.  This patch provides a mechanism to cover the
other cases.  Note that today we only expose pointer-translation related
functions, but more could be added at a future date as needs arise.

Security considerations: This concept is not believed to expose KVM to
any kind of additional security risk.  The vmid token itself can only be
acquired via an open handle to the vmfd (i.e. qemu-kvm), and the interface
is only available within the kernel.  Therefore the xinterface
admission policy is delegated to the kernel/lkm admission policy, which
must be assumed secure or the system is already compromised independent of
this work.

Additionally, the xinterface design is hardened against malformed vmid
tokens, as well as race conditions against valid tokens (e.g. guest
exiting before the token is redeemed).  It is additionally hardened
against races in the kvm.ko module itself by acquiring proper module
references.  As a final measure, we link the xinterface code statically
into the kernel so that callers are guaranteed a stable interface to
kvm_xinterface_find() without implicitly pinning kvm.ko or racing against
it.

Example usage: QEMU instantiates a guest, and an external module "foo"
that desires the ability to interface with the guest (say via
open("/dev/foo")).  QEMU may then issue a KVM_GET_VMID operation to acquire
the u64-based vmid, and pass it to ioctl(foofd, FOO_SET_VMID, &vmid).
Upon receipt, the foo module can issue kvm_xinterface_find(vmid) to acquire
the proper context.  Internally, the struct kvm* and associated
struct module* will remain pinned at least until the foo module calls
kvm_xinterface_put().

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

 arch/x86/Kbuild                |    4 +
 arch/x86/kvm/Makefile          |    4 +
 arch/x86/kvm/x86.c             |    1 
 include/linux/kvm.h            |    2 +
 include/linux/kvm_host.h       |    6 ++
 include/linux/kvm_xinterface.h |   58 ++++++++++++++++
 virt/kvm/kvm_main.c            |   72 ++++++++++++++++++++
 virt/kvm/xinterface.c          |  147 ++++++++++++++++++++++++++++++++++++++++
 8 files changed, 293 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/kvm_xinterface.h
 create mode 100644 virt/kvm/xinterface.c


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Gregory Haskins July 16, 2009, 3:31 p.m. UTC | #1
Sam Ravnborg wrote:
>> diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
>> index ad8ec35..9f50cc3 100644
>> --- a/arch/x86/Kbuild
>> +++ b/arch/x86/Kbuild
>> @@ -1,5 +1,7 @@
>>  
>> -obj-$(CONFIG_KVM) += kvm/
>> +ifdef CONFIG_KVM
>> +obj-y += kvm/
>> +endif
>>     
>
> What was wrong with the old version?
> If this is because xinterface.o is builtin to the kernel

Bingo

>  then we should
> always visit kvm/ and use:
>
>    obj-y += kvm/
>
> unconditionally.
>
>   
Ok, easy enough.

Thanks Sam,
-Greg
Anthony Liguori July 16, 2009, 3:37 p.m. UTC | #2
Gregory Haskins wrote:
> +/*
> + * ------------
> + * XINTERFACE (External Interface)
> + * -------------
> + */
> +
> +static struct kvm *
> +intf_to_kvm(struct kvm_xinterface *intf)
> +{
> +	return container_of(intf, struct kvm, xinterface);
> +}
> +
> +static unsigned long
> +xinterface_gpa_to_hva(struct kvm_xinterface *intf, unsigned long gpa)
> +{
> +	struct kvm *kvm = intf_to_kvm(intf);
> +	unsigned long addr;
> +
> +	addr = gfn_to_hva(kvm, gpa >> PAGE_SHIFT);
> +	if (kvm_is_error_hva(addr))
> +		return 0;
> +
> +	return addr + offset_in_page(gpa);
> +}
> +
> +static struct page *
> +xinterface_gpa_to_page(struct kvm_xinterface *intf, unsigned long gpa)
> +{
> +	struct kvm *kvm = intf_to_kvm(intf);
> +	struct page *page;
> +
> +	page = gfn_to_page(kvm, gpa >> PAGE_SHIFT);
> +	if (page == bad_page)
> +		return ERR_PTR(-EINVAL);
> +
> +	return page;
> +}
> +
> +static void
> +xinterface_release(struct kvm_xinterface *intf)
> +{
> +	struct kvm *kvm = intf_to_kvm(intf);
> +
> +	kvm_put_kvm(kvm);
> +}
> +
> +struct kvm_xinterface_ops _kvm_xinterface_ops = {
> +	.gpa_to_hva  = xinterface_gpa_to_hva,
> +	.gpa_to_page = xinterface_gpa_to_page,
> +	.release     = xinterface_release,
> +};
>   

How do you deal with locking?

The mappings (gpa_to_page) are not fixed.  They can and do change very 
often.  The interface doesn't seem to attempt to cope with this.

Regards,

Anthony Liguori
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gregory Haskins July 16, 2009, 3:47 p.m. UTC | #3
Anthony Liguori wrote:
> Gregory Haskins wrote:
>> +/*
>> + * ------------
>> + * XINTERFACE (External Interface)
>> + * -------------
>> + */
>> +
>> +static struct kvm *
>> +intf_to_kvm(struct kvm_xinterface *intf)
>> +{
>> +    return container_of(intf, struct kvm, xinterface);
>> +}
>> +
>> +static unsigned long
>> +xinterface_gpa_to_hva(struct kvm_xinterface *intf, unsigned long gpa)
>> +{
>> +    struct kvm *kvm = intf_to_kvm(intf);
>> +    unsigned long addr;
>> +
>> +    addr = gfn_to_hva(kvm, gpa >> PAGE_SHIFT);
>> +    if (kvm_is_error_hva(addr))
>> +        return 0;
>> +
>> +    return addr + offset_in_page(gpa);
>> +}
>> +
>> +static struct page *
>> +xinterface_gpa_to_page(struct kvm_xinterface *intf, unsigned long gpa)
>> +{
>> +    struct kvm *kvm = intf_to_kvm(intf);
>> +    struct page *page;
>> +
>> +    page = gfn_to_page(kvm, gpa >> PAGE_SHIFT);
>> +    if (page == bad_page)
>> +        return ERR_PTR(-EINVAL);
>> +
>> +    return page;
>> +}
>> +
>> +static void
>> +xinterface_release(struct kvm_xinterface *intf)
>> +{
>> +    struct kvm *kvm = intf_to_kvm(intf);
>> +
>> +    kvm_put_kvm(kvm);
>> +}
>> +
>> +struct kvm_xinterface_ops _kvm_xinterface_ops = {
>> +    .gpa_to_hva  = xinterface_gpa_to_hva,
>> +    .gpa_to_page = xinterface_gpa_to_page,
>> +    .release     = xinterface_release,
>> +};
>>   
>
> How do you deal with locking?
>
> The mappings (gpa_to_page) are not fixed.  They can and do change very
> often.  The interface doesn't seem to attempt to cope with this.

Hmm, well I used to need gpa_to_page() in the older version of vbus that
did explicit hypercalls, but I don't actually use it today in v4.  I
left it in because it seems like it might be useful in general (perhaps
for Michael).  However, if this ends up being a real problem I suppose
we can just drop that vtable entry and let the guy that actually needs
it deal with the issues ;)  I really only require gpa_to_hva() in the
short-term.

That said, I think the assumption that was made when I was using this
was that a proper ref for the page was acquired by the gfn_to_page() and
dropped by the caller.  This was always used in the context of a
hypercall/vmexit so presumably the gpa should be considered stable
across that call.  Is that not true?

Regards,
-Greg
Anthony Liguori July 16, 2009, 4:09 p.m. UTC | #4
Gregory Haskins wrote:
> That said, I think the assumption that was made when I was using this
> was that a proper ref for the page was acquired by the gfn_to_page() and
> dropped by the caller.  This was always used in the context of a
> hypercall/vmexit so presumably the gpa should be considered stable
> across that call.  Is that not true?
>   

If you're in kvm.ko, then yes, that's a safe assumption to make because 
the guest VCPU cannot run while you are running.

But you're opening this interface to any caller so the VCPU is likely to 
be running while someone calls this function

> Regards,
> -Greg
>
>
>
Arnd Bergmann July 16, 2009, 4:52 p.m. UTC | #5
On Thursday 16 July 2009, Gregory Haskins wrote:
> Background: The original vbus code was tightly integrated with kvm.ko.  Avi
> suggested that we abstract the interfaces such that it could live outside
> of kvm.

The code is still highly kvm-specific, you would not be able to use
it with another hypervisor like lguest or vmware player, right?

> Example usage: QEMU instantiates a guest, and an external module "foo"
> that desires the ability to interface with the guest (say via
> open("/dev/foo")).  QEMU may then issue a KVM_GET_VMID operation to acquire
> the u64-based vmid, and pass it to ioctl(foofd, FOO_SET_VMID, &vmid).
> Upon receipt, the foo module can issue kvm_xinterface_find(vmid) to acquire
> the proper context.  Internally, the struct kvm* and associated
> struct module* will remain pinned at least until the foo module calls
> kvm_xinterface_put().

Your approach allows passing the vmid from a process that does
not own the kvm context. This looks like an intentional feature,
but I can't see what this gains us. 

> As a final measure, we link the xinterface code statically
> into the kernel so that callers are guaranteed a stable interface to
> kvm_xinterface_find() without implicitly pinning kvm.ko or racing against
> it.

I also don't understand this. Are you worried about driver modules
breaking when an externally-compiled kvm.ko is loaded? The same could
be achieved by defining your data structures kvm_xinterface_ops and
kvm_xinterface in a kernel header that is not shipped by kvm-kmod but
always taken from the kernel headers.
It does not matter if the entry points are build into the kernel or
exported from a kvm.ko as long as you define a fixed ABI.

What is the problem with pinning kvm.ko from another module using
its features?

Can't you simply provide a function call to lookup the kvm context
pointer from the file descriptor to achieve the same functionality?

To take that thought further, maybe the dependency can be turned
around: If every user (pci-uio, virtio-net, ...) exposes a file
descriptor based interface to user space, you can have a kvm
ioctl to register the object behind that file descriptor with
an existing kvm context to associate it with a guest. That would
nicely solve the life time questions by pinning the external
object for the life time of the kvm context rather than the other
way round, and it would be completely separate from kvm in that
each such object could be used by other subsystems independent
of kvm.

	Arnd <><
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gregory Haskins July 16, 2009, 6:22 p.m. UTC | #6
Arnd Bergmann wrote:
> On Thursday 16 July 2009, Gregory Haskins wrote:
>   
>> Background: The original vbus code was tightly integrated with kvm.ko.  Avi
>> suggested that we abstract the interfaces such that it could live outside
>> of kvm.
>>     
>
> The code is still highly kvm-specific, you would not be able to use
> it with another hypervisor like lguest or vmware player, right?
>   

In its current form, it is kvm specific primarily because it was not a
explicit design constraint of mine to support others.  However, there is
no reason why we could not generalize the interface if that is a
desirable trait.  Ultimately I would like to have my project support
other things like lguest, so this is not a bad idea.  Otherwise, someone
will invariably be proposing an "lguest_xinterface" next ;)

>   
>> Example usage: QEMU instantiates a guest, and an external module "foo"
>> that desires the ability to interface with the guest (say via
>> open("/dev/foo")).  QEMU may then issue a KVM_GET_VMID operation to acquire
>> the u64-based vmid, and pass it to ioctl(foofd, FOO_SET_VMID, &vmid).
>> Upon receipt, the foo module can issue kvm_xinterface_find(vmid) to acquire
>> the proper context.  Internally, the struct kvm* and associated
>> struct module* will remain pinned at least until the foo module calls
>> kvm_xinterface_put().
>>     
>
> Your approach allows passing the vmid from a process that does
> not own the kvm context. This looks like an intentional feature,
> but I can't see what this gains us.

This work is towards the implementation of lockless-shared-memory
subsystems, which includes ring constructs such as virtio-ring,
VJ-netchannels, and vbus-ioq.   I find that these designs perform
optimally when you allow two distinct contexts (producer + consumer) to
process the ring concurrently, which implies a disparate context from
the guest in question.  Note that the infrastructure we are discussing
does not impose a requirement for the contexts to be unique: it will
work equally well from the same or a different process.

For an example of this "producer/consumer" dynamic over shared memory in
action, please refer to my previous posting re: "vbus"

http://lkml.org/lkml/2009/4/21/408

I am working on v4 now, and this patch is part of the required support.

>  
>
>   
>> As a final measure, we link the xinterface code statically
>> into the kernel so that callers are guaranteed a stable interface to
>> kvm_xinterface_find() without implicitly pinning kvm.ko or racing against
>> it.
>>     
>
> I also don't understand this. Are you worried about driver modules
> breaking when an externally-compiled kvm.ko is loaded? The same could
> be achieved by defining your data structures kvm_xinterface_ops and
> kvm_xinterface in a kernel header that is not shipped by kvm-kmod but
> always taken from the kernel headers.
> It does not matter if the entry points are build into the kernel or
> exported from a kvm.ko as long as you define a fixed ABI.
>
> What is the problem with pinning kvm.ko from another module using
> its features?
>   

Well, there is always the chance that I am doing something dumb or
missing the point ;)

But my rationale was as follows:  The problem is that kvm is a little
weird in the module ref department: If I were to do it the standard way
and link xinterface.o into kvm.o (and have any xinterface_find() users
do a tristate+"depends on KVM"), this would work as I believe you are
suggesting.  That is to say: whenever I loaded "foo.ko", insmod would
automatically up the reference of kvm.ko.  The issue is that is not
quite what I really want ;)

I want to hold the reference to the entire .text chain, which includes
kvm.ko + [kvm-intel.ko | kvm-amd.ko].  If you look carefully, the
ops->owner that is assigned is actually the arch.ko.  In addition, I
wanted the kvm.ko lifetime to be associated with the lifetime of its
contexts actually in use, not the lifetime of its installed
dependencies.  Therefore, I did it this way.

But to your point, I suppose the dependency lifetime thing is not a huge
deal.  I could therefore modify the patch to simply link xinterface.o
into kvm.ko and still achieve the primary objective by retaining ops->owner.

Note that if we are going to generalize the interface to support other
guests as you may have been suggesting above, it should probably stay
statically linked (and perhaps live in ./lib or something)


> Can't you simply provide a function call to lookup the kvm context
> pointer from the file descriptor to achieve the same functionality?
>   
You mean so have: struct kvm_xinterface *kvm_xinterface_find(int fd)

(instead of creating our own vmid namespace) ?

Or are you suggesting using fget() instead of kvm_xinterface_find()?

> To take that thought further, maybe the dependency can be turned
> around: If every user (pci-uio, virtio-net, ...) exposes a file
> descriptor based interface to user space, you can have a kvm
> ioctl to register the object behind that file descriptor with
> an existing kvm context to associate it with a guest.

FWIW: We do that already for the signaling path (see irqfd and ioeventfd
in kvm.git).  Each side exposes interfaces that accept eventfds, and the
fds are passed around that way.

However, for the functions we are talking about now, I don't think it
really works well to go the other way.  I could be misunderstanding what
you mean, though.  What I mean is that it's KVM that is providing a
service to the other modules (in this case, translating memory
pointers), so what would an inverse interface look like for that?  And
even if you came up with one, it seems to me that its just "6 of one,
half-dozen of the other" kind of thing.

>  That would
> nicely solve the life time questions by pinning the external
> object for the life time of the kvm context

I suppose that is nice, but note that in practice both objects (the
kvm-guest and the "foo" module) are managed by the same entity (i.e.
QEMU) and therefore share the same approximate lifetime.

Kind Regards,
-Greg
Gregory Haskins July 16, 2009, 6:53 p.m. UTC | #7
Gregory Haskins wrote:
> Note that if we are going to generalize the interface to support other
> guests as you may have been suggesting above, it should probably stay
> statically linked (and perhaps live in ./lib or something)
>   

More specifically, it can no longer live in kvm.ko.  I guess it
technically doesn't have to be statically linked, though (e.g.
xinterface.ko is fine, too).

Regards,
-Greg
Zan Lynx July 16, 2009, 7:38 p.m. UTC | #8
Gregory Haskins wrote:
> Gregory Haskins wrote:
>> Note that if we are going to generalize the interface to support other
>> guests as you may have been suggesting above, it should probably stay
>> statically linked (and perhaps live in ./lib or something)
>>   
> 
> More specifically, it can no longer live in kvm.ko.  I guess it
> technically doesn't have to be statically linked, though (e.g.
> xinterface.ko is fine, too).

Speaking as someone who knows nothing about this, if this is going to be 
a module and visible in places like lsmod, could you name it something else?

I see "xinterface.ko" and the first thing in my head is "something to do 
with the X Window System."

How about kvm_xinterface or vguest_xinterface?
Arnd Bergmann July 16, 2009, 7:46 p.m. UTC | #9
On Thursday 16 July 2009, Gregory Haskins wrote:
> Arnd Bergmann wrote:  

> > Your approach allows passing the vmid from a process that does
> > not own the kvm context. This looks like an intentional feature,
> > but I can't see what this gains us.
> 
> This work is towards the implementation of lockless-shared-memory
> subsystems, which includes ring constructs such as virtio-ring,
> VJ-netchannels, and vbus-ioq.   I find that these designs perform
> optimally when you allow two distinct contexts (producer + consumer) to
> process the ring concurrently, which implies a disparate context from
> the guest in question.  Note that the infrastructure we are discussing
> does not impose a requirement for the contexts to be unique: it will
> work equally well from the same or a different process.
> 
> For an example of this "producer/consumer" dynamic over shared memory in
> action, please refer to my previous posting re: "vbus"
> 
> http://lkml.org/lkml/2009/4/21/408
> 
> I am working on v4 now, and this patch is part of the required support.

Ok. I can see how your approach gives you more flexibility in this
regard, but it does not seem critical.

> But to your point, I suppose the dependency lifetime thing is not a huge
> deal.  I could therefore modify the patch to simply link xinterface.o
> into kvm.ko and still achieve the primary objective by retaining ops->owner.

Right. And even if it's a separate module, holding an extra reference
on kvm.ko will not cause any harm.

> > Can't you simply provide a function call to lookup the kvm context
> > pointer from the file descriptor to achieve the same functionality?
> >   
> You mean so have: struct kvm_xinterface *kvm_xinterface_find(int fd)
> 
> (instead of creating our own vmid namespace) ?
> 
> Or are you suggesting using fget() instead of kvm_xinterface_find()?

I guess they are roughly equivalent. Either you pass a fd to
kvm_xinterface_find, or pass the struct file pointer you get
from fget. The latter is probably more convenient because it
allows you to pass around the struct file in kernel contexts
that don't have that file descriptor open.

> > To take that thought further, maybe the dependency can be turned
> > around: If every user (pci-uio, virtio-net, ...) exposes a file
> > descriptor based interface to user space, you can have a kvm
> > ioctl to register the object behind that file descriptor with
> > an existing kvm context to associate it with a guest.
> 
> FWIW: We do that already for the signaling path (see irqfd and ioeventfd
> in kvm.git).  Each side exposes interfaces that accept eventfds, and the
> fds are passed around that way.
> 
> However, for the functions we are talking about now, I don't think it
> really works well to go the other way.  I could be misunderstanding what
> you mean, though.  What I mean is that it's KVM that is providing a
> service to the other modules (in this case, translating memory
> pointers), so what would an inverse interface look like for that?  And
> even if you came up with one, it seems to me that its just "6 of one,
> half-dozen of the other" kind of thing.

I mean something like 

int kvm_ioctl_register_service(struct file *filp, unsigned long arg)
{
	struct file *service = fget(arg);
	struct kvm *kvm = filp->private_data;

	if (!service->f_ops->new_xinterface_register)
		return -EINVAL;

	return service->f_ops->new_xinterface_register(service, (void*)kvm,
				&kvm_xinterface_ops);
}

This would assume that we define a new file_operation specifically for this,
which would simplify the code, but there are other ways to achieve the same.
It would even mean that you don't need any static code as an interface layer.

	Arnd <><
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gregory Haskins July 16, 2009, 7:48 p.m. UTC | #10
Zan Lynx wrote:
> Gregory Haskins wrote:
>> Gregory Haskins wrote:
>>> Note that if we are going to generalize the interface to support other
>>> guests as you may have been suggesting above, it should probably stay
>>> statically linked (and perhaps live in ./lib or something)
>>>   
>>
>> More specifically, it can no longer live in kvm.ko.  I guess it
>> technically doesn't have to be statically linked, though (e.g.
>> xinterface.ko is fine, too).
>
> Speaking as someone who knows nothing about this, if this is going to
> be a module and visible in places like lsmod, could you name it
> something else?
>
> I see "xinterface.ko" and the first thing in my head is "something to
> do with the X Window System."
>
> How about kvm_xinterface or vguest_xinterface?
>

Heh, I totally agree.  That was just pseudo-naming, anyway. ;)

If it was going to be generic, I would do something like
"virt-xinterface.ko".

Kind Regards,
-Greg
Gregory Haskins July 17, 2009, 12:25 a.m. UTC | #11
Arnd Bergmann wrote:
> On Thursday 16 July 2009, Gregory Haskins wrote:
>   
>> Arnd Bergmann wrote:  
>>     
>
>   
>>> Your approach allows passing the vmid from a process that does
>>> not own the kvm context. This looks like an intentional feature,
>>> but I can't see what this gains us.
>>>       
>> This work is towards the implementation of lockless-shared-memory
>> subsystems, which includes ring constructs such as virtio-ring,
>> VJ-netchannels, and vbus-ioq.   I find that these designs perform
>> optimally when you allow two distinct contexts (producer + consumer) to
>> process the ring concurrently, which implies a disparate context from
>> the guest in question.  Note that the infrastructure we are discussing
>> does not impose a requirement for the contexts to be unique: it will
>> work equally well from the same or a different process.
>>
>> For an example of this "producer/consumer" dynamic over shared memory in
>> action, please refer to my previous posting re: "vbus"
>>
>> http://lkml.org/lkml/2009/4/21/408
>>
>> I am working on v4 now, and this patch is part of the required support.
>>     
>
> Ok. I can see how your approach gives you more flexibility in this
> regard, but it does not seem critical.
>   

Yeah, and I think I missed your point the first time around.  I thought
you were asking about how the resulting memory API was used, but now I
see you were referring to the scope of the vmid namespace (vs
file-descriptors).

On that topic, yes the vmid implementation here differs from
file-descriptors in the sense that the namespace is global to the
kernel, instead of per-process.  And yes, I also agree with you that
this is not critical to the design, per se.  For instance, the use cases
I have in mind would work fine with the per-task fd namespace as well
since I will be within the same QEMU instance.

So ultimately, I think that means the fd idea you presented would work
equally well. 

>   
>> But to your point, I suppose the dependency lifetime thing is not a huge
>> deal.  I could therefore modify the patch to simply link xinterface.o
>> into kvm.ko and still achieve the primary objective by retaining ops->owner.
>>     
>
> Right. And even if it's a separate module, holding an extra reference
> on kvm.ko will not cause any harm.
>   

Yep, agreed.
>   
>>> Can't you simply provide a function call to lookup the kvm context
>>> pointer from the file descriptor to achieve the same functionality?
>>>   
>>>       
>> You mean so have: struct kvm_xinterface *kvm_xinterface_find(int fd)
>>
>> (instead of creating our own vmid namespace) ?
>>
>> Or are you suggesting using fget() instead of kvm_xinterface_find()?
>>     
>
> I guess they are roughly equivalent. Either you pass a fd to
> kvm_xinterface_find, or pass the struct file pointer you get
> from fget. The latter is probably more convenient because it
> allows you to pass around the struct file in kernel contexts
> that don't have that file descriptor open.
>   

Interesting.  I like it.
>   
>>> To take that thought further, maybe the dependency can be turned
>>> around: If every user (pci-uio, virtio-net, ...) exposes a file
>>> descriptor based interface to user space, you can have a kvm
>>> ioctl to register the object behind that file descriptor with
>>> an existing kvm context to associate it with a guest.
>>>       
>> FWIW: We do that already for the signaling path (see irqfd and ioeventfd
>> in kvm.git).  Each side exposes interfaces that accept eventfds, and the
>> fds are passed around that way.
>>
>> However, for the functions we are talking about now, I don't think it
>> really works well to go the other way.  I could be misunderstanding what
>> you mean, though.  What I mean is that it's KVM that is providing a
>> service to the other modules (in this case, translating memory
>> pointers), so what would an inverse interface look like for that?  And
>> even if you came up with one, it seems to me that its just "6 of one,
>> half-dozen of the other" kind of thing.
>>     
>
> I mean something like 
>
> int kvm_ioctl_register_service(struct file *filp, unsigned long arg)
> {
> 	struct file *service = fget(arg);
> 	struct kvm *kvm = filp->private_data;
>
> 	if (!service->f_ops->new_xinterface_register)
> 		return -EINVAL;
>
> 	return service->f_ops->new_xinterface_register(service, (void*)kvm,
> 				&kvm_xinterface_ops);
> }
>
> This would assume that we define a new file_operation specifically for this,
> which would simplify the code, but there are other ways to achieve the same.
> It would even mean that you don't need any static code as an interface layer.
>   

I suspect I will have a much harder time getting that type of
modification accepted in mainline ;)

I think I will go with your suggestion for the fd/file interface, tho. 
I can get rid of the rbtree logic and re-use the lookup via fget, which
is a nice win.

Thanks Arnd!
-Greg
diff mbox

Patch

diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
index ad8ec35..9f50cc3 100644
--- a/arch/x86/Kbuild
+++ b/arch/x86/Kbuild
@@ -1,5 +1,7 @@ 
 
-obj-$(CONFIG_KVM) += kvm/
+ifdef CONFIG_KVM
+obj-y += kvm/
+endif
 
 # Xen paravirtualization support
 obj-$(CONFIG_XEN) += xen/
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index afaaa76..80d951d 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -17,3 +17,7 @@  kvm-amd-y		+= svm.o
 obj-$(CONFIG_KVM)	+= kvm.o
 obj-$(CONFIG_KVM_INTEL)	+= kvm-intel.o
 obj-$(CONFIG_KVM_AMD)	+= kvm-amd.o
+
+ifdef CONFIG_KVM
+obj-y += $(addprefix ../../../virt/kvm/, xinterface.o)
+endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 48567fa..5725527 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1208,6 +1208,7 @@  int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_IOEVENTFD:
 	case KVM_CAP_PIT2:
 	case KVM_CAP_PIT_STATE2:
+	case KVM_CAP_XINTERFACE:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 230a91a..7790894 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -435,6 +435,7 @@  struct kvm_ioeventfd {
 #define KVM_CAP_PIT_STATE2 35
 #endif
 #define KVM_CAP_IOEVENTFD 36
+#define KVM_CAP_XINTERFACE 37
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -544,6 +545,7 @@  struct kvm_irqfd {
 #define KVM_CREATE_PIT2		   _IOW(KVMIO, 0x77, struct kvm_pit_config)
 #define KVM_SET_BOOT_CPU_ID        _IO(KVMIO, 0x78)
 #define KVM_IOEVENTFD             _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
+#define KVM_GET_VMID              _IOR(KVMIO, 0x7a, __u64)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f244f11..0ee95df 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -23,6 +23,7 @@ 
 #include <linux/kvm_para.h>
 
 #include <linux/kvm_types.h>
+#include <linux/kvm_xinterface.h>
 
 #include <asm/kvm_host.h>
 
@@ -175,6 +176,7 @@  struct kvm {
 	unsigned long mmu_notifier_seq;
 	long mmu_notifier_count;
 #endif
+	struct kvm_xinterface xinterface; /* interface for external modules */
 };
 
 /* The guest did something we don't support. */
@@ -199,6 +201,10 @@  static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
 	     idx < atomic_read(&kvm->online_vcpus) && vcpup; \
 	     vcpup = kvm_get_vcpu(kvm, ++idx))
 
+void kvm_xinterface_register(struct kvm_xinterface *intf,
+			     const struct kvm_xinterface_ops *ops);
+void kvm_xinterface_unregister(struct kvm_xinterface *intf);
+
 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id);
 void kvm_vcpu_uninit(struct kvm_vcpu *vcpu);
 
diff --git a/include/linux/kvm_xinterface.h b/include/linux/kvm_xinterface.h
new file mode 100644
index 0000000..858acfd
--- /dev/null
+++ b/include/linux/kvm_xinterface.h
@@ -0,0 +1,58 @@ 
+#ifndef __KVM_XINTERFACE_H
+#define __KVM_XINTERFACE_H
+
+/*
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <linux/kref.h>
+#include <linux/module.h>
+#include <linux/rbtree.h>
+
+struct kvm_xinterface;
+
+struct kvm_xinterface_ops {
+	struct module *owner;
+
+	unsigned long (*gpa_to_hva)(struct kvm_xinterface *, unsigned long gpa);
+	struct page*  (*gpa_to_page)(struct kvm_xinterface *, unsigned long gpa);
+	void (*release)(struct kvm_xinterface *);
+};
+
+struct kvm_xinterface {
+	struct kref                     kref;
+	const struct kvm_xinterface_ops *ops;
+	struct rb_node                  node;
+};
+
+static inline void
+kvm_xinterface_get(struct kvm_xinterface *intf)
+{
+	kref_get(&intf->kref);
+}
+
+static inline void
+_kvm_xinterface_release(struct kref *kref)
+{
+	struct kvm_xinterface *intf;
+	struct module *owner;
+
+	intf = container_of(kref, struct kvm_xinterface, kref);
+
+	owner = intf->ops->owner;
+	rmb();
+
+	intf->ops->release(intf);
+	module_put(owner);
+}
+
+static inline void
+kvm_xinterface_put(struct kvm_xinterface *intf)
+{
+	kref_put(&intf->kref, _kvm_xinterface_release);
+}
+
+struct kvm_xinterface *kvm_xinterface_find(long vmid);
+
+#endif /* __KVM_XINTERFACE_H */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7cd1c10..058cb6c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -935,6 +935,58 @@  static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
 };
 #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
 
+/*
+ * ------------
+ * XINTERFACE (External Interface)
+ * -------------
+ */
+
+static struct kvm *
+intf_to_kvm(struct kvm_xinterface *intf)
+{
+	return container_of(intf, struct kvm, xinterface);
+}
+
+static unsigned long
+xinterface_gpa_to_hva(struct kvm_xinterface *intf, unsigned long gpa)
+{
+	struct kvm *kvm = intf_to_kvm(intf);
+	unsigned long addr;
+
+	addr = gfn_to_hva(kvm, gpa >> PAGE_SHIFT);
+	if (kvm_is_error_hva(addr))
+		return 0;
+
+	return addr + offset_in_page(gpa);
+}
+
+static struct page *
+xinterface_gpa_to_page(struct kvm_xinterface *intf, unsigned long gpa)
+{
+	struct kvm *kvm = intf_to_kvm(intf);
+	struct page *page;
+
+	page = gfn_to_page(kvm, gpa >> PAGE_SHIFT);
+	if (page == bad_page)
+		return ERR_PTR(-EINVAL);
+
+	return page;
+}
+
+static void
+xinterface_release(struct kvm_xinterface *intf)
+{
+	struct kvm *kvm = intf_to_kvm(intf);
+
+	kvm_put_kvm(kvm);
+}
+
+struct kvm_xinterface_ops _kvm_xinterface_ops = {
+	.gpa_to_hva  = xinterface_gpa_to_hva,
+	.gpa_to_page = xinterface_gpa_to_page,
+	.release     = xinterface_release,
+};
+
 static struct kvm *kvm_create_vm(void)
 {
 	struct kvm *kvm = kvm_arch_create_vm();
@@ -991,6 +1043,8 @@  static struct kvm *kvm_create_vm(void)
 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
 	kvm_coalesced_mmio_init(kvm);
 #endif
+	kvm_get_kvm(kvm); /* the xinterface needs another ref */
+	kvm_xinterface_register(&kvm->xinterface, &_kvm_xinterface_ops);
 out:
 	return kvm;
 }
@@ -1073,6 +1127,7 @@  static int kvm_vm_release(struct inode *inode, struct file *filp)
 	struct kvm *kvm = filp->private_data;
 
 	kvm_irqfd_release(kvm);
+	kvm_xinterface_unregister(&kvm->xinterface);
 
 	kvm_put_kvm(kvm);
 	return 0;
@@ -2289,6 +2344,22 @@  static long kvm_vm_ioctl(struct file *filp,
 		mutex_unlock(&kvm->lock);
 		break;
 #endif
+	case KVM_GET_VMID: {
+		u64 vmid = (u64)&kvm->xinterface.node;
+
+		/*
+		 * our vmid is simply the address of our rb_node in the
+		 * registry, which is guaranteed unique.  This also simplifies
+		 * the registry map-lookup since we dont need to do a deep
+		 * decode on the pointer to figure out if we have a match
+		 */
+
+		r = -EFAULT;
+		if (copy_to_user(argp, &vmid, (sizeof vmid)))
+			goto out;
+		r = 0;
+		break;
+	}
 	default:
 		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
 	}
@@ -2761,6 +2832,7 @@  int kvm_init(void *opaque, unsigned int vcpu_size,
 	kvm_chardev_ops.owner = module;
 	kvm_vm_fops.owner = module;
 	kvm_vcpu_fops.owner = module;
+	_kvm_xinterface_ops.owner = module;
 
 	r = misc_register(&kvm_dev);
 	if (r) {
diff --git a/virt/kvm/xinterface.c b/virt/kvm/xinterface.c
new file mode 100644
index 0000000..fe9a214
--- /dev/null
+++ b/virt/kvm/xinterface.c
@@ -0,0 +1,147 @@ 
+/*
+ * KVM module interface - Allows external modules to interface with a guest
+ *
+ * This code is designed to be statically linked to the kernel, regardless
+ * of the configuration of kvm.ko.  This allows the kvm_xinterface_find
+ * routine to be stably exported without dependencies on, or race conditions
+ * against acquiring the kvm.ko module itself.
+ *
+ * Copyright 2009 Novell.  All Rights Reserved.
+ *
+ * Author:
+ *      Gregory Haskins <ghaskins@novell.com>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/kvm_host.h>
+#include <linux/kvm_xinterface.h>
+
+struct kvm_registry {
+	struct mutex   lock;
+	struct rb_root root;
+};
+
+/* system wide registry of kvm based VMs */
+static struct kvm_registry kvm_registry = {
+	.lock = __MUTEX_INITIALIZER(kvm_registry.lock),
+	.root = RB_ROOT,
+};
+
+static struct kvm_xinterface *
+to_intf(struct rb_node *node)
+{
+	return node ? container_of(node, struct kvm_xinterface, node) : NULL;
+}
+
+struct kvm_xinterface *
+kvm_xinterface_find(long vmid)
+{
+	struct rb_node       *node;
+	struct kvm_xinterface *intf;
+
+	mutex_lock(&kvm_registry.lock);
+
+	node = kvm_registry.root.rb_node;
+
+	while (node) {
+		long val;
+
+		val = vmid - (long)node;
+		if (val < 0)
+			node = node->rb_left;
+		else if (val > 0)
+			node = node->rb_right;
+		else
+			break;
+	}
+
+	intf = to_intf(node);
+	if (intf)
+		kvm_xinterface_get(intf);
+
+	mutex_unlock(&kvm_registry.lock);
+
+	return intf;
+}
+EXPORT_SYMBOL_GPL(kvm_xinterface_find);
+
+/*
+ * ------------------------------------------
+ * register/unregister
+ * ------------------------------------------
+ *
+ * These functions are private to the API and are only to be called
+ * by the KVM core
+ * ------------------------------------------
+ */
+
+/* caller must hold intf->ops->owner */
+void
+kvm_xinterface_register(struct kvm_xinterface *intf,
+			const struct kvm_xinterface_ops *ops)
+{
+	struct rb_root *root;
+	struct rb_node **new, *parent = NULL;
+	struct rb_node *node;
+
+	memset(intf, 0, sizeof(*intf));
+	kref_init(&intf->kref);
+	intf->ops = ops;
+
+	mutex_lock(&kvm_registry.lock);
+
+	root = &kvm_registry.root;
+	new  = &(root->rb_node);
+	node = &intf->node;
+
+	/* Figure out where to put new node */
+	while (*new) {
+		long val;
+
+		parent = *new;
+
+		val = node - parent;
+		if (val < 0)
+			new = &((*new)->rb_left);
+		else if (val > 0)
+			new = &((*new)->rb_right);
+		else
+			panic("kvm_xinterface: duplicate entry: %ld\n", val);
+	}
+
+	/* Add new node and rebalance tree. */
+	rb_link_node(node, parent, new);
+	rb_insert_color(node, root);
+
+	/* released when the last xinterface reference is released */
+	__module_get(intf->ops->owner);
+
+	mutex_unlock(&kvm_registry.lock);
+}
+EXPORT_SYMBOL_GPL(kvm_xinterface_register);
+
+/* caller must hold intf->ops->owner */
+void
+kvm_xinterface_unregister(struct kvm_xinterface *intf)
+{
+	mutex_lock(&kvm_registry.lock);
+	rb_erase(&intf->node, &kvm_registry.root);
+	mutex_unlock(&kvm_registry.lock);
+
+	kvm_xinterface_put(intf);
+}
+EXPORT_SYMBOL_GPL(kvm_xinterface_unregister);