diff mbox

PCI passthrough with stubdomain

Message ID 20160923142541.GB31510@mail-itl (mailing list archive)
State New, archived
Headers show

Commit Message

Marek Marczykowski-Górecki Sept. 23, 2016, 2:25 p.m. UTC
On Fri, Sep 23, 2016 at 03:27:07PM +0200, Samuel Thibault wrote:
> Marek Marczykowski-Górecki, on Fri 23 Sep 2016 10:48:14 +0200, wrote:
> > I'm still trying to get PCI passthrough working with stubdomain and
> > without qemu in dom0 (even for just vfb/vkbd backends). How is this
> > supposed to work?
> 
> Just as I recall from my memory:
> 
> > 1. Should xen-pcifront in the target domain be used (and consequently,
> > should xen-pciback be set for it)?
> 
> I guess that could work.

Could, or should? ;)

In the meantime, I've found this in xen-pcifront driver:

	static int __init pcifront_init(void)
	{
		if (!xen_pv_domain() || xen_initial_domain())
			return -ENODEV;

So, it looks like pcifront is not used in PV domain.

> > Currently xen-pciback is set for both
> > stubdomain and target domain, which presents a race condition and
> > xen-pciback refuses to setup one of them.
> 
> Yes, that's expected, for the reason you say.

In the meantime I've tried to modify libxl to not setup pciback for
target domain. This, along with other modifications (see below) improved
the situation.

> * to summarize
> **************
>
> If running PV drivers in the guest, you set the pciback on the guest, be
> it run with stubdom or not. 
> If running plain drivers in the guest,
>   * when not using a stubdom you don't need to set a pciback.
>   * when using a stubdom you need to set a pciback on the stubdom.

Thanks for the explanation!

> So the unfortunate thing is that when using stubdom, you'd have to set
> the pciback either on the guest (to run a PV driver in it), or on the
> stubdom (to run a plain driver in the guest, and let mini-os use PV to
> let qemu pass the board through)

I've tried only on Linux HVM guest and as noted above xen-pcifront
doesn't look to be involved. Is it any different on other OSes?

As for getting PCI passthrough working with stubdomain, for now I hit
those problems:
1. getdomaininfo not allowed from stubdomain (already fixed in master),
2. double setup of pciback (explained above) - for now I've disabled one
of them,
3. race condition between pcifront in mini-os and qemu.

Regarding the last one:
Toolstack during (stub)domain startup setup two things, mostly
asynchronously:
1. pciback/pcifront (through standard xenstore entries)
2. instruct qemu to attach device (by writing "pci-ins" to
device-model/xxx/command)

The later fails if pcifront is not initialized yet. For now I've moved
the second step to be really after the first one, including
synchronization through libxl__wait_for_backend call. 
Is it ok? Or maybe it should be done on stubdomain side (handling
"pci-ins" should wait on pcifront)? I guess the same will apply to
qemu-xen case, or any other stubdomain, so probably better have it in
toolstack, right?

Work-in-progress patches attached. The result is that lspci inside the
guest finally list the device. No idea if it's really working yet.

Comments

Samuel Thibault Sept. 23, 2016, 2:51 p.m. UTC | #1
Marek Marczykowski-Górecki, on Fri 23 Sep 2016 16:25:41 +0200, wrote:
> On Fri, Sep 23, 2016 at 03:27:07PM +0200, Samuel Thibault wrote:
> > Marek Marczykowski-Górecki, on Fri 23 Sep 2016 10:48:14 +0200, wrote:
> > > I'm still trying to get PCI passthrough working with stubdomain and
> > > without qemu in dom0 (even for just vfb/vkbd backends). How is this
> > > supposed to work?
> > 
> > Just as I recall from my memory:
> > 
> > > 1. Should xen-pcifront in the target domain be used (and consequently,
> > > should xen-pciback be set for it)?
> > 
> > I guess that could work.
> 
> Could, or should? ;)

I don't really remember, actually. I do remember doing some passthrough
tests, but I don't remember the exact conditions.

> In the meantime, I've found this in xen-pcifront driver:
> 
> 	static int __init pcifront_init(void)
> 	{
> 		if (!xen_pv_domain() || xen_initial_domain())
> 			return -ENODEV;
> 
> So, it looks like pcifront is not used in PV domain.

? I read it as disabling pcifront when used in an HVM or native domain,
i.e. precisely used in PV domains.

> > So the unfortunate thing is that when using stubdom, you'd have to set
> > the pciback either on the guest (to run a PV driver in it), or on the
> > stubdom (to run a plain driver in the guest, and let mini-os use PV to
> > let qemu pass the board through)
> 
> I've tried only on Linux HVM guest and as noted above xen-pcifront
> doesn't look to be involved. Is it any different on other OSes?

I don't know, perhaps it's different on windows. Or perhaps for windows
we just rely on the qemu pass-through.

> Toolstack during (stub)domain startup setup two things, mostly
> asynchronously:
> 1. pciback/pcifront (through standard xenstore entries)
> 2. instruct qemu to attach device (by writing "pci-ins" to
> device-model/xxx/command)

Ah, since pcifront_watches runs in a separate thread, it may not have
called init_pcifront before qemu calls libpci, i.e. pcifront_scan.

I'd say that's where the fix should be: make pcifront_scan wait for
init_pcifront to be done.  It's however not so simple: we want to make
pcifront_scan do nothing if no pciback is to be launched. My memory is
rusty: is there something we are sure will show up in the xenstore when
a pciback is to be launched?  If so, pcifront_scan could do this: wait
for pcifront_watches to have checked for the potential presence of
pciback. If pciback is not to be started, just do nothing; if pciback is
to be started, wait for init_pcifront to have been called by
pcifront_watches. Then it will find the devices.

Samuel
Konrad Rzeszutek Wilk Sept. 23, 2016, 3:35 p.m. UTC | #2
On Fri, Sep 23, 2016 at 04:25:41PM +0200, Marek Marczykowski-Górecki wrote:
> On Fri, Sep 23, 2016 at 03:27:07PM +0200, Samuel Thibault wrote:
> > Marek Marczykowski-Górecki, on Fri 23 Sep 2016 10:48:14 +0200, wrote:
> > > I'm still trying to get PCI passthrough working with stubdomain and
> > > without qemu in dom0 (even for just vfb/vkbd backends). How is this
> > > supposed to work?
> > 
> > Just as I recall from my memory:
> > 
> > > 1. Should xen-pcifront in the target domain be used (and consequently,
> > > should xen-pciback be set for it)?
> > 
> > I guess that could work.
> 
> Could, or should? ;)
> 
> In the meantime, I've found this in xen-pcifront driver:
> 
> 	static int __init pcifront_init(void)
> 	{
> 		if (!xen_pv_domain() || xen_initial_domain())
> 			return -ENODEV;
> 
> So, it looks like pcifront is not used in PV domain.

You mean in HVM domains.
Marek Marczykowski-Górecki Sept. 23, 2016, 6:47 p.m. UTC | #3
On Fri, Sep 23, 2016 at 11:35:27AM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 23, 2016 at 04:25:41PM +0200, Marek Marczykowski-Górecki wrote:
> > On Fri, Sep 23, 2016 at 03:27:07PM +0200, Samuel Thibault wrote:
> > > Marek Marczykowski-Górecki, on Fri 23 Sep 2016 10:48:14 +0200, wrote:
> > > > I'm still trying to get PCI passthrough working with stubdomain and
> > > > without qemu in dom0 (even for just vfb/vkbd backends). How is this
> > > > supposed to work?
> > > 
> > > Just as I recall from my memory:
> > > 
> > > > 1. Should xen-pcifront in the target domain be used (and consequently,
> > > > should xen-pciback be set for it)?
> > > 
> > > I guess that could work.
> > 
> > Could, or should? ;)
> > 
> > In the meantime, I've found this in xen-pcifront driver:
> > 
> > 	static int __init pcifront_init(void)
> > 	{
> > 		if (!xen_pv_domain() || xen_initial_domain())
> > 			return -ENODEV;
> > 
> > So, it looks like pcifront is not used in PV domain.
> 
> You mean in HVM domains.

Of course.
Marek Marczykowski-Górecki Sept. 23, 2016, 6:56 p.m. UTC | #4
On Fri, Sep 23, 2016 at 04:51:33PM +0200, Samuel Thibault wrote:
> Marek Marczykowski-Górecki, on Fri 23 Sep 2016 16:25:41 +0200, wrote:
> > Toolstack during (stub)domain startup setup two things, mostly
> > asynchronously:
> > 1. pciback/pcifront (through standard xenstore entries)
> > 2. instruct qemu to attach device (by writing "pci-ins" to
> > device-model/xxx/command)
> 
> Ah, since pcifront_watches runs in a separate thread, it may not have
> called init_pcifront before qemu calls libpci, i.e. pcifront_scan.

Yes, exactly.

> I'd say that's where the fix should be: make pcifront_scan wait for
> init_pcifront to be done.  It's however not so simple: we want to make
> pcifront_scan do nothing if no pciback is to be launched. My memory is
> rusty: is there something we are sure will show up in the xenstore when
> a pciback is to be launched?  If so, pcifront_scan could do this: wait
> for pcifront_watches to have checked for the potential presence of
> pciback. If pciback is not to be started, just do nothing; if pciback is
> to be started, wait for init_pcifront to have been called by
> pcifront_watches. Then it will find the devices.

Ok. So, two questions:
1. How to do this? ;) I.e. what synchronization primitives are available
in mini-os? Just pthread_mutex_lock/unlock?

2. Wouldn't the same problem be with other stubdomain implementations?
Like Linux + qemu-xen or rumprun + qemu-xen? At least in Linux case
pcifront driver will also needs some time for initialization...
Samuel Thibault Sept. 23, 2016, 9 p.m. UTC | #5
Marek Marczykowski-Górecki, on Fri 23 Sep 2016 20:56:43 +0200, wrote:
> 1. How to do this? ;) I.e. what synchronization primitives are available
> in mini-os? Just pthread_mutex_lock/unlock?

pthread_mutex_lock are nops :o) because we don't have pthread_create.
But for mini-os itself there are synchronization primitives, yes:
there are also semaphores (./include/semaphore.h) and waitqueues
(include/wait.h).

> 2. Wouldn't the same problem be with other stubdomain implementations?
> Like Linux + qemu-xen or rumprun + qemu-xen? At least in Linux case
> pcifront driver will also needs some time for initialization...

Possibly, I don't know about them, they didn't exist at the time ;)

Samuel
Marek Marczykowski-Górecki Sept. 23, 2016, 9:04 p.m. UTC | #6
On Fri, Sep 23, 2016 at 11:00:42PM +0200, Samuel Thibault wrote:
> Marek Marczykowski-Górecki, on Fri 23 Sep 2016 20:56:43 +0200, wrote:
> > 1. How to do this? ;) I.e. what synchronization primitives are available
> > in mini-os? Just pthread_mutex_lock/unlock?
> 
> pthread_mutex_lock are nops :o) because we don't have pthread_create.
> But for mini-os itself there are synchronization primitives, yes:
> there are also semaphores (./include/semaphore.h) and waitqueues
> (include/wait.h).

Ok, will take a look at this. Thanks.

> > 2. Wouldn't the same problem be with other stubdomain implementations?
> > Like Linux + qemu-xen or rumprun + qemu-xen? At least in Linux case
> > pcifront driver will also needs some time for initialization...
> 
> Possibly, I don't know about them, they didn't exist at the time ;)

Actually they don't exist even now ;) But hopefully will, in the near
future...
diff mbox

Patch

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index c6862b8..5625ef2 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1429,7 +1429,7 @@  static void domcreate_attach_pci(libxl__egc *egc, libxl__multidev *multidev,
         }
     }
 
-    if (d_config->num_pcidevs > 0) {
+    if (d_config->num_pcidevs > 0 && d_config->c_info.type == LIBXL_DOMAIN_TYPE_PV) {
         ret = libxl__create_pci_backend(gc, domid, d_config->pcidevs,
             d_config->num_pcidevs);
         if (ret < 0) {
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 19c597e..2942772 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1001,7 +1001,7 @@  out:
         }
     }
 
-    if (!starting)
+    if (!starting && !hvm)
         rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting);
     else
         rc = 0;