QEMU as non-root and PCI passthrough do not mix
diff mbox

Message ID alpine.DEB.2.02.1601121645431.13564@kaball.uk.xensource.com
State New, archived
Headers show

Commit Message

Stefano Stabellini Jan. 12, 2016, 4:52 p.m. UTC
PCI passthrough cannot work if QEMU is run as a non-root process today,
as QEMU needs to open /dev/mem to mmap the MSI-X table of the device and
read/write relevant nodes on sysfs.

Update the docs to reflect that.

Run QEMU as root and print a warning if at least one PCI device has been
assigned to the guest at domain creation. Print a debug message on pci
hotplug.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Comments

Ian Campbell Jan. 14, 2016, 10:39 a.m. UTC | #1
On Tue, 2016-01-12 at 16:52 +0000, Stefano Stabellini wrote:
> PCI passthrough cannot work if QEMU is run as a non-root process today,
> as QEMU needs to open /dev/mem to mmap the MSI-X table of the device and
> read/write relevant nodes on sysfs.
> 
> Update the docs to reflect that.
> 
> Run QEMU as root and print a warning if at least one PCI device has been
> assigned to the guest at domain creation. Print a debug message on pci
> hotplug.
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> 
> diff --git a/docs/misc/qemu-deprivilege.txt b/docs/misc/qemu-
> deprivilege.txt
> index dde74ab..cf52547 100644
> --- a/docs/misc/qemu-deprivilege.txt
> +++ b/docs/misc/qemu-deprivilege.txt
> @@ -29,3 +29,13 @@ adduser --no-create-home --system xen-qemuuser-shared
>  
>  3) root
>  As a last resort, libxl will start QEMU as root.
> +
> +
> +Please note that QEMU will still be run as root when PCI devices are
> +assigned to the virtual machine (if you specified pci=["$PCI_BDF"] in
> +your VM config file, where $PCI_BDF is the PCI BDF of the device you
> +want to assign). If you want to hotplug a PCI device sometime after the
> +VM has started, you need to make sure that the QEMU instance of that VM
> +has root privileges (for example by not specifying either
> +xen-qemuuser-shared or xen-qemuuser-domid$domid, or by giving root
> +privileges to xen-qemuuser-domid$domid).
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index 0aaefd9..6b98750 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -1254,6 +1254,12 @@ static int
> libxl__build_device_model_args_new(libxl__gc *gc,
>              break;
>          }
>  
> +        /* Do not run QEMU as non-root if PCI devices are assigned */
> +        if (guest_config->num_pcidevs > 0) {
> +            LOG(WARN, "Cannot run QEMU as non-root when PCI devices are
> being assigned to the guest VM");
> +            goto end_search;
> +        }

What if b_info->device_model_user is NULL or == "root"? Doesn't this warn
even then?

Conversely if it is != root and num_pcidevs > 0 then it ought to error out,
since running as root when the config explicitly says otherwise would be
wrong I think.

> +
>          if (b_info->device_model_user) {
>              user = b_info->device_model_user;
>              goto end_search;
> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index dc10cb7..04d0dd4 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -1176,6 +1176,9 @@ int libxl_device_pci_add(libxl_ctx *ctx, uint32_t
> domid,
>  {
>      AO_CREATE(ctx, domid, ao_how);
>      int rc;
> +
> +    LOG(DEBUG, "QEMU needs to be run as root for PCI passthrough to work");

Shouldn't there be an if here, and/or an error return?

> +
>      rc = libxl__device_pci_add(gc, domid, pcidev, 0);
>      libxl__ao_complete(egc, ao, rc);
>      return AO_INPROGRESS;
Stefano Stabellini Jan. 14, 2016, 5:32 p.m. UTC | #2
On Thu, 14 Jan 2016, Ian Campbell wrote:
> On Tue, 2016-01-12 at 16:52 +0000, Stefano Stabellini wrote:
> > PCI passthrough cannot work if QEMU is run as a non-root process today,
> > as QEMU needs to open /dev/mem to mmap the MSI-X table of the device and
> > read/write relevant nodes on sysfs.
> > 
> > Update the docs to reflect that.
> > 
> > Run QEMU as root and print a warning if at least one PCI device has been
> > assigned to the guest at domain creation. Print a debug message on pci
> > hotplug.
> > 
> > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > 
> > diff --git a/docs/misc/qemu-deprivilege.txt b/docs/misc/qemu-
> > deprivilege.txt
> > index dde74ab..cf52547 100644
> > --- a/docs/misc/qemu-deprivilege.txt
> > +++ b/docs/misc/qemu-deprivilege.txt
> > @@ -29,3 +29,13 @@ adduser --no-create-home --system xen-qemuuser-shared
> >  
> >  3) root
> >  As a last resort, libxl will start QEMU as root.
> > +
> > +
> > +Please note that QEMU will still be run as root when PCI devices are
> > +assigned to the virtual machine (if you specified pci=["$PCI_BDF"] in
> > +your VM config file, where $PCI_BDF is the PCI BDF of the device you
> > +want to assign). If you want to hotplug a PCI device sometime after the
> > +VM has started, you need to make sure that the QEMU instance of that VM
> > +has root privileges (for example by not specifying either
> > +xen-qemuuser-shared or xen-qemuuser-domid$domid, or by giving root
> > +privileges to xen-qemuuser-domid$domid).
> > diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> > index 0aaefd9..6b98750 100644
> > --- a/tools/libxl/libxl_dm.c
> > +++ b/tools/libxl/libxl_dm.c
> > @@ -1254,6 +1254,12 @@ static int
> > libxl__build_device_model_args_new(libxl__gc *gc,
> >              break;
> >          }
> >  
> > +        /* Do not run QEMU as non-root if PCI devices are assigned */
> > +        if (guest_config->num_pcidevs > 0) {
> > +            LOG(WARN, "Cannot run QEMU as non-root when PCI devices are
> > being assigned to the guest VM");
> > +            goto end_search;
> > +        }
> 
> What if b_info->device_model_user is NULL or == "root"? Doesn't this warn
> even then?

I meant to warn even if device_model_user is NULL because it is the
default and I think it is fair to inform the user about this. But I
think you are right that we don't want to warn if device_model_user is
specified as "root".


> Conversely if it is != root and num_pcidevs > 0 then it ought to error out,
> since running as root when the config explicitly says otherwise would be
> wrong I think.

OK, I'll error out in that case.


> > +
> >          if (b_info->device_model_user) {
> >              user = b_info->device_model_user;
> >              goto end_search;
> > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > index dc10cb7..04d0dd4 100644
> > --- a/tools/libxl/libxl_pci.c
> > +++ b/tools/libxl/libxl_pci.c
> > @@ -1176,6 +1176,9 @@ int libxl_device_pci_add(libxl_ctx *ctx, uint32_t
> > domid,
> >  {
> >      AO_CREATE(ctx, domid, ao_how);
> >      int rc;
> > +
> > +    LOG(DEBUG, "QEMU needs to be run as root for PCI passthrough to work");
> 
> Shouldn't there be an if here, and/or an error return?

Unfortunately we cannot get the user used to run QEMU with from here.
However, even without this change, there is already plenty of
information printed out:

- xl prints:
libxl: error: libxl_qmp.c:287:qmp_handle_error_response: received an
error message from QMP server: Device initialization failed

- qemu prints:
[00:03.0] xen_pt_initfn: Error: Failed to "open" the real pci device. rc: -13


> > +
> >      rc = libxl__device_pci_add(gc, domid, pcidev, 0);
> >      libxl__ao_complete(egc, ao, rc);
> >      return AO_INPROGRESS;
Ian Jackson Jan. 14, 2016, 5:34 p.m. UTC | #3
Stefano Stabellini writes ("Re: [PATCH] QEMU as non-root and PCI passthrough do not mix"):
> On Thu, 14 Jan 2016, Ian Campbell wrote:
> > What if b_info->device_model_user is NULL or == "root"? Doesn't this warn
> > even then?
> 
> I meant to warn even if device_model_user is NULL because it is the
> default and I think it is fair to inform the user about this. But I
> think you are right that we don't want to warn if device_model_user is
> specified as "root".

Much of the logic here is upended by what is now my qemu privsep
series.  Is it really worth fine-tuning the default handling here ?

Ian.
Ian Campbell Jan. 14, 2016, 5:40 p.m. UTC | #4
On Thu, 2016-01-14 at 17:32 +0000, Stefano Stabellini wrote:

> > > +    LOG(DEBUG, "QEMU needs to be run as root for PCI passthrough to
> > > work");
> > 
> > Shouldn't there be an if here, and/or an error return?
> 
> Unfortunately we cannot get the user used to run QEMU with from here.
> However, even without this change, there is already plenty of
> information printed out:
> 
> - xl prints:
> libxl: error: libxl_qmp.c:287:qmp_handle_error_response: received an
> error message from QMP server: Device initialization failed
> 
> - qemu prints:
> [00:03.0] xen_pt_initfn: Error: Failed to "open" the real pci device. rc:
> -13

All that is under normal (i.e. successful) usage? The presence of such spew
under those circumstances should probably be considered a bug. Certainly
adding more noise doesn't seem desirable.

libxl should be able to get the user, either from the json stash of the
config or by writing it somewhere on create so it can get it back when it
needs it.

That reminds me -- how does the qemu user selection work over migrate? Does
it remember the specific user or does it try and pick it again on the other
end? Do we require all hosts in a pool to have been setup in the same way
i.e. either xen-qemuuser-domid$i or xen-qemuuser-shared but not a mixture?

Ian.
Ian Jackson Jan. 14, 2016, 5:42 p.m. UTC | #5
Ian Campbell writes ("Re: [PATCH] QEMU as non-root and PCI passthrough do not mix"):
> That reminds me -- how does the qemu user selection work over migrate? Does
> it remember the specific user or does it try and pick it again on the other
> end? Do we require all hosts in a pool to have been setup in the same way
> i.e. either xen-qemuuser-domid$i or xen-qemuuser-shared but not a mixture?

I forget how it was before my RFC privsep series, but afterwards the
user is defaulted into the configuration during domain creation and I
would expect it to be transferred with the config to the destination.

So the destination needs to have whatever user the source chose to
use.  I think that's a reasonable restriction.

Ian.
Ian Campbell Jan. 15, 2016, 9:53 a.m. UTC | #6
On Thu, 2016-01-14 at 17:42 +0000, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH] QEMU as non-root and PCI passthrough do
> not mix"):
> > That reminds me -- how does the qemu user selection work over migrate?
> > Does
> > it remember the specific user or does it try and pick it again on the
> > other
> > end? Do we require all hosts in a pool to have been setup in the same
> > way
> > i.e. either xen-qemuuser-domid$i or xen-qemuuser-shared but not a
> > mixture?
> 
> I forget how it was before my RFC privsep series, but afterwards the
> user is defaulted into the configuration during domain creation and I
> would expect it to be transferred with the config to the destination.
> 
> So the destination needs to have whatever user the source chose to
> use.  I think that's a reasonable restriction.

So do I, but I think it should be written in docs/misc/qemu-deprivilege.txt 
at least.

Ian.
Stefano Stabellini Jan. 15, 2016, 2:14 p.m. UTC | #7
On Thu, 14 Jan 2016, Ian Jackson wrote:
> Stefano Stabellini writes ("Re: [PATCH] QEMU as non-root and PCI passthrough do not mix"):
> > On Thu, 14 Jan 2016, Ian Campbell wrote:
> > > What if b_info->device_model_user is NULL or == "root"? Doesn't this warn
> > > even then?
> > 
> > I meant to warn even if device_model_user is NULL because it is the
> > default and I think it is fair to inform the user about this. But I
> > think you are right that we don't want to warn if device_model_user is
> > specified as "root".
> 
> Much of the logic here is upended by what is now my qemu privsep
> series.  Is it really worth fine-tuning the default handling here ?

Fair enough.

Do you prefer the PCI passthrough case to be fixed now or after your
series? In other words, do you prefer this patch to go in now, or after
your series?
Konrad Rzeszutek Wilk Jan. 15, 2016, 2:29 p.m. UTC | #8
On Fri, Jan 15, 2016 at 02:14:40PM +0000, Stefano Stabellini wrote:
> On Thu, 14 Jan 2016, Ian Jackson wrote:
> > Stefano Stabellini writes ("Re: [PATCH] QEMU as non-root and PCI passthrough do not mix"):
> > > On Thu, 14 Jan 2016, Ian Campbell wrote:
> > > > What if b_info->device_model_user is NULL or == "root"? Doesn't this warn
> > > > even then?
> > > 
> > > I meant to warn even if device_model_user is NULL because it is the
> > > default and I think it is fair to inform the user about this. But I
> > > think you are right that we don't want to warn if device_model_user is
> > > specified as "root".
> > 
> > Much of the logic here is upended by what is now my qemu privsep
> > series.  Is it really worth fine-tuning the default handling here ?
> 
> Fair enough.
> 
> Do you prefer the PCI passthrough case to be fixed now or after your
> series? In other words, do you prefer this patch to go in now, or after
> your series?

I hadn't chance to look at the patch but I was wondering if we also
take into account the pci-assign and pci-detach on a guest that booted
without PCI passthrough?
Stefano Stabellini Jan. 15, 2016, 2:49 p.m. UTC | #9
On Fri, 15 Jan 2016, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 15, 2016 at 02:14:40PM +0000, Stefano Stabellini wrote:
> > On Thu, 14 Jan 2016, Ian Jackson wrote:
> > > Stefano Stabellini writes ("Re: [PATCH] QEMU as non-root and PCI passthrough do not mix"):
> > > > On Thu, 14 Jan 2016, Ian Campbell wrote:
> > > > > What if b_info->device_model_user is NULL or == "root"? Doesn't this warn
> > > > > even then?
> > > > 
> > > > I meant to warn even if device_model_user is NULL because it is the
> > > > default and I think it is fair to inform the user about this. But I
> > > > think you are right that we don't want to warn if device_model_user is
> > > > specified as "root".
> > > 
> > > Much of the logic here is upended by what is now my qemu privsep
> > > series.  Is it really worth fine-tuning the default handling here ?
> > 
> > Fair enough.
> > 
> > Do you prefer the PCI passthrough case to be fixed now or after your
> > series? In other words, do you prefer this patch to go in now, or after
> > your series?
> 
> I hadn't chance to look at the patch but I was wondering if we also
> take into account the pci-assign and pci-detach on a guest that booted
> without PCI passthrough?

A guest booted without PCI passthrough, with QEMU running as root, would
work as usual. But if QEMU is running as non-root, pci-assign wouldn't
work. The errors printed out by xl and QEMU respectively are the
followings:

libxl: error: libxl_qmp.c:287:qmp_handle_error_response: received an
error message from QMP server: Device initialization failed

[00:03.0] xen_pt_initfn: Error: Failed to "open" the real pci device. rc: -13

In this patch I only added one more debugging message to pci-assign,
informing the user that pci-assign requires QEMU running as root.

After IanJ series we'll be able to do better.
Ian Campbell Jan. 15, 2016, 2:56 p.m. UTC | #10
On Fri, 2016-01-15 at 14:49 +0000, Stefano Stabellini wrote:
> libxl: error: libxl_qmp.c:287:qmp_handle_error_response: received an
> error message from QMP server: Device initialization failed

Can we also get a patch added to qemu which provides something more
meaningful if the reason it can't succeed is lack of privilege?

I think that can proceed in parallel to the rest?

Ian.

Patch
diff mbox

diff --git a/docs/misc/qemu-deprivilege.txt b/docs/misc/qemu-deprivilege.txt
index dde74ab..cf52547 100644
--- a/docs/misc/qemu-deprivilege.txt
+++ b/docs/misc/qemu-deprivilege.txt
@@ -29,3 +29,13 @@  adduser --no-create-home --system xen-qemuuser-shared
 
 3) root
 As a last resort, libxl will start QEMU as root.
+
+
+Please note that QEMU will still be run as root when PCI devices are
+assigned to the virtual machine (if you specified pci=["$PCI_BDF"] in
+your VM config file, where $PCI_BDF is the PCI BDF of the device you
+want to assign). If you want to hotplug a PCI device sometime after the
+VM has started, you need to make sure that the QEMU instance of that VM
+has root privileges (for example by not specifying either
+xen-qemuuser-shared or xen-qemuuser-domid$domid, or by giving root
+privileges to xen-qemuuser-domid$domid).
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 0aaefd9..6b98750 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -1254,6 +1254,12 @@  static int libxl__build_device_model_args_new(libxl__gc *gc,
             break;
         }
 
+        /* Do not run QEMU as non-root if PCI devices are assigned */
+        if (guest_config->num_pcidevs > 0) {
+            LOG(WARN, "Cannot run QEMU as non-root when PCI devices are being assigned to the guest VM");
+            goto end_search;
+        }
+
         if (b_info->device_model_user) {
             user = b_info->device_model_user;
             goto end_search;
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index dc10cb7..04d0dd4 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1176,6 +1176,9 @@  int libxl_device_pci_add(libxl_ctx *ctx, uint32_t domid,
 {
     AO_CREATE(ctx, domid, ao_how);
     int rc;
+
+    LOG(DEBUG, "QEMU needs to be run as root for PCI passthrough to work");
+
     rc = libxl__device_pci_add(gc, domid, pcidev, 0);
     libxl__ao_complete(egc, ao, rc);
     return AO_INPROGRESS;