diff mbox

[28/28] libxl: xsrestrict QEMU

Message ID 1450809903-3393-29-git-send-email-ian.jackson@eu.citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ian Jackson Dec. 22, 2015, 6:45 p.m. UTC
If QEMU supports xsrestrict, pass xsrestrict=on to it (by default).

XXX We need to do this only if xenstored supports it, and AFAICT there
is not a particularly easy way to test this.  Should we open a new
test xenstore connection to query this information ?  We could do this
once per libxl ctx.

Allow the user to specify that xsrestrict should be on, in which case
if it qemu cannot be depriv'd, we fail.

When qemu is depriv'd it still needs write access to the physmap
records in xenstore.  It will be running with the privilege of the
domain, so we need to allow the domain write access to
 /local/domain/$LIBXL_TOOLSTACK_DOMID/device-model/$DOMID/physmap

It doesn't contain any information the guest doesn't already know.
However be aware that it still means relaxing a tools/guest
interface.

The guest physmap contains the memory layout of the guest.  The guest
is already in possession of this information.  A malicious guest could
modify the physmap entries causing QEMU to read wrong information at
restore time. QEMU would populate its XenPhysmap list with wrong
entries that wouldn't match its own device emulation addresses,
leading to a failure in restoring the domain.  But it could not
compromise the security of the system because the addresses are only
used for checks against QEMU's addresses.

Is it dangerous to let the guest write values that later QEMU and
libxl read back?  Let's dig a bit deeper into the code. QEMU and libxl
use very similar functions to parse the physmap entries.

Let's start checking libxl.  The function that reads the physmap is
libxl__toolstack_save.  It calls:

* libxl__xs_directory on /physmap
  This is safe.

* libxl__xs_read
  If the path is wrong (nothing is there), it returns NULL, and it is
  handled correctly.

* strtoll on the values read
  The values are under guest control but strtoll can handle bad
  formats.  strtoll always returns an unsigned long long.  In case of
  errors, it can return LLONG_MIN or LLONG_MAX.  libxl__toolstack_save
  doesn't check for conversion errors, but it never uses the returned
  values anywhere.  That's OK.  libxl__toolstack_restore writes back
  the values to xenstore.

So in case of errors:

1) libxl__toolstack_save could return early with an error, if the
xenstore paths are wrong (nothing is on xenstore)

2) libxl__toolstack_restore could write back to xenstore any unsigned
long long, including LLONG_MIN or LLONG_MAX

Either way libxl is fine.

From the QEMU side, the code is very similar and uses strtoll to parse
the entries, that can deal with bad input values.  The values are used
to avoid memory allocations at restore time for memory that has
already been allocated (video memory).  If the values are wrong, QEMU
will attempt another memory allocation, failing because the maxmem
limit has been reached.

Either way QEMU is fine.

There are no other out-of-guest consumers of this information.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

---
v6: Merge the xsrestrict and the permissions update patches, so that
      the permission relaxation is done only when needed.
    Rip out qemu feature checking code, now in new (earlier) patch
    Rip out emulator_id, now in new (earlier) patch
    Provide and document a config option to control restriction.
    Make the default depend on specified dm user, and on stubdomness.
    Arrange that when we are going to xsrestrict, we will run two qemus,
      as required
    Move rwperm initialisaton higher up the relevant function to help
      avoid future changes using it uninitialised.
    Grant write access on to the physmap subdirectory, and drop
      discussion of the rest from the commit message, and adjust
      the xenstore doc accordingly.
    Move the physmap number of entries limit into its own patch.

v5 (permissions):
    improve commit message with security details

v4 (permissions):
    add error message in case count > MAX_PHYSMAP_ENTRIES
    add a note to xenstore-paths.markdown about the possible change in
     privilege level
    only change permissions if xsrestrict is supported

v4 (xsrestrict):
    update xenstore-paths.markdown

v3 (xsrestrict):
    add emulator_ids
    mark as WIP

v3 (permissions):
    use LIBXL_TOOLSTACK_DOMID instead of 0 in the commit message
    update commit message with more info on why it is safe
    add a limit on the number of physmap entries to save and restore

v2 (permissioins):
    fix permissions to actually do what intended
    use LIBXL_TOOLSTACK_DOMID instead of 0
---
 docs/man/xl.cfg.pod.5             |   14 ++++++++++
 docs/misc/xenstore-paths.markdown |    7 +++++
 tools/libxl/libxl_create.c        |   51 +++++++++++++++++++++++++++++++++++--
 tools/libxl/libxl_dm.c            |   30 ++++++++++++++++++++--
 tools/libxl/libxl_types.idl       |    1 +
 tools/libxl/xl_cmdimpl.c          |    2 ++
 6 files changed, 101 insertions(+), 4 deletions(-)

Comments

Ian Campbell Jan. 7, 2016, 5:36 p.m. UTC | #1
On Tue, 2015-12-22 at 18:45 +0000, Ian Jackson wrote:
> If QEMU supports xsrestrict, pass xsrestrict=on to it (by default).
> 
> XXX We need to do this only if xenstored supports it, and AFAICT there
> is not a particularly easy way to test this.  Should we open a new
> test xenstore connection to query this information ?  We could do this
> once per libxl ctx.

Is this because the support is in oxenstored ^ cxenstored?

Otherwise I would argue that the toolstack should expect the xenstored to
be at least as new as it is.

> Let's start checking libxl.  The function that reads the physmap is
> libxl__toolstack_save.  It calls:
> 
> * libxl__xs_directory on /physmap
>   This is safe.

Potentially very many entries and hence a big return array? Are the guest
quotas sufficient to not worry about this?

> Either way libxl is fine.
> 
[...]
> Either way QEMU is fine.

Perhaps we should add a code comment to each of these places noting that
the values are guest controlled, since it is a bit more unexpected in this
case than e.g. frontend dirs.

> diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-
> paths.markdown
> index 1dc54ac..27981de 100644
> --- a/docs/misc/xenstore-paths.markdown
> +++ b/docs/misc/xenstore-paths.markdown
> @@ -339,6 +339,13 @@ A PV console backend. Described in
> [console.txt](console.txt)
>  Information relating to device models running in the domain. $DOMID is
>  the target domain of the device model.
>  
> +#### ~/device-model/$DOMID/physmap [w]
> +
> +Information about an HVM domain's physical memory map.  This is
> +written as well as read by device models which may run at the same
> +privilege level of the guest domain.  When the device model ruus with

"runs"

> +system privilege, this path is INTERNAL.
Ian Jackson Jan. 8, 2016, 2:38 p.m. UTC | #2
Ian Campbell writes ("Re: [PATCH 28/28] libxl: xsrestrict QEMU"):
> On Tue, 2015-12-22 at 18:45 +0000, Ian Jackson wrote:
> > If QEMU supports xsrestrict, pass xsrestrict=on to it (by default).
> > 
> > XXX We need to do this only if xenstored supports it, and AFAICT there
> > is not a particularly easy way to test this.  Should we open a new
> > test xenstore connection to query this information ?  We could do this
> > once per libxl ctx.
> 
> Is this because the support is in oxenstored ^ cxenstored?

Yes.

> Otherwise I would argue that the toolstack should expect the xenstored to
> be at least as new as it is.

True.

> > * libxl__xs_directory on /physmap
> >   This is safe.
> 
> Potentially very many entries and hence a big return array? Are the guest
> quotas sufficient to not worry about this?

Yes, the quota has to deal with that anyway (since the directory is in
core in xenstored, anyway).

> Perhaps we should add a code comment to each of these places noting that
> the values are guest controlled, since it is a bit more unexpected in this
> case than e.g. frontend dirs.

This is a good idea.

> > +Information about an HVM domain's physical memory map.  This is
> > +written as well as read by device models which may run at the same
> > +privilege level of the guest domain.  When the device model ruus with
> 
> "runs"

Fixed, thanks.

Ian.
Stefano Stabellini April 10, 2016, 7:52 p.m. UTC | #3
On Tue, 22 Dec 2015, Ian Jackson wrote:
> If QEMU supports xsrestrict, pass xsrestrict=on to it (by default).
> 
> XXX We need to do this only if xenstored supports it, and AFAICT there
> is not a particularly easy way to test this.  Should we open a new
> test xenstore connection to query this information ?  We could do this
> once per libxl ctx.
> 
> Allow the user to specify that xsrestrict should be on, in which case
> if it qemu cannot be depriv'd, we fail.
> 
> When qemu is depriv'd it still needs write access to the physmap
> records in xenstore.  It will be running with the privilege of the
> domain, so we need to allow the domain write access to
>  /local/domain/$LIBXL_TOOLSTACK_DOMID/device-model/$DOMID/physmap

QEMU also needs to be able to write to "state" under
/local/domain/$LIBXL_TOOLSTACK_DOMID/device-model/$DOMID

I guess theoretically it might be possible to restrict the xenstore
connection on the QEMU side only after switching to the "running" state,
however that's not how it was done in the early implementation at least:

http://marc.info/?i=1433173614-19716-2-git-send-email-stefano.stabellini%40eu.citrix.com
diff mbox

Patch

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index dd93725..7c0bac3 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -1842,6 +1842,20 @@  Run the device model as user "username", instead of
 xen-qemudepriv-domid$domid or xen-qemudepriv-shared or root.  The
 default is to use the first of these users which exists.
 
+=item B<device_model_restrict=BOOLEAN>
+
+Restrict the device model's access to hypervisor facilities (such as
+xenstore).  The default is true if device_model_user is not `root` and
+the qemu in use supports this option (ie, when the restriction is both
+possible and beneficial), or false otherwise.
+
+If the option is set to true (1) but the restriction is not possible
+or not effective, the domain configuration is rejected.
+
+This option is ignored when a stub domain device model is in use,
+because in that situation the device model is restricted by virtue of
+being in its own domain.
+
 =back
 
 =head2 Keymaps
diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown
index 1dc54ac..27981de 100644
--- a/docs/misc/xenstore-paths.markdown
+++ b/docs/misc/xenstore-paths.markdown
@@ -339,6 +339,13 @@  A PV console backend. Described in [console.txt](console.txt)
 Information relating to device models running in the domain. $DOMID is
 the target domain of the device model.
 
+#### ~/device-model/$DOMID/physmap [w]
+
+Information about an HVM domain's physical memory map.  This is
+written as well as read by device models which may run at the same
+privilege level of the guest domain.  When the device model ruus with
+system privilege, this path is INTERNAL.
+
 #### ~/device-model/$DOMID[/$EMULATOR_ID]/* [INTERNAL]
 
 Information relating to device models running in the domain. $DOMID is
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 59dfcd67..5182d2b 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -478,6 +478,49 @@  static int domcreate_setdefault_dm_user(libxl__gc *gc,
     return rc;
 }
 
+static int domcreate_setdefault_dm_restrict(libxl__gc *gc,
+                                        libxl__domain_create_state *dcs)
+{
+    /* convenience aliases */
+    libxl_domain_config *const d_config = dcs->guest_config;
+    libxl_domain_build_info *b_info = &d_config->b_info;
+
+    if (!libxl_defbool_is_default(b_info->device_model_restrict) &&
+        !libxl_defbool_val(b_info->device_model_restrict))
+        /* explicitly false */
+        return 0;
+
+    if (b_info->type != LIBXL_DOMAIN_TYPE_HVM ||
+        libxl_defbool_val(b_info->device_model_stubdomain)) {
+        /* in theory we don't care, so default it to true for
+         * the benefit of future callers or in case of bugs */
+        libxl_defbool_setdefault(&b_info->device_model_restrict, 1);
+        return 0;
+    }
+
+    const char *dm = libxl__domain_device_model(gc, b_info);
+
+    const char *cannot_restrict =
+        !strcmp(b_info->device_model_user, "root")
+        ? "device model user is root" :
+        !libxl__dm_supported(gc, dm, libxl__dm_support_check__xsrestrict)
+        ? "device model does not support xs_restrict" :
+        !libxl__dm_supported(gc, dm, libxl__dm_support_check__emulator_id)
+        ? "device model does not support emulator_id" :
+        0;
+
+    libxl_defbool_setdefault(&b_info->device_model_restrict, !cannot_restrict);
+
+    if (libxl_defbool_val(b_info->device_model_restrict) &&
+        cannot_restrict) {
+        LOG(ERROR, "device model restriction desired but not possible: %s",
+            cannot_restrict);
+        return ERROR_INVAL;
+    }
+
+    return 0;
+}
+
 static void init_console_info(libxl__gc *gc,
                              libxl__device_console *console,
                              int dev_num)
@@ -1075,11 +1118,15 @@  static void domcreate_dm_support_checked(libxl__egc *egc,
     rc = domcreate_setdefault_dm_user(gc, dcs);
     if (rc) goto out;
 
+    rc = domcreate_setdefault_dm_restrict(gc, dcs);
+    if (rc) goto out;
+
     /* run two qemus? */
     if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM &&
         !libxl_defbool_val(d_config->b_info.device_model_stubdomain) &&
-        b_info->device_model_user &&
-        strcmp(b_info->device_model_user, "root")) {
+        ((b_info->device_model_user &&
+          strcmp(b_info->device_model_user, "root")) ||
+         libxl_defbool_val(d_config->b_info.device_model_restrict))) {
         rc = libxl__dm_emuidmap_add(gc, domid, b_state, EMUID_SPLIT);
         if (rc) goto out;
     }
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index d805800..0ee956f 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -420,6 +420,14 @@  static int libxl__build_device_model_args_old(libxl__gc *gc,
             b_info->device_model_user);
         return ERROR_INVAL;
     }
+    if (libxl_defbool_val(b_info->device_model_restrict) &&
+        !libxl_defbool_val(b_info->device_model_stubdomain)) {
+        /* The dm support check ought to prevent this, but in case
+         * someone has done something odd, we double check. */
+        LOG(ERROR,
+            "device_model_restrict not supported by qemu-xen-traditional");
+        return ERROR_INVAL;
+    }
 
     flexarray_vappend(dm_args, dm,
                       "-d", GCSPRINTF("%d", domid), NULL);
@@ -1238,8 +1246,14 @@  static int libxl__build_device_model_args_new(libxl__gc *gc,
         }
         if (state->emuidmap & (1u << EMUID_SPLIT)) {
             flexarray_append(dm_args, "-xenopts");
-            flexarray_append(dm_args,
-                             GCSPRINTF("emulator_id=%u", emuid));
+            flexarray_append(dm_args, GCSPRINTF("%semulator_id=%u",
+                    libxl_defbool_val(b_info->device_model_restrict)
+                    ? "xsrestrict=on," : "",
+                                                emuid));
+        } else {
+            /* xsrestrict only makes sense with split qemu because
+             * the pv devices need unrestricted access */
+            assert(!libxl_defbool_val(b_info->device_model_restrict));
         }
     }
     flexarray_append(dm_args, NULL);
@@ -1760,11 +1774,17 @@  void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss,
     char **pass_stuff;
     const char *dm;
     int dm_state_fd = -1;
+    struct xs_permissions rwperm[2];
 
     if (libxl_defbool_val(b_info->device_model_stubdomain)) {
         abort();
     }
 
+    rwperm[0].id = LIBXL_TOOLSTACK_DOMID;
+    rwperm[0].perms = XS_PERM_NONE;
+    rwperm[1].id = domid;
+    rwperm[1].perms = XS_PERM_WRITE;
+
     rc = libxl__dm_emuidmap_add(gc, domid, state, emuid);
     if (rc) goto out;
 
@@ -1804,6 +1824,12 @@  void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss,
                                        domid, emuid, "");
     xs_mkdir(ctx->xsh, XBT_NULL, path);
 
+    if (libxl_defbool_val(b_info->device_model_restrict)) {
+        rc = libxl__xs_mknod(gc, XBT_NULL, GCSPRINTF("%s/physmap", path),
+                             rwperm, 2);
+        if (rc) goto out;
+    }
+
     if (b_info->type == LIBXL_DOMAIN_TYPE_HVM &&
         b_info->device_model_version
         == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL)
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9658356..d5890a8 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -443,6 +443,7 @@  libxl_domain_build_info = Struct("domain_build_info",[
     ("device_model_ssidref", uint32),
     ("device_model_ssid_label", string),
     ("device_model_user", string),
+    ("device_model_restrict", libxl_defbool),
 
     # extra parameters pass directly to qemu, NULL terminated
     ("extra",            libxl_string_list),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index f9933cb..bdef4dc 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -2176,6 +2176,8 @@  skip_vfb:
         }
     }
 
+    xlu_cfg_get_defbool(config, "device_model_restrict",
+                        &b_info->device_model_restrict, 0);
 
     xlu_cfg_replace_string (config, "device_model_override",
                             &b_info->device_model, 0);