[v5,00/21] Add support for qemu-xen runnning in a Linux-based stubdomain

Message ID 20200428040433.23504-1-jandryuk@gmail.com (mailing list archive)

Message

Jason Andryuk April 28, 2020, 4:04 a.m. UTC
Hi,

In coordination with Marek, I'm making a submission of his patches for Linux
stubdomain device-model support.  I made a few of my own additions, but Marek
did the heavy lifting.  Thank you, Marek.

Below is mostly the v4 cover letter with a few additions.

The general idea is to allow device_model_version and
device_model_stubdomain_override to be set freely, and to choose the right
options based on that choice.  Also, allow specifying the path to the
stubdomain kernel/ramdisk, for greater flexibility.

The first two patches add documentation about the expected
toolstack-stubdomain-qemu interface, both for MiniOS stubdomain and Linux
stubdomain.

The initial version has no QMP support - in the initial patches it is
completely disabled, which means no suspend/restore and no PCI passthrough.

Later patches add support for QMP over a libvchan connection. The actual
connection is made in a separate process. As discussed at Xen Summit 2019, this
allows applying some basic checks and/or filtering (not part of this series),
to limit libxl's exposure to a potentially malicious stubdomain.
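
As a rough illustration of the separate-process idea (this is not code from the
series), the relay below copies bytes between two endpoints and offers a hook
where checks or filtering could go.  Ordinary sockets stand in for the vchan
side, and the `relay` function and its `allow` hook are hypothetical names used
only for this sketch.

```python
import selectors
import socket


def relay(a: socket.socket, b: socket.socket, allow=lambda data: True) -> None:
    """Copy bytes between a and b until either side closes.

    `allow` is a hypothetical hook: return False to drop a chunk.
    """
    sel = selectors.DefaultSelector()
    # Each registration carries the *other* socket as its destination.
    sel.register(a, selectors.EVENT_READ, b)
    sel.register(b, selectors.EVENT_READ, a)
    try:
        while True:
            for key, _ in sel.select():
                data = key.fileobj.recv(4096)
                if not data:
                    return  # one side closed the connection; stop relaying
                if allow(data):
                    key.data.sendall(data)
    finally:
        sel.close()
```

Because the relay runs outside the toolstack process, a misbehaving peer can at
worst confuse the relay, and any filtering added later stays in one place.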

Jason's additions ensure the qmp-proxy (vchan-socket-proxy) processes and
sockets are cleaned up and add some documentation.

The actual stubdomain implementation is here:

    https://github.com/marmarek/qubes-vmm-xen-stubdom-linux
    (branch for-upstream, tag for-upstream-v3)

See the readme there for build instructions.  Marek's version requires dracut.
I have hacked up a version that works with initramfs-tools:

   https://github.com/jandryuk/qubes-vmm-xen-stubdom-linux
   (branch initramfs-tools)

A few comments/questions about the stubdomain code:

1. There are extra patches for qemu that are necessary to run it in a
stubdomain. While it is desirable to upstream them, I think that can be done
after merging the libxl part. The stubdomain's qemu build will in most cases be
separate anyway, to limit qemu's dependencies (and so the stubdomain size).

2. By default the Linux hvc-xen console frontend is unreliable for data
transfer (qemu state save/restore) - it drops data sent faster than the client
is reading it. To fix this, the console device needs to be switched into raw
mode (`stty raw /dev/hvc1`). This is especially tricky for restoring qemu
state, as it would need to be done before opening the device, but stty
(obviously) needs to open the device first. To solve this problem, for now the
repository contains a kernel patch which changes the default for all hvc
consoles. Again, this isn't a practical problem, as the kernel for the
stubdomain is built separately. But it would be nice to have something working
with a vanilla kernel. I see these options:
  - convert it to a kernel cmdline parameter (hvc_console_raw=1 ?)
  - use channels instead of consoles (and on the kernel side change the default
    to "raw" only for channels); while in theory a better design, the libxl
    part will be more complex, as channels can be connected to sockets but not
    files, so libxl would need to read/write to it exactly when qemu
    writes/reads the data, not before/after as it is done now

3. Mini-OS stubdoms use the dmargs xenstore key as a string.  Linux stubdoms
use dmargs as a directory of numbered entries.  Should they have different
names?
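
To make the "directory of numbered entries" layout concrete, here is a minimal
sketch that uses a plain dict as a stand-in for xenstore.  The zero-padded
`NNN` key format and the helper names are assumptions made for this
illustration, not necessarily what the patches write.

```python
def write_argv(store: dict, base: str, argv: list) -> None:
    """Store each argument under its own numbered key below `base`."""
    for i, arg in enumerate(argv):
        # Zero padding keeps lexical order equal to numeric order.
        store[f"{base}/{i:03d}"] = arg


def read_argv(store: dict, base: str) -> list:
    """Read the numbered entries back in order."""
    prefix = base + "/"
    keys = sorted(k for k in store if k.startswith(prefix))
    return [store[k] for k in keys]
```

With one key per argument, an argument containing spaces or other separator
characters needs no escaping, which a single joined string would require.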

Remaining parts for eliminating dom0's instance of qemu:
 - do not force QDISK backend for CDROM
 - multiple consoles support in xenconsoled

Changes in v2:
 - apply review comments by Jason Andryuk
Changes in v3:
 - rework qemu arguments handling (separate xenstore keys, instead of \x1b separator)
 - add QMP over libvchan, instead of console
 - add protocol documentation
 - a lot of minor changes, see individual patches for full changes list
 - split xenconsoled patches into separate series
Changes in v4:
 - extract vchan connection into a separate process
 - rebase on master
 - various fixes
Changes in v5:
 - Marek: apply review comments from Jason Andryuk
 - Jason: Clean up qmp-proxy processes and sockets

Cc: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Cc: Simon Gaiser <simon@invisiblethingslab.com>
Cc: Eric Shelton <eshelton@pobox.com>
Cc: Ian Jackson <ian.jackson@citrix.com>
Cc: Wei Liu <wl@xen.org>

Eric Shelton (1):
  libxl: Handle Linux stubdomain specific QEMU options.

Jason Andryuk (5):
  docs: Add device-model-domid to xenstore-paths
  libxl: Check stubdomain kernel & ramdisk presence
  libxl: Refactor kill_device_model to libxl__kill_xs_path
  libxl: Kill vchan-socket-proxy when cleaning up qmp
  tools: Clean up vchan-socket-proxy socket

Marek Marczykowski-Górecki (15):
  Document ioemu MiniOS stubdomain protocol
  Document ioemu Linux stubdomain protocol
  libxl: fix qemu-trad cmdline for no sdl/vnc case
  libxl: Allow running qemu-xen in stubdomain
  libxl: write qemu arguments into separate xenstore keys
  xl: add stubdomain related options to xl config parser
  tools/libvchan: notify server when client is connected
  libxl: add save/restore support for qemu-xen in stubdomain
  tools: add missing libxenvchan cflags
  tools: add simple vchan-socket-proxy
  libxl: use vchan for QMP access with Linux stubdomain
  Regenerate autotools files
  libxl: require qemu in dom0 even if stubdomain is in use
  libxl: ignore emulated IDE disks beyond the first 4
  libxl: consider also qemu in stubdomain in libxl__dm_active check

 .gitignore                          |   1 +
 configure                           |  14 +-
 docs/configure                      |  14 +-
 docs/man/xl.cfg.5.pod.in            |  27 +-
 docs/misc/stubdom.txt               | 103 ++++++
 docs/misc/xenstore-paths.pandoc     |   5 +
 stubdom/configure                   |  14 +-
 tools/Rules.mk                      |   2 +-
 tools/config.h.in                   |   3 +
 tools/configure                     |  46 ++-
 tools/configure.ac                  |   9 +
 tools/libvchan/Makefile             |   8 +-
 tools/libvchan/init.c               |   3 +
 tools/libvchan/vchan-socket-proxy.c | 500 ++++++++++++++++++++++++++++
 tools/libxl/libxl_aoutils.c         |  32 ++
 tools/libxl/libxl_create.c          |  46 ++-
 tools/libxl/libxl_dm.c              | 484 +++++++++++++++++++++------
 tools/libxl/libxl_domain.c          |   7 +
 tools/libxl/libxl_internal.h        |  22 ++
 tools/libxl/libxl_mem.c             |   6 +-
 tools/libxl/libxl_qmp.c             |  27 +-
 tools/libxl/libxl_types.idl         |   3 +
 tools/xl/xl_parse.c                 |   7 +
 23 files changed, 1205 insertions(+), 178 deletions(-)
 create mode 100644 tools/libvchan/vchan-socket-proxy.c

Comments

Jason Andryuk May 11, 2020, 8:19 p.m. UTC | #1
Ping?

-Jason

Ian Jackson May 14, 2020, 4:07 p.m. UTC | #2
Jason Andryuk writes ("[PATCH v5 00/21] Add support for qemu-xen runnning in a Linux-based stubdomain"):
> In coordination with Marek, I'm making a submission of his patches for Linux
> stubdomain device-model support.  I made a few of my own additions, but Marek
> did the heavy lifting.  Thank you, Marek.

Hi.  Thanks very much for this contribution.  Sorry it has taken me so
long to get to review it.

> Later patches add support for QMP over a libvchan connection. The actual
> connection is made in a separate process. As discussed at Xen Summit 2019, this
> allows applying some basic checks and/or filtering (not part of this series),
> to limit libxl's exposure to a potentially malicious stubdomain.

OK.

> A few comments/questions about the stubdomain code:
>
> 1. There are extra patches for qemu that are necessary to run it in a
> stubdomain. While it is desirable to upstream them, I think that can be done
> after merging the libxl part. The stubdomain's qemu build will in most cases be
> separate anyway, to limit qemu's dependencies (and so the stubdomain size).

Yes.

> 2. By default the Linux hvc-xen console frontend is unreliable for data
> transfer (qemu state save/restore) - it drops data sent faster than the client
> is reading it. To fix this, the console device needs to be switched into raw
> mode (`stty raw /dev/hvc1`). This is especially tricky for restoring qemu
> state, as it would need to be done before opening the device, but stty
> (obviously) needs to open the device first. To solve this problem, for now the
> repository contains a kernel patch which changes the default for all hvc
> consoles. Again, this isn't a practical problem, as the kernel for the
> stubdomain is built separately. But it would be nice to have something working
> with a vanilla kernel. I see these options:
>   - convert it to a kernel cmdline parameter (hvc_console_raw=1 ?)
>   - use channels instead of consoles (and on the kernel side change the default
>     to "raw" only for channels); while in theory a better design, the libxl
>     part will be more complex, as channels can be connected to sockets but not
>     files, so libxl would need to read/write to it exactly when qemu
>     writes/reads the data, not before/after as it is done now

What a mess.  Thanks for trying to tackle it!

Would it be possible to add a rendezvous to the console?  Eg, the
guest could write a "ready" byte after it has set the mode.
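
A minimal sketch of that rendezvous, assuming a pseudo-terminal as a stand-in
for the hvc console: the guest side switches the already-open device to raw
mode (the programmatic counterpart of `stty raw`) and only then writes a marker
byte, and the sending side waits for that byte before transferring anything.
The marker value and both function names are hypothetical.

```python
import os
import pty
import termios
import tty

READY = b"\x06"  # hypothetical marker byte; nothing in the series defines it


def guest_prepare(fd: int) -> None:
    """Guest side: put the open console into raw mode, then signal readiness."""
    tty.setraw(fd, termios.TCSANOW)
    os.write(fd, READY)


def host_send_when_ready(fd: int, payload: bytes) -> None:
    """Sending side: wait for the marker before writing any state data."""
    if os.read(fd, 1) != READY:
        raise RuntimeError("unexpected byte before ready marker")
    os.write(fd, payload)
```

On a Unix host, `pty.openpty()` gives a pair of descriptors that can play the
two roles for experimenting: pass the slave to `guest_prepare` and the master
to `host_send_when_ready`.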

I'm not sure I understand the problem with libxl and channels.  Maybe
a helper process (perhaps existing only during migration) could help?

Or, libxl has the "datacopier" async thing in it which you can spawn
one of and hopefully forget about.  You could teach it channels, or
make a thing like it that uses channels, or something.

> 3. Mini-OS stubdoms use the dmargs xenstore key as a string.  Linux stubdoms
> use dmargs as a directory of numbered entries.  Should they have different
> names?

Yes, I think so.  That way if there's a version mismatch you get
ENOENT rather than an empty argument list...
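
The difference can be sketched with a toy model of a hierarchical store (a
stand-in, not the xenstore API): listing a node that exists only as a string
value yields no children, while listing a name nobody wrote fails outright.
`FakeStore` and the alternative key name `dm-argv` are hypothetical.

```python
class FakeStore:
    """Toy hierarchical key/value store, for illustration only."""

    def __init__(self):
        self.nodes = {}

    def write(self, path: str, value: str) -> None:
        self.nodes[path] = value

    def directory(self, path: str) -> list:
        """List child names; fail if the node does not exist at all."""
        prefix = path + "/"
        children = sorted(
            {k[len(prefix):].split("/")[0] for k in self.nodes if k.startswith(prefix)}
        )
        if path not in self.nodes and not children:
            raise FileNotFoundError(path)  # plays the role of ENOENT
        return children


store = FakeStore()
store.write("dmargs", "-xen-domid 5")  # toolstack wrote the string form
same_name = store.directory("dmargs")  # reader expecting a directory sees no entries
```

With a shared name the mismatch is silent; with a distinct name, calling
`store.directory("dm-argv")` here raises an error the reader can report.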


I'll go and look at the patches now.

Regards,
Ian.
Ian Jackson May 14, 2020, 4:55 p.m. UTC | #3
Jason Andryuk writes ("[PATCH v5 00/21] Add support for qemu-xen runnning in a Linux-based stubdomain"):
> In coordination with Marek, I'm making a submission of his patches for Linux
> stubdomain device-model support.  I made a few of my own additions, but Marek
> did the heavy lifting.  Thank you, Marek.

Hi.  I finished reading these patches.  Thank you very much.  They
were nicely structured.  I found them clear and easy to read.  As
you'll have seen I have requested only a few changes.

I am very hopeful that this series will make 4.14.  Codefreeze is
Friday the 22nd of May.  Please let us know whether you think we'll
all be able to make that...

Regards,
Ian.
Jason Andryuk May 14, 2020, 7:10 p.m. UTC | #4
On Thu, May 14, 2020 at 12:55 PM Ian Jackson <ian.jackson@citrix.com> wrote:
>
> Jason Andryuk writes ("[PATCH v5 00/21] Add support for qemu-xen runnning in a Linux-based stubdomain"):
> > In coordination with Marek, I'm making a submission of his patches for Linux
> > stubdomain device-model support.  I made a few of my own additions, but Marek
> > did the heavy lifting.  Thank you, Marek.
>
> Hi.  I finished reading these patches.  Thank you very much.  They
> were nicely structured.  I found them clear and easy to read.  As
> you'll have seen I have requested only a few changes.

I haven't taken a look yet, but thanks for the review.

Marek deserves credit for the structuring and work.

> I am very hopeful that this series will make 4.14.  Codefreeze is
> Friday the 22nd of May.  Please let us know whether you think we'll
> all be able to make that...

Yes, I'm aiming for 4.14.  I plan to re-spin and re-post over the next
couple of days.

Regards,
Jason