
[v5,00/18] vfio-user server in QEMU

Message ID: cover.1642626515.git.jag.raman@oracle.com

Message

Jag Raman Jan. 19, 2022, 9:41 p.m. UTC
Hi,

Thank you for taking the time to provide comprehensive feedback on
the last series of patches. We have addressed all the comments.

We are posting v5 of the series, which incorporates all of that
feedback. Kindly share your feedback on this latest series.

We added the following patches to the series:
  - [PATCH v5 03/18] pci: isolated address space for PCI bus
  - [PATCH v5 04/18] pci: create and free isolated PCI buses
  - [PATCH v5 05/18] qdev: unplug blocker for devices
  - [PATCH v5 06/18] vfio-user: add HotplugHandler for remote machine
  - [PATCH v5 07/18] vfio-user: set qdev bus callbacks for remote machine

We made the following changes to the existing patches:

[PATCH v5 09/18] vfio-user: define vfio-user-server object
  - renamed object class member 'daemon' to 'auto_shutdown'
  - set VfioUserServerProperties version to 6.3
  - use SocketAddressType_str to compose error message
  - refuse setting 'socket' and 'device' properties after server starts
  - added VFU_OBJECT_ERROR macro to report error

[PATCH v5 10/18] vfio-user: instantiate vfio-user context
  - set the local error variable to NULL after transferring ownership
    with error_propagate() (see the sketch below)
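
For reference, the error-handling idiom in question looks roughly like
this (a minimal sketch - propagate_example(), first_step() and
second_step() are illustrative names, not code from the patch):

    #include "qapi/error.h"

    static void propagate_example(Error **errp)
    {
        Error *err = NULL;

        first_step(&err);                /* hypothetical call, may fail */
        if (err) {
            error_propagate(errp, err);  /* *errp now owns the Error */
            err = NULL;                  /* reset before reusing err */
            return;
        }

        second_step(&err);               /* err can safely be reused here */
        error_propagate(errp, err);
    }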

[PATCH v5 11/18] vfio-user: find and init PCI device
  - block hot-unplug of PCI device when it is attached to the server object

[PATCH v5 12/18] vfio-user: run vfio-user context
  - emit a hangup event to the monitor when the client disconnects
  - reset vfu_poll_fd member and disable FD handler during finalize
  - add a comment to explain that attach could block
  - use VFU_OBJECT_ERROR instead of setting error_abort

[PATCH v5 14/18] vfio-user: handle DMA mappings
  - use pci_address_space() to access the device's root memory region
  - since we use one bus per device, mapped memory regions are destroyed
    automatically when the device is unplugged (see the sketch below)
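
A rough sketch of the mapping step described above (illustrative only -
map_dma_region() is a made-up helper, and in the real code the
MemoryRegion comes from the fd/offset carried by the client's DMA map
message):

    #include "qemu/osdep.h"
    #include "exec/memory.h"
    #include "hw/pci/pci.h"

    /*
     * Expose a client-supplied region at 'iova' in the device's PCI
     * address space.  pci_address_space() returns the memory-space
     * MemoryRegion of the device's bus, so with one isolated bus per
     * device the mapping goes away together with the device.
     */
    static void map_dma_region(PCIDevice *pdev, MemoryRegion *sub,
                               hwaddr iova)
    {
        memory_region_add_subregion(pci_address_space(pdev), iova, sub);
    }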

[PATCH v5 15/18] vfio-user: handle PCI BAR accesses
  - use pci_isol_as_io() & pci_isol_as_mem() to access the device's
    PCI/CPU address space. This simultaneously fixes the AddressSpace
    issue noted in the last review cycle (see the sketch below)
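
Roughly what a BAR access handler does with these helpers (a sketch
only - it assumes pci_isol_as_mem()/pci_isol_as_io() from patch 03
return the AddressSpace of the device's isolated bus, and bar_rw() is
an illustrative name):

    #include "qemu/osdep.h"
    #include "exec/memory.h"
    #include "hw/pci/pci.h"

    static void bar_rw(PCIDevice *pdev, hwaddr addr, void *buf,
                       hwaddr len, bool is_io, bool is_write)
    {
        AddressSpace *as = is_io ? pci_isol_as_io(pdev) :
                                   pci_isol_as_mem(pdev);

        /* forward the client's BAR access into the isolated space */
        address_space_rw(as, addr, MEMTXATTRS_UNSPECIFIED, buf, len,
                         is_write);
    }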

[PATCH v5 16/18] vfio-user: handle device interrupts
  - setting our own IRQ handlers for each bus
  - renamed vfu_object_dev_table to vfu_object_dev_to_ctx_table
  - indexing into vfu_object_dev_to_ctx_table with the device's
    address pointer instead of devfn
  - no longer looking up an entry before removing it from the table

[PATCH v5 17/18] vfio-user: register handlers to facilitate migration
  - use VFU_OBJECT_ERROR instead of setting error_abort

We dropped the following patch from previous series:
  - vfio-user: IOMMU support for remote device

Thank you very much!

Jagannathan Raman (18):
  configure, meson: override C compiler for cmake
  tests/avocado: Specify target VM argument to helper routines
  pci: isolated address space for PCI bus
  pci: create and free isolated PCI buses
  qdev: unplug blocker for devices
  vfio-user: add HotplugHandler for remote machine
  vfio-user: set qdev bus callbacks for remote machine
  vfio-user: build library
  vfio-user: define vfio-user-server object
  vfio-user: instantiate vfio-user context
  vfio-user: find and init PCI device
  vfio-user: run vfio-user context
  vfio-user: handle PCI config space accesses
  vfio-user: handle DMA mappings
  vfio-user: handle PCI BAR accesses
  vfio-user: handle device interrupts
  vfio-user: register handlers to facilitate migration
  vfio-user: avocado tests for vfio-user

 configure                                  |   21 +-
 meson.build                                |   44 +-
 qapi/misc.json                             |   23 +
 qapi/qom.json                              |   20 +-
 include/hw/pci/pci.h                       |   12 +
 include/hw/pci/pci_bus.h                   |   17 +
 include/hw/qdev-core.h                     |   21 +
 include/migration/vmstate.h                |    2 +
 migration/savevm.h                         |    2 +
 hw/pci/msi.c                               |   13 +-
 hw/pci/msix.c                              |   12 +-
 hw/pci/pci.c                               |  186 ++++
 hw/pci/pci_bridge.c                        |    5 +
 hw/remote/machine.c                        |   86 ++
 hw/remote/vfio-user-obj.c                  | 1019 ++++++++++++++++++++
 migration/savevm.c                         |   73 ++
 migration/vmstate.c                        |   19 +
 softmmu/qdev-monitor.c                     |   74 +-
 .gitlab-ci.d/buildtest.yml                 |    2 +
 .gitmodules                                |    3 +
 Kconfig.host                               |    4 +
 MAINTAINERS                                |    3 +
 hw/remote/Kconfig                          |    4 +
 hw/remote/meson.build                      |    3 +
 hw/remote/trace-events                     |   11 +
 meson_options.txt                          |    2 +
 subprojects/libvfio-user                   |    1 +
 tests/avocado/avocado_qemu/__init__.py     |   14 +-
 tests/avocado/vfio-user.py                 |  225 +++++
 tests/docker/dockerfiles/centos8.docker    |    2 +
 tests/docker/dockerfiles/ubuntu2004.docker |    2 +
 31 files changed, 1912 insertions(+), 13 deletions(-)
 create mode 100644 hw/remote/vfio-user-obj.c
 create mode 160000 subprojects/libvfio-user
 create mode 100644 tests/avocado/vfio-user.py

Comments

Stefan Hajnoczi Jan. 25, 2022, 4 p.m. UTC | #1
Hi Jag,
Thanks for this latest revision. The biggest outstanding question I have
is about the isolated address spaces design.

This patch series needs a PCIBus with its own Memory Space, I/O Space,
and interrupts. That way a single QEMU process can host vfio-user
servers that different VMs connect to. They all need isolated address
spaces so that mapping a BAR in Device A does not conflict with mapping
a BAR in Device B.

The current approach adds special code to hw/pci/pci.c so that custom
AddressSpace can be set up. The isolated PCIBus is an automatically
created PCIe root port that's a child of the machine's main PCI bus. On
one hand it's neat because QEMU's assumption that there is only one
root SysBus isn't violated. On the other hand it seems like a special
case hack for PCI and I'm not sure in what sense these PCIBusses are
really children of the machine's main PCI bus since they don't share or
interact in any way.

Another approach that came to mind is to allow multiple root SysBusses.
Each vfio-user server would need its own SysBus and put a regular PCI
host onto that isolated SysBus without modifying hw/pci/pci.c with a
special case. The downside to this is that violating the single SysBus
assumption probably breaks monitor commands that rely on
qdev_find_recursive() and friends. It seems cleaner than adding isolated
address spaces to PCI specifically, but also raises the question if
multiple machine instances are needed (which would raise even more
questions).
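
(For context, the monitor lookups mentioned here are rooted at the
single default system bus, roughly along the lines of the following.
This is a hedged sketch of the pattern, not a verbatim quote of the
monitor code:)

    #include "qemu/osdep.h"
    #include "hw/sysbus.h"
    #include "monitor/qdev.h"

    /* A device sitting on a second root bus would never be found,
     * because the search starts from the single default SysBus. */
    static DeviceState *lookup_by_id(const char *id)
    {
        return qdev_find_recursive(sysbus_get_default(), id);
    }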

I wanted to raise this to see if Peter, Kevin, Michael, and others are
happy with the current approach or have ideas for a clean solution.

Stefan
Jag Raman Jan. 26, 2022, 5:04 a.m. UTC | #2
> On Jan 25, 2022, at 11:00 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> Hi Jag,
> Thanks for this latest revision. The biggest outstanding question I have
> is about the isolated address spaces design.

Thank you for taking the time to review the patches, Stefan!

> 
> This patch series needs a PCIBus with its own Memory Space, I/O Space,
> and interrupts. That way a single QEMU process can host vfio-user
> servers that different VMs connect to. They all need isolated address
> spaces so that mapping a BAR in Device A does not conflict with mapping
> a BAR in Device B.
> 
> The current approach adds special code to hw/pci/pci.c so that custom
> AddressSpace can be set up. The isolated PCIBus is an automatically
> created PCIe root port that's a child of the machine's main PCI bus. On
> one hand it's neat because QEMU's assumption that there is only one
> root SysBus isn't violated. On the other hand it seems like a special
> case hack for PCI and I'm not sure in what sense these PCIBusses are
> really children of the machine's main PCI bus since they don't share or
> interact in any way.

We are discussing the automatic creation part you just mentioned in
the following email:
[PATCH v5 07/18] vfio-user: set qdev bus callbacks for remote machine

I agree that automatic creation of a parent bus is not ideal - we could
specify the parent bus as a separate option on the command line or via
QMP. This change would avoid modifying hw/pci/pci.c - the new PCI bus
could be created in place during device creation/hotplug.

The following image gives an idea of the bus/device topology in the remote
machine, as implemented in the current series. Each secondary bus and
its children have isolated memory and IO spaces.
https://gitlab.com/jraman/qemu/-/commit/2e2ebf004894075ad8044739b0b16ce875114c4c

> 
> Another approach that came to mind is to allow multiple root SysBusses.
> Each vfio-user server would need its own SysBus and put a regular PCI
> host onto that isolated SysBus without modifying hw/pci/pci.c with a
> special case. The downside to this is that violating the single SysBus
> assumption probably breaks monitor commands that rely on
> qdev_find_recursive() and friends. It seems cleaner than adding isolated
> address spaces to PCI specifically, but also raises the question if
> multiple machine instances are needed (which would raise even more
> questions).

Based on further discussion with Stefan, I got some clarity. We could consider one
more option as well - somewhere in-between multiple root SysBuses and the topology
discussed above (with secondary PCI buses). We could implement a
TYPE_SYS_BUS_DEVICE that creates a root PCI bus with isolated memory ranges.
Something along the lines in the following diagram:
https://gitlab.com/jraman/qemu/-/commit/81f6a998278a2a795be0db7acdeb1caa2d6744fb

An example set of QMP commands to attach PCI devices would be:
device_add pci-root-bus,id=rb1
device_add <driver>,id=mydev,bus=rb1
object-add x-vfio-user-server,device=mydev

where ‘pci-root-bus’ is a TYPE_SYS_BUS_DEVICE that creates its own root PCI bus.
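
A rough sketch of what such a device could look like (illustrative
only - the type name, fields and sizes below are placeholders rather
than code from a prepared patch):

    #include "qemu/osdep.h"
    #include "qemu/module.h"
    #include "hw/sysbus.h"
    #include "hw/pci/pci.h"
    #include "hw/pci/pci_bus.h"

    #define TYPE_PCI_ROOT_BUS_DEV "pci-root-bus"

    typedef struct PCIRootBusDev {
        SysBusDevice parent_obj;
        MemoryRegion isol_mem;   /* private Memory Space for this root */
        MemoryRegion isol_io;    /* private I/O Space for this root */
    } PCIRootBusDev;

    static void pci_root_bus_dev_realize(DeviceState *dev, Error **errp)
    {
        PCIRootBusDev *s = (PCIRootBusDev *)dev;

        memory_region_init(&s->isol_mem, OBJECT(dev), "isol-mem",
                           UINT64_MAX);
        memory_region_init(&s->isol_io, OBJECT(dev), "isol-io", 0x10000);

        /* Root bus backed by this device's own memory/IO regions, so
         * BAR mappings never collide with another server's devices.
         * (Real code would also set dc->user_creatable, wire up
         * interrupts, hotplug handling, etc.) */
        pci_root_bus_new(dev, NULL, &s->isol_mem, &s->isol_io, 0,
                         TYPE_PCIE_BUS);
    }

    static void pci_root_bus_dev_class_init(ObjectClass *klass, void *data)
    {
        DEVICE_CLASS(klass)->realize = pci_root_bus_dev_realize;
    }

    static const TypeInfo pci_root_bus_dev_info = {
        .name          = TYPE_PCI_ROOT_BUS_DEV,
        .parent        = TYPE_SYS_BUS_DEVICE,
        .instance_size = sizeof(PCIRootBusDev),
        .class_init    = pci_root_bus_dev_class_init,
    };

    static void pci_root_bus_dev_register_types(void)
    {
        type_register_static(&pci_root_bus_dev_info);
    }
    type_init(pci_root_bus_dev_register_types);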

I’m very sorry if the above two web links disturb your review workflow.

> 
> I wanted to raise this to see if Peter, Kevin, Michael, and others are
> happy with the current approach or have ideas for a clean solution.

Looking forward to your comments.

Thank you!
--
Jag

> 
> Stefan
Stefan Hajnoczi Jan. 26, 2022, 9:56 a.m. UTC | #3
On Wed, Jan 26, 2022 at 05:04:58AM +0000, Jag Raman wrote:
> 
> 
> > On Jan 25, 2022, at 11:00 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > Hi Jag,
> > Thanks for this latest revision. The biggest outstanding question I have
> > is about the isolated address spaces design.
> 
> Thank you for taking the time to review the patches, Stefan!
> 
> > 
> > This patch series needs a PCIBus with its own Memory Space, I/O Space,
> > and interrupts. That way a single QEMU process can host vfio-user
> > servers that different VMs connect to. They all need isolated address
> > spaces so that mapping a BAR in Device A does not conflict with mapping
> > a BAR in Device B.
> > 
> > The current approach adds special code to hw/pci/pci.c so that custom
> > AddressSpace can be set up. The isolated PCIBus is an automatically
> > created PCIe root port that's a child of the machine's main PCI bus. On
> > one hand it's neat because QEMU's assumption that there is only one
> > root SysBus isn't violated. On the other hand it seems like a special
> > case hack for PCI and I'm not sure in what sense these PCIBusses are
> > really children of the machine's main PCI bus since they don't share or
> > interact in any way.
> 
> We are discussing the automatic creation part you just mentioned in
> the following email:
> [PATCH v5 07/18] vfio-user: set qdev bus callbacks for remote machine
> 
> I agree that automatic creation of a parent bus is not ideal - we could
> specify the parent bus as a separate option on the command line or via
> QMP. This change would avoid modifying hw/pci/pci.c - the new PCI bus
> could be created in place during device creation/hotplug.
> 
> The following image gives an idea of the bus/device topology in the remote
> machine, as implemented in the current series. Each secondary bus and
> its children have isolated memory and IO spaces.
> https://gitlab.com/jraman/qemu/-/commit/2e2ebf004894075ad8044739b0b16ce875114c4c

Do isolated PCI busses have any relationship with their parent at all? I
don't think the parent plays a useful role in DMA/IOMMU, interrupts, or
PCI addressing. That leaves me wondering if a parent/child relationship
is an appropriate way to model this.

That said, this approach solves two problems:
1. There must be some parent for the new PCI bus.
2. qdev_find_recursive() and friends must be able to find the PCIDevice
   on the isolated bus.

> > Another approach that came to mind is to allow multiple root SysBusses.
> > Each vfio-user server would need its own SysBus and put a regular PCI
> > host onto that isolated SysBus without modifying hw/pci/pci.c with a
> > special case. The downside to this is that violating the single SysBus
> > assumption probably breaks monitor commands that rely on
> > qdev_find_recursive() and friends. It seems cleaner than adding isolated
> > address spaces to PCI specifically, but also raises the question if
> > multiple machine instances are needed (which would raise even more
> > questions).
> 
> Based on further discussion with Stefan, I got some clarity. We could consider one
> more option as well - somewhere in-between multiple root SysBuses and the topology
> discussed above (with secondary PCI buses). We could implement a
> TYPE_SYS_BUS_DEVICE that creates a root PCI bus with isolated memory ranges.
> Something along the lines in the following diagram:
> https://gitlab.com/jraman/qemu/-/commit/81f6a998278a2a795be0db7acdeb1caa2d6744fb
> 
> An example set of QMP commands to attach PCI devices would be:
> device_add pci-root-bus,id=rb1
> device_add <driver>,id=mydev,bus=rb1
> object-add x-vfio-user-server,device=mydev
> 
> where ‘pci-root-bus’ is a TYPE_SYS_BUS_DEVICE that creates its own root PCI bus.

If it's less code then that's an advantage but it still places unrelated
DMA/interrupt spaces onto the same SysBus and therefore requires
isolation. I think this alternative doesn't fundamentally fix the
design.

If multiple roots are possible then isolation doesn't need to be
implemented explicitly; it comes for free as part of the regular
qdev/qbus hierarchy. The devices would be isolated by the fact that they
live on different roots :).

I have never tried the multi-root approach though, so I'm not sure how
much work it is.

Stefan