[RFC,0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Message ID 20241108125242.60136-1-shameerali.kolothum.thodi@huawei.com (mailing list archive)

Shameer Kolothum Nov. 8, 2024, 12:52 p.m. UTC
Hi,

This series adds initial support for a user-creatable "arm-smmuv3-nested"
device to QEMU. At present the QEMU ARM SMMUv3 emulation is per machine,
so a machine cannot have more than one SMMUv3.

To support vfio-pci device assignment with a vSMMUv3, the physical
SMMUv3 has to be configured in nested mode. A pluggable
"arm-smmuv3-nested" device lets a Guest running on a host with multiple
physical SMMUv3s have multiple vSMMUv3s. A few benefits of doing
this are:

1. Avoids invalidation broadcast or lookup when devices are behind
   multiple physical SMMUv3s.
2. Makes it easy to handle physical SMMUv3s that differ in features.
3. Makes it easy to handle future requirements such as vCMDQ support.

This is based on the discussions and suggestions received on a previous
RFC by Nicolin [0].

This series:
 - Adds support for the "arm-smmuv3-nested" device. At present only the
   virt machine is supported, and it uses the _plug_cb() callback to hook
   up the sysbus mem and irq (not sure whether this has any negative
   repercussions). Patch #3.
 - Provides a way to associate a pci-bus (pxb-pcie) with the above device.
   Patch #3.
 - Adds RMR support for MSI doorbell handling in the last patch. Patch #5.
   This may change in future [1].

This RFC is for initial discussion/test purposes only and includes only the
patches that are relevant for adding the "arm-smmuv3-nested" support. The
complete branch is available at:
https://github.com/hisilicon/qemu/tree/private-smmuv3-nested-dev-rfc-v1

A few ToDos to note:
1. At present default-bus-bypass-iommu=on must be set when an
   arm-smmuv3-nested dev is specified. Otherwise you may get an IORT
   related boot error. This requires fixing.
2. Hot adding a device does not work at the moment. It looks like a pcihp
   irq issue, and could be a bug in the IORT id mappings.
3. The above branch doesn't support vSVA yet.
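
Until ToDo 1 is fixed, a launcher script could catch the missing machine
option before the Guest hits the IORT boot error. A minimal Python sketch,
assuming the command line is available as an argv-style list; the function
name needs_bypass_fix is hypothetical and not part of this series:

```python
def needs_bypass_fix(argv):
    """Return True if arm-smmuv3-nested is used without
    default-bus-bypass-iommu=on in the machine options."""
    machine_opts = []
    uses_nested = False
    # Walk (option, value) pairs; non-option pairs simply never match.
    for opt, val in zip(argv, argv[1:]):
        if opt in ("-machine", "-M"):
            machine_opts = val.split(",")
        elif opt == "-device" and val.split(",")[0] == "arm-smmuv3-nested":
            uses_nested = True
    return uses_nested and "default-bus-bypass-iommu=on" not in machine_opts


bad = ["qemu-system-aarch64", "-machine", "virt,gic-version=3",
       "-device", "arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1"]
good = ["qemu-system-aarch64",
        "-machine", "virt,gic-version=3,default-bus-bypass-iommu=on",
        "-device", "arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1"]
print(needs_bypass_fix(bad), needs_bypass_fix(good))
```

This only inspects the option strings; it does not know whether a given
QEMU build still needs the workaround.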

Hopefully this is helpful in taking the discussion forward. Please take a
look and let me know.

How to use it (example):

On a HiSilicon platform that has multiple physical SMMUv3s, the ACC ZIP VF
devices and HNS VF devices are behind different SMMUv3s. So for a Guest,
specify two arm-smmuv3-nested devices, each behind its own pxb-pcie, as below:

./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass-iommu=on \
-enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \
-object iommufd,id=iommufd0 \
-bios QEMU_EFI.fd \
-kernel Image \
-device virtio-blk-device,drive=fs \
-drive if=none,file=rootfs.qcow2,id=fs \
-device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
-device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
-device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
-device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
-device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
-device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \
-device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
-append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
-device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \
-fsdev local,id=p9fs2,path=p9root,security_model=mapped \
-net none \
-nographic
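
The per-SMMUv3 part of the command line above follows a repeating pattern:
one pxb-pcie, a root port per VF, one arm-smmuv3-nested bound to the
pxb-pcie, and the vfio-pci devices. A minimal Python sketch that generates
one such group, using only the option names shown above; the helper name
nested_smmu_args and its parameters are hypothetical, and the id naming
(pcie.N, pcie.portN, smmuvN) simply mirrors this example:

```python
def nested_smmu_args(index, bus_nr, devices, iommufd="iommufd0"):
    """Build the -device arguments for one vSMMUv3 and the VFs behind it.

    index   -- number used to derive the ids (pcie.<index>, smmuv<index>)
    bus_nr  -- bus_nr for the pxb-pcie
    devices -- list of (port_number, host_bdf); all VFs must sit behind
               the same physical SMMUv3
    """
    bus = f"pcie.{index}"
    args = ["-device", f"pxb-pcie,id={bus},bus_nr={bus_nr},bus=pcie.0"]
    for port_no, _bdf in devices:
        args += ["-device",
                 f"pcie-root-port,id=pcie.port{port_no},bus={bus},chassis={port_no}"]
    # The vSMMUv3 is tied to the expander bridge, not to individual ports.
    args += ["-device", f"arm-smmuv3-nested,id=smmuv{index},pci-bus={bus}"]
    for port_no, bdf in devices:
        args += ["-device",
                 f"vfio-pci,host={bdf},bus=pcie.port{port_no},iommufd={iommufd}"]
    return args


hns = nested_smmu_args(1, 8, [(1, "0000:7d:02.1")])
zip_vf = nested_smmu_args(2, 16, [(2, "0000:75:00.1")])
print(" ".join(hns + zip_vf))
```

Passing more than one (port_number, host_bdf) pair to a single call places
several VFs behind the same vSMMUv3.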

The Guest will boot with two SMMUv3s:
[    1.608130] arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
[    1.609655] arm-smmu-v3 arm-smmu-v3.0.auto: ias 48-bit, oas 48-bit (features 0x00020b25)
[    1.612475] arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
[    1.614444] arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
[    1.617451] arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
[    1.618842] arm-smmu-v3 arm-smmu-v3.1.auto: ias 48-bit, oas 48-bit (features 0x00020b25)
[    1.621366] arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
[    1.623225] arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq

With a PCI topology like the one below:
[root@localhost ~]# lspci -tv
-+-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
 |           +-01.0  Red Hat, Inc. QEMU PCIe Expander bridge
 |           +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
 |           \-03.0  Virtio: Virtio filesystem
 +-[0000:08]---00.0-[09]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
 \-[0000:10]---00.0-[11]----00.0  Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)
[root@localhost ~]#

To add another HNS VF, add it to the same SMMUv3 as the first HNS dev:

-device pcie-root-port,id=pcie.port3,bus=pcie.1,chassis=3 \
-device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0 \

[root@localhost ~]# lspci -tv
-+-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
 |           +-01.0  Red Hat, Inc. QEMU PCIe Expander bridge
 |           +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
 |           \-03.0  Virtio: Virtio filesystem
 +-[0000:08]-+-00.0-[09]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
 |           \-01.0-[0a]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
 \-[0000:10]---00.0-[11]----00.0  Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)
[root@localhost ~]#

An attempt to add the HNS VF to a different SMMUv3 will result in:

-device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: Unable to attach viommu
-device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: vfio 0000:7d:02.2:
   Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2 (38) to id=11: Invalid argument

At present QEMU does no validation of the user configuration beyond the
failure above. The assumption is that libvirt will take care of this.
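
As an illustration of the kind of check a management layer could do up
front, here is a minimal Python sketch that verifies every VF placed behind
one vSMMUv3 comes from the same physical SMMUv3. The function name, the
input dictionaries and the physical SMMUv3 labels ("phys-hns", "phys-zip")
are hypothetical placeholders; how the host mapping is gathered is left
out, and this is not how libvirt or QEMU implement it:

```python
from collections import defaultdict


def check_vsmmu_assignment(dev_to_vsmmu, dev_to_phys_smmu):
    """Return a list of error strings; an empty list means every vSMMUv3
    only fronts devices from a single physical SMMUv3."""
    phys_by_vsmmu = defaultdict(set)
    for bdf, vsmmu in dev_to_vsmmu.items():
        phys_by_vsmmu[vsmmu].add(dev_to_phys_smmu[bdf])
    errors = []
    for vsmmu, phys in sorted(phys_by_vsmmu.items()):
        if len(phys) > 1:
            errors.append(
                f"{vsmmu}: devices span physical SMMUv3s {sorted(phys)}")
    return errors


# Placeholder host mapping: which physical SMMUv3 each VF sits behind.
host = {"0000:7d:02.1": "phys-hns",
        "0000:7d:02.2": "phys-hns",
        "0000:75:00.1": "phys-zip"}

ok = {"0000:7d:02.1": "smmuv1", "0000:7d:02.2": "smmuv1",
      "0000:75:00.1": "smmuv2"}
wrong = {"0000:7d:02.1": "smmuv1", "0000:7d:02.2": "smmuv2",
         "0000:75:00.1": "smmuv2"}

print(check_vsmmu_assignment(ok, host))
print(check_vsmmu_assignment(wrong, host))
```

The second layout corresponds to the failing case above, where the second
HNS VF is placed behind the vSMMUv3 used by the ZIP VF.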

Thanks,
Shameer

[0] https://lore.kernel.org/qemu-devel/cover.1719361174.git.nicolinc@nvidia.com/
[1] https://lore.kernel.org/linux-iommu/ZrVN05VylFq8lK4q@Asurada-Nvidia/

Eric Auger (1):
  hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested
    binding

Nicolin Chen (2):
  hw/arm/virt: Add an SMMU_IO_LEN macro
  hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes

Shameer Kolothum (2):
  hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
  hw/arm/smmuv3: Associate a pci bus with a SMMUv3 Nested device

 hw/arm/smmuv3.c          |  61 ++++++++++++++++++++++
 hw/arm/virt-acpi-build.c | 109 ++++++++++++++++++++++++++++++++-------
 hw/arm/virt.c            |  33 ++++++++++--
 hw/core/sysbus-fdt.c     |   1 +
 include/hw/arm/smmuv3.h  |  17 ++++++
 include/hw/arm/virt.h    |  15 ++++++
 6 files changed, 215 insertions(+), 21 deletions(-)