
[RFC,v6,00/24] vSMMUv3/pSMMUv3 2 stage VFIO integration

Message ID: 20200320165840.30057-1-eric.auger@redhat.com (mailing list archive)
Series: vSMMUv3/pSMMUv3 2 stage VFIO integration

Message

Eric Auger March 20, 2020, 4:58 p.m. UTC
Up to now vSMMUv3 has not been integrated with VFIO. VFIO
integration requires programming the physical IOMMU consistently
with the guest mappings. However, as opposed to VT-d, SMMUv3 has
no "Caching Mode" which allows easy trapping of guest mappings.
This means the vSMMUv3 cannot use the same VFIO integration as VT-d.

However SMMUv3 has 2 translation stages. This was devised with
the virtualization use case in mind, where stage 1 is "owned" by the
guest whereas the host uses stage 2 for VM isolation.

This series sets up this nested translation stage. It only works
if there is one physical SMMUv3 used along with the QEMU vSMMUv3 (in
other words, it does not work if there is a physical SMMUv2).

- We force the host to use stage 2 instead of stage 1 when we
  detect a vSMMUv3 is behind a VFIO device. For a VFIO device
  without any virtual IOMMU, we still use stage 1 as many existing
  SMMUs expect this behavior.
- We use PCIPASIDOps to propagate guest stage 1 config changes on
  STE (Stream Table Entry) changes.
- We implement a specific UNMAP notifier that conveys guest
  IOTLB invalidations to the host.
- We register MSI IOVA/GPA bindings to the host so that the latter
  can build a nested stage translation.
- As the legacy MAP notifier is not called anymore, we must make
  sure stage 2 mappings are set. This is achieved through another
  prereg memory listener.
- Physical SMMU stage 1 related faults are reported to the guest
  via an eventfd mechanism and exposed through a dedicated VFIO-PCI
  region. Then they are reinjected into the guest.
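For readers unfamiliar with the flow, the UNMAP-notifier bullet above can be
sketched roughly as follows. This is an illustrative sketch only: all of the
names (GuestUnmapEvent, HostCacheInvalidate, vfio_iommu_unmap_notify) are
hypothetical stand-ins, not the actual QEMU types or uapi structures. In the
series the translated request is ultimately delivered to the host via a VFIO
cache-invalidation ioctl.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified mirror of the described flow: the vSMMUv3 emits
 * an UNMAP notification for a guest IOTLB invalidation, and the VFIO layer
 * repackages it as a stage 1 cache-invalidation request for the host. */

typedef struct {
    uint64_t iova;          /* guest IOVA being invalidated */
    uint64_t addr_mask;     /* size - 1 of the invalidated range */
    uint32_t arch_id;       /* ASID, as carried in IOTLBEntry.arch_id */
    bool     leaf;          /* only leaf table entries are affected */
} GuestUnmapEvent;

typedef struct {
    uint32_t asid;
    uint64_t addr;
    uint64_t granule;       /* size of the invalidated range */
    bool     leaf;
} HostCacheInvalidate;

/* What the UNMAP notifier conceptually does: translate the guest event
 * into the host invalidation command (in QEMU this ends in an ioctl). */
static HostCacheInvalidate vfio_iommu_unmap_notify(const GuestUnmapEvent *ev)
{
    HostCacheInvalidate inv;

    memset(&inv, 0, sizeof(inv));
    inv.asid    = ev->arch_id;
    inv.addr    = ev->iova;
    inv.granule = ev->addr_mask + 1;
    inv.leaf    = ev->leaf;
    return inv;
}
```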

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/v4.2.0-2stage-rfcv6

Kernel Dependencies:
[1] [PATCH v10 00/11] SMMUv3 Nested Stage Setup (VFIO part)
[2] [PATCH v10 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
branch at: https://github.com/eauger/linux/tree/will-arm-smmu-updates-2stage-v10

History:

v5 -> v6:
- just rebase work

v4 -> v5:
- Use PCIPASIDOps for config update notifications
- removal of notification for MSI binding which is not needed
  anymore
- Use a single fault region
- use the specific interrupt index

v3 -> v4:
- adapt to changes in uapi (asid cache invalidation)
- check VFIO_PCI_DMA_FAULT_IRQ_INDEX is supported at kernel level
  before attempting to set signaling for it.
- sync on 5.2-rc1 kernel headers + Drew's patch that imports sve_context.h
- fix MSI binding for MSI (not MSIX)
- fix mingw compilation

v2 -> v3:
- rework fault handling
- MSI binding registration done in vfio-pci. MSI binding tear down called
  on container cleanup path
- leaf parameter propagated

v1 -> v2:
- Fixed dual assignment (asid now correctly propagated on TLB invalidations)
- Integrated fault reporting


Eric Auger (23):
  update-linux-headers: Import iommu.h
  header update against 5.6.0-rc3 and IOMMU/VFIO nested stage APIs
  memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute
  memory: Add IOMMU_ATTR_MSI_TRANSLATE IOMMU memory region attribute
  memory: Introduce IOMMU Memory Region inject_faults API
  memory: Add arch_id and leaf fields in IOTLBEntry
  iommu: Introduce generic header
  vfio: Force nested if iommu requires it
  vfio: Introduce hostwin_from_range helper
  vfio: Introduce helpers to DMA map/unmap a RAM section
  vfio: Set up nested stage mappings
  vfio: Pass stage 1 MSI bindings to the host
  vfio: Helper to get IRQ info including capabilities
  vfio/pci: Register handler for iommu fault
  vfio/pci: Set up the DMA FAULT region
  vfio/pci: Implement the DMA fault handler
  hw/arm/smmuv3: Advertise MSI_TRANSLATE attribute
  hw/arm/smmuv3: Store the PASID table GPA in the translation config
  hw/arm/smmuv3: Fill the IOTLBEntry arch_id on NH_VA invalidation
  hw/arm/smmuv3: Fill the IOTLBEntry leaf field on NH_VA invalidation
  hw/arm/smmuv3: Pass stage 1 configurations to the host
  hw/arm/smmuv3: Implement fault injection
  hw/arm/smmuv3: Allow MAP notifiers

Liu Yi L (1):
  pci: introduce PCIPASIDOps to PCIDevice

 hw/arm/smmuv3.c                 | 189 ++++++++++--
 hw/arm/trace-events             |   3 +-
 hw/pci/pci.c                    |  34 +++
 hw/vfio/common.c                | 506 +++++++++++++++++++++++++-------
 hw/vfio/pci.c                   | 267 ++++++++++++++++-
 hw/vfio/pci.h                   |   9 +
 hw/vfio/trace-events            |   9 +-
 include/exec/memory.h           |  49 +++-
 include/hw/arm/smmu-common.h    |   1 +
 include/hw/iommu/iommu.h        |  28 ++
 include/hw/pci/pci.h            |  11 +
 include/hw/vfio/vfio-common.h   |  16 +
 linux-headers/COPYING           |   2 +
 linux-headers/asm-x86/kvm.h     |   1 +
 linux-headers/linux/iommu.h     | 375 +++++++++++++++++++++++
 linux-headers/linux/vfio.h      | 109 ++++++-
 memory.c                        |  10 +
 scripts/update-linux-headers.sh |   2 +-
 18 files changed, 1478 insertions(+), 143 deletions(-)
 create mode 100644 include/hw/iommu/iommu.h
 create mode 100644 linux-headers/linux/iommu.h

Comments

Shameerali Kolothum Thodi March 25, 2020, 11:35 a.m. UTC | #1
Hi Eric,

> -----Original Message-----
> From: Eric Auger [mailto:eric.auger@redhat.com]
> Sent: 20 March 2020 16:58
> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
> qemu-devel@nongnu.org; qemu-arm@nongnu.org; peter.maydell@linaro.org;
> mst@redhat.com; alex.williamson@redhat.com;
> jacob.jun.pan@linux.intel.com; yi.l.liu@intel.com
> Cc: peterx@redhat.com; jean-philippe@linaro.org; will@kernel.org;
> tnowicki@marvell.com; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; zhangfei.gao@foxmail.com;
> zhangfei.gao@linaro.org; maz@kernel.org; bbhushan2@marvell.com
> Subject: [RFC v6 00/24] vSMMUv3/pSMMUv3 2 stage VFIO integration
> 
> Up to now vSMMUv3 has not been integrated with VFIO. VFIO
> integration requires to program the physical IOMMU consistently
> with the guest mappings. However, as opposed to VTD, SMMUv3 has
> no "Caching Mode" which allows easy trapping of guest mappings.
> This means the vSMMUV3 cannot use the same VFIO integration as VTD.
> 
> However SMMUv3 has 2 translation stages. This was devised with
> virtualization use case in mind where stage 1 is "owned" by the
> guest whereas the host uses stage 2 for VM isolation.
> 
> This series sets up this nested translation stage. It only works
> if there is one physical SMMUv3 used along with QEMU vSMMUv3 (in
> other words, it does not work if there is a physical SMMUv2).

I was testing this series on one of our hardware boards with SMMUv3. I
observed an issue while trying to bring up a guest with and without the
vSMMUv3.

The steps are as below:

1. Start a guest with "iommu=smmuv3" and a n/w VF device.

2. Exit the VM.

3. Start the guest again without "iommu=smmuv3".

This time qemu crashes with,

[ 0.447830] hns3 0000:00:01.0: enabling device (0000 -> 0002)
/home/shameer/qemu-eric/qemu/hw/vfio/pci.c:2851:vfio_dma_fault_notifier_handler:
Object 0xaaaaeeb47c00 is not an instance of type
qemu:iommu-memory-region
./qemu_run-vsmmu-hns: line 9: 13609 Aborted                 (core
dumped) ./qemu-system-aarch64-vsmmuv3v10 -machine
virt,kernel_irqchip=on,gic-version=3 -cpu host -smp cpus=1 -kernel
Image-ericv10-uacce -initrd rootfs-iperf.cpio -bios
QEMU_EFI_Dec2018.fd -device vfio-pci,host=0000:7d:02.1 -net none -m
4096 -nographic -D -d -enable-kvm -append "console=ttyAMA0
root=/dev/vda -m 4096 rw earlycon=pl011,0x9000000"

And you can see that host kernel receives smmuv3 C_BAD_STE event,

[10499.379288] vfio-pci 0000:7d:02.1: enabling device (0000 -> 0002)
[10501.943881] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x04 received:
[10501.943884] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00007d1100000004
[10501.943886] arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000100800000080
[10501.943887] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fe040000
[10501.943889] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000000007e04c440

So I suspect we didn't clear the nested stage configuration and that affects the
translation in the second run. I tried to issue (force) a vfio_detach_pasid_table() but
that didn't solve the problem.
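To make the suspected lifecycle explicit, here is a sketch of the stale-state
mechanism being described. This is purely illustrative (HostSte,
container_cleanup, second_run_is_safe and the rest are hypothetical names,
not the real kernel or QEMU data structures), and note that the report above
says a forced detach did not actually resolve the crash, so this is only the
hypothesis, not the confirmed fix.

```c
#include <stdbool.h>

/* Hypothetical sketch: if the container cleanup path does not detach the
 * guest's PASID table, the physical STE keeps pointing at stage 1 tables
 * from the previous run, and a second, vIOMMU-less guest then hits
 * C_BAD_STE when the device is enabled. */

typedef struct {
    bool stage1_attached;   /* guest PASID table programmed into the STE */
} HostSte;

static void attach_pasid_table(HostSte *ste)  { ste->stage1_attached = true; }
static void detach_pasid_table(HostSte *ste)  { ste->stage1_attached = false; }

/* Container teardown: the question is whether a detach happens here. */
static void container_cleanup(HostSte *ste, bool detach_on_cleanup)
{
    if (detach_on_cleanup) {
        detach_pasid_table(ste);
    }
}

/* A second, vIOMMU-less run is only safe if no stale stage 1 config
 * survived the first run. */
static bool second_run_is_safe(const HostSte *ste)
{
    return !ste->stage1_attached;
}
```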

Maybe I am missing something. Could you please take a look and let me know?

Thanks,
Shameer

> [...]
Eric Auger March 25, 2020, 12:42 p.m. UTC | #2
Hi Shameer,

On 3/25/20 12:35 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> [...]
> 
> I was testing this series on one of our hardware board with SMMUv3. I did
> observe an issue while trying to bring up Guest with and without the vsmmuV3.
> 
> Steps are like below,
> 
> 1. start a guest with "iommu=smmuv3" and a n/w vf device.
> 
> 2.Exit the VM.
> 
> 3. start the guest again without "iommu=smmuv3"
> 
> This time qemu crashes with,
> 
> [ 0.447830] hns3 0000:00:01.0: enabling device (0000 -> 0002)
> /home/shameer/qemu-eric/qemu/hw/vfio/pci.c:2851:vfio_dma_fault_notifier_handler:
> Object 0xaaaaeeb47c00 is not an instance of type
> qemu:iommu-memory-region
> ./qemu_run-vsmmu-hns: line 9: 13609 Aborted                 (core
> dumped) ./qemu-system-aarch64-vsmmuv3v10 -machine
> virt,kernel_irqchip=on,gic-version=3 -cpu host -smp cpus=1 -kernel
> Image-ericv10-uacce -initrd rootfs-iperf.cpio -bios
> QEMU_EFI_Dec2018.fd -device vfio-pci,host=0000:7d:02.1 -net none -m
> 4096 -nographic -D -d -enable-kvm -append "console=ttyAMA0
> root=/dev/vda -m 4096 rw earlycon=pl011,0x9000000"
> 
> And you can see that host kernel receives smmuv3 C_BAD_STE event,
> 
> [10499.379288] vfio-pci 0000:7d:02.1: enabling device (0000 -> 0002)
> [10501.943881] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x04 received:
> [10501.943884] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00007d1100000004
> [10501.943886] arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000100800000080
> [10501.943887] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fe040000
> [10501.943889] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000000007e04c440
> 
> So I suspect we didn't clear nested stage configuration and that affects the 
> translation in the second run. I tried to issue(force) a vfio_detach_pasid_table() but 
> that didn't solve the problem.
> 
> May be I am missing something. Could you please take a look and let me know.

Sure, I will try to reproduce on my end. Thank you for the bug report!

Best Regards

Eric
> [...]
Zhangfei Gao March 31, 2020, 6:42 a.m. UTC | #3
Hi, Eric

On 2020/3/21 12:58 AM, Eric Auger wrote:
> [...]
> This series can be found at:
> https://github.com/eauger/qemu/tree/v4.2.0-2stage-rfcv6
>
> Kernel Dependencies:
> [1] [PATCH v10 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> [2] [PATCH v10 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
> branch at: https://github.com/eauger/linux/tree/will-arm-smmu-updates-2stage-v10
Really appreciate that you restarted this work.

I tested your branch with some updates.

Guest: https://github.com/Linaro/linux-kernel-warpdrive/tree/sva-devel
Host: https://github.com/eauger/linux/tree/will-arm-smmu-updates-2stage-v10
qemu: https://github.com/eauger/qemu/tree/v4.2.0-2stage-rfcv6

The guest kernel I am using contains Jean's SVA patches.
Since there are currently many patch conflicts, I use two different trees.

Result:
No-SVA mode works.
In this mode, the guest directly gets physical addresses via ioctl.

vSVA cannot work yet; there is still much work to do.
I tried SVA mode but it fails at
iommu_dev_enable_feature(parent, IOMMU_DEV_FEAT_SVA),
so I chose no-SVA instead.

I am debugging how to enable this.
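For context, the probe-and-fallback step being exercised can be sketched as
below. iommu_dev_enable_feature() and IOMMU_DEV_FEAT_SVA are the real kernel
names being called; everything else here (the one-field struct device, the
stub body, pick_mode) is a hypothetical stand-in for illustration, not the
uacce driver code.

```c
#include <stdbool.h>

/* Stub values standing in for the kernel definitions. */
#define IOMMU_DEV_FEAT_SVA 1
#define ENODEV 19

/* Stub: pretend only devices flagged sva_capable support the feature.
 * The real kernel function probes the IOMMU driver for the device. */
struct device { bool sva_capable; };

static int iommu_dev_enable_feature(struct device *dev, int feat)
{
    if (feat == IOMMU_DEV_FEAT_SVA && dev->sva_capable) {
        return 0;
    }
    return -ENODEV;
}

/* Driver policy described above: use SVA when the feature enables,
 * otherwise fall back to no-SVA mode (physical addresses via ioctl). */
typedef enum { MODE_SVA, MODE_NO_SVA } uacce_mode;

static uacce_mode pick_mode(struct device *parent)
{
    return iommu_dev_enable_feature(parent, IOMMU_DEV_FEAT_SVA) == 0
               ? MODE_SVA : MODE_NO_SVA;
}
```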

Thanks
Eric Auger March 31, 2020, 8:12 a.m. UTC | #4
Hi Zhangfei,

On 3/31/20 8:42 AM, Zhangfei Gao wrote:
> Hi, Eric
> 
> On 2020/3/21 12:58 AM, Eric Auger wrote:
>> [...]
> Really appreciated that you re-start this work.
> 
> I tested your branch and some update.
> 
> Guest: https://github.com/Linaro/linux-kernel-warpdrive/tree/sva-devel
> Host: https://github.com/eauger/linux/tree/will-arm-smmu-updates-2stage-v10
> qemu: https://github.com/eauger/qemu/tree/v4.2.0-2stage-rfcv6
> 
> The guest I am using is contains Jean's sva patches.
> Since currently they are many patch conflict, so use two different tree.
> 
> Result
> No-sva mode works.
> This mode, guest directly get physical address via ioctl.
OK thanks for testing
> 
> While vSVA can not work, there are still much work to do.
> I am trying to SVA mode, and it fails, so choose no-sva instead.
> iommu_dev_enable_feature(parent, IOMMU_DEV_FEAT_SVA)
Indeed I assume there are plenty of things missing to make vSVM work on
ARM (iommu, vfio, QEMU). I am currently reviewing Jacob and Yi's kernel
and QEMU series on the Intel side. After that, I will come back to you to
help. Also, vSMMUv3 does not support multiple contexts at the moment. I
will add this soon.

Still, the problem I have is testing. Any suggestions welcome.

Thanks

Eric
> 
> I am in debugging how to enable this.
> 
> Thanks
> 
>
Zhangfei Gao March 31, 2020, 8:24 a.m. UTC | #5
Hi, Eric

On 2020/3/31 4:12 PM, Auger Eric wrote:
> Hi Zhangfei,
>
> On 3/31/20 8:42 AM, Zhangfei Gao wrote:
>> Hi, Eric
>>
>> On 2020/3/21 12:58 AM, Eric Auger wrote:
>>> [...]
>> Really appreciated that you re-start this work.
>>
>> I tested your branch and some update.
>>
>> Guest: https://github.com/Linaro/linux-kernel-warpdrive/tree/sva-devel
>> Host: https://github.com/eauger/linux/tree/will-arm-smmu-updates-2stage-v10
>> qemu: https://github.com/eauger/qemu/tree/v4.2.0-2stage-rfcv6
>>
>> The guest I am using is contains Jean's sva patches.
>> Since currently they are many patch conflict, so use two different tree.
>>
>> Result
>> No-sva mode works.
>> This mode, guest directly get physical address via ioctl.
> OK thanks for testing
>> While vSVA can not work, there are still much work to do.
>> I am trying to SVA mode, and it fails, so choose no-sva instead.
>> iommu_dev_enable_feature(parent, IOMMU_DEV_FEAT_SVA)
> Indeed I assume there are plenty of things missing to make vSVM work on
> ARM (iommu, vfio, QEMU). I am currently reviewing Jacob and Yi's kernel
> and QEMU series on Intel side. After that, I will come back to you to
> help. Also vSMMUv3 does not support multiple contexts at the moment. I
> will add this soon.
>
>
> Still the problem I have is testing. Any suggestion welcome.
>
>
To make sure I understand: do you mean you need an environment for testing?

How about the HiSilicon Kunpeng 920, an arm64 platform supporting SVA in the
host now? There is such a platform in the Linaro mlab that I think we can share.
Currently I am testing with uacce.
By testing a user driver (the HiSilicon ZIP accelerator) in the guest, we can
test vSVA and PASID easily.

Thanks
Eric Auger April 2, 2020, 4:46 p.m. UTC | #6
Hi Zhangfei,

On 3/31/20 10:24 AM, Zhangfei Gao wrote:
> Hi, Eric
> 
> On 2020/3/31 4:12 PM, Auger Eric wrote:
>> Hi Zhangfei,
>>
>> On 3/31/20 8:42 AM, Zhangfei Gao wrote:
>>> Hi, Eric
>>>
>>> On 2020/3/21 上午12:58, Eric Auger wrote:
>>>> Up to now vSMMUv3 has not been integrated with VFIO. VFIO
>>>> integration requires to program the physical IOMMU consistently
>>>> with the guest mappings. However, as opposed to VTD, SMMUv3 has
>>>> no "Caching Mode" which allows easy trapping of guest mappings.
>>>> This means the vSMMUV3 cannot use the same VFIO integration as VTD.
>>>>
>>>> However SMMUv3 has 2 translation stages. This was devised with
>>>> virtualization use case in mind where stage 1 is "owned" by the
>>>> guest whereas the host uses stage 2 for VM isolation.
>>>>
>>>> This series sets up this nested translation stage. It only works
>>>> if there is one physical SMMUv3 used along with QEMU vSMMUv3 (in
>>>> other words, it does not work if there is a physical SMMUv2).
>>>>
>>>> - We force the host to use stage 2 instead of stage 1, when we
>>>>     detect a vSMMUV3 is behind a VFIO device. For a VFIO device
>>>>     without any virtual IOMMU, we still use stage 1 as many existing
>>>>     SMMUs expect this behavior.
>>>> - We use PCIPASIDOps to propagate guest stage1 config changes on
>>>>     STE (Stream Table Entry) changes.
>>>> - We implement a specific UNMAP notifier that conveys guest
>>>>     IOTLB invalidations to the host
>>>> - We register MSI IOVA/GPA bindings to the host so that the latter
>>>>     can build a nested stage translation
>>>> - As the legacy MAP notifier is not called anymore, we must make
>>>>     sure stage 2 mappings are set. This is achieved through another
>>>>     prereg memory listener.
>>>> - Physical SMMU stage 1 related faults are reported to the guest
>>>>     via an eventfd mechanism and exposed through a dedicated VFIO-PCI
>>>>     region. Then they are reinjected into the guest.
>>>>
>>>> Best Regards
>>>>
>>>> Eric
>>>>
>>>> This series can be found at:
>>>> https://github.com/eauger/qemu/tree/v4.2.0-2stage-rfcv6
>>>>
>>>> Kernel Dependencies:
>>>> [1] [PATCH v10 00/11] SMMUv3 Nested Stage Setup (VFIO part)
>>>> [2] [PATCH v10 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
>>>> branch at:
>>>> https://github.com/eauger/linux/tree/will-arm-smmu-updates-2stage-v10
>>> Really appreciated that you re-start this work.
>>>
>>> I tested your branch and some update.
>>>
>>> Guest: https://github.com/Linaro/linux-kernel-warpdrive/tree/sva-devel
>>> <https://github.com/Linaro/linux-kernel-warpdrive/tree/sva-devel>
>>> Host:
>>> https://github.com/eauger/linux/tree/will-arm-smmu-updates-2stage-v10
>>> <https://github.com/eauger/linux/tree/will-arm-smmu-updates-2stage-v10>
>>> qemu: https://github.com/eauger/qemu/tree/v4.2.0-2stage-rfcv6
>>> <https://github.com/eauger/qemu/tree/v4.2.0-2stage-rfcv6>
>>>
>>> The guest I am using contains Jean's SVA patches.
>>> Since there are currently many patch conflicts, I use two different trees.
>>>
>>> Result
>>> No-sva mode works.
>>> In this mode, the guest directly gets the physical address via ioctl.
>> OK thanks for testing
>>> vSVA does not work yet; there is still much work to do.
>>> I tried SVA mode and it fails, so I chose no-sva instead. It fails in:
>>> iommu_dev_enable_feature(parent, IOMMU_DEV_FEAT_SVA)
>> Indeed I assume there are plenty of things missing to make vSVM work on
>> ARM (iommu, vfio, QEMU). I am currently reviewing Jacob and Yi's kernel
>> and QEMU series on Intel side. After that, I will come back to you to
>> help. Also vSMMUv3 does not support multiple contexts at the moment. I
>> will add this soon.
>>
>>
>> Still the problem I have is testing. Any suggestion welcome.
>>
>>
> To make sure: do you mean you need an environment for testing?
> 
> How about the HiSilicon Kunpeng 920, an arm64 platform that supports SVA
> in the host now? There is such a platform in the Linaro mLab that I think
> we can share. Currently I am testing with uacce; by testing a user driver
> (the HiSilicon ZIP accelerator) in the guest, we can test vSVA and PASID
> easily.
Sorry for the delay. I am currently investigating if this could be
possible. Thank you for the suggestion!

Best Regards

Eric
> 
> Thanks
>
Eric Auger April 3, 2020, 10:45 a.m. UTC | #7
Hi Shameer,

On 3/25/20 12:35 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Eric Auger [mailto:eric.auger@redhat.com]
>> Sent: 20 March 2020 16:58
>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; peter.maydell@linaro.org;
>> mst@redhat.com; alex.williamson@redhat.com;
>> jacob.jun.pan@linux.intel.com; yi.l.liu@intel.com
>> Cc: peterx@redhat.com; jean-philippe@linaro.org; will@kernel.org;
>> tnowicki@marvell.com; Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; zhangfei.gao@foxmail.com;
>> zhangfei.gao@linaro.org; maz@kernel.org; bbhushan2@marvell.com
>> Subject: [RFC v6 00/24] vSMMUv3/pSMMUv3 2 stage VFIO integration
>>
>> Up to now vSMMUv3 has not been integrated with VFIO. VFIO
>> integration requires to program the physical IOMMU consistently
>> with the guest mappings. However, as opposed to VTD, SMMUv3 has
>> no "Caching Mode" which allows easy trapping of guest mappings.
>> This means the vSMMUV3 cannot use the same VFIO integration as VTD.
>>
>> However SMMUv3 has 2 translation stages. This was devised with
>> virtualization use case in mind where stage 1 is "owned" by the
>> guest whereas the host uses stage 2 for VM isolation.
>>
>> This series sets up this nested translation stage. It only works
>> if there is one physical SMMUv3 used along with QEMU vSMMUv3 (in
>> other words, it does not work if there is a physical SMMUv2).
> 
> I was testing this series on one of our hardware boards with SMMUv3. I
> observed an issue while trying to bring up the guest with and without the vSMMUv3.

I am currently investigating; up to now I have failed to reproduce it on my end.
> 
> Steps are like below,
> 
> 1. start a guest with "iommu=smmuv3" and a n/w vf device.
> 
> 2. Exit the VM.
How do you exit the VM?
> 
> 3. start the guest again without "iommu=smmuv3"
> 
> This time qemu crashes with,
> 
> [ 0.447830] hns3 0000:00:01.0: enabling device (0000 -> 0002)
> /home/shameer/qemu-eric/qemu/hw/vfio/pci.c:2851:vfio_dma_fault_notifier_handler:
> Object 0xaaaaeeb47c00 is not an instance of type
So I think I understand the QEMU crash. At the moment vfio-pci
registers a fault handler even if we are not in nested mode. The smmuv3
host driver calls any registered fault handler when it encounters an
error in !nested mode. So the eventfd is triggered to userspace, but QEMU
does not expect that. However, the root cause is that we got some physical
faults on the second run.
> qemu:iommu-memory-region
> ./qemu_run-vsmmu-hns: line 9: 13609 Aborted                 (core
> dumped) ./qemu-system-aarch64-vsmmuv3v10 -machine
> virt,kernel_irqchip=on,gic-version=3 -cpu host -smp cpus=1 -kernel
> Image-ericv10-uacce -initrd rootfs-iperf.cpio -bios
Just to double check with you,
host: will-arm-smmu-updates-2stage-v10
qemu: v4.2.0-2stage-rfcv6
guest version?
> QEMU_EFI_Dec2018.fd -device vfio-pci,host=0000:7d:02.1 -net none -m
Do you assign exactly the same VF as during the 1st run?
> 4096 -nographic -D -d -enable-kvm -append "console=ttyAMA0
> root=/dev/vda -m 4096 rw earlycon=pl011,0x9000000"
> 
> And you can see that the host kernel receives an SMMUv3 C_BAD_STE event:
> 
> [10499.379288] vfio-pci 0000:7d:02.1: enabling device (0000 -> 0002)
> [10501.943881] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x04 received:
> [10501.943884] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00007d1100000004
> [10501.943886] arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000100800000080
> [10501.943887] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fe040000
> [10501.943889] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000000007e04c440
I will try to prepare a kernel branch with additional traces.

Thanks

Eric
> 
> So I suspect we didn't clear the nested stage configuration and that affects the
> translation in the second run. I tried to issue (force) a vfio_detach_pasid_table() but
> that didn't solve the problem.
> 
> Maybe I am missing something. Could you please take a look and let me know.
> 
> Thanks,
> Shameer
> 
>> - We force the host to use stage 2 instead of stage 1, when we
>>   detect a vSMMUV3 is behind a VFIO device. For a VFIO device
>>   without any virtual IOMMU, we still use stage 1 as many existing
>>   SMMUs expect this behavior.
> >> - We use PCIPASIDOps to propagate guest stage1 config changes on
>>   STE (Stream Table Entry) changes.
>> - We implement a specific UNMAP notifier that conveys guest
>>   IOTLB invalidations to the host
> >> - We register MSI IOVA/GPA bindings to the host so that the latter
>>   can build a nested stage translation
>> - As the legacy MAP notifier is not called anymore, we must make
>>   sure stage 2 mappings are set. This is achieved through another
>>   prereg memory listener.
>> - Physical SMMU stage 1 related faults are reported to the guest
> >>   via an eventfd mechanism and exposed through a dedicated VFIO-PCI
>>   region. Then they are reinjected into the guest.
>>
>> Best Regards
>>
>> Eric
>>
>> This series can be found at:
>> https://github.com/eauger/qemu/tree/v4.2.0-2stage-rfcv6
>>
>> Kernel Dependencies:
>> [1] [PATCH v10 00/11] SMMUv3 Nested Stage Setup (VFIO part)
>> [2] [PATCH v10 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
>> branch at:
>> https://github.com/eauger/linux/tree/will-arm-smmu-updates-2stage-v10
>>
>> History:
>>
>> v5 -> v6:
>> - just rebase work
>>
>> v4 -> v5:
>> - Use PCIPASIDOps for config update notifications
>> - removal of notification for MSI binding which is not needed
>>   anymore
>> - Use a single fault region
>> - use the specific interrupt index
>>
>> v3 -> v4:
>> - adapt to changes in uapi (asid cache invalidation)
>> - check VFIO_PCI_DMA_FAULT_IRQ_INDEX is supported at kernel level
>>   before attempting to set signaling for it.
>> - sync on 5.2-rc1 kernel headers + Drew's patch that imports sve_context.h
>> - fix MSI binding for MSI (not MSIX)
>> - fix mingw compilation
>>
>> v2 -> v3:
>> - rework fault handling
>> - MSI binding registration done in vfio-pci. MSI binding tear down called
>>   on container cleanup path
>> - leaf parameter propagated
>>
>> v1 -> v2:
>> - Fixed dual assignment (asid now correctly propagated on TLB invalidations)
>> - Integrated fault reporting
>>
>>
>> Eric Auger (23):
>>   update-linux-headers: Import iommu.h
>>   header update against 5.6.0-rc3 and IOMMU/VFIO nested stage APIs
>>   memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region
>> attribute
>>   memory: Add IOMMU_ATTR_MSI_TRANSLATE IOMMU memory region
>> attribute
>>   memory: Introduce IOMMU Memory Region inject_faults API
>>   memory: Add arch_id and leaf fields in IOTLBEntry
>>   iommu: Introduce generic header
>>   vfio: Force nested if iommu requires it
>>   vfio: Introduce hostwin_from_range helper
>>   vfio: Introduce helpers to DMA map/unmap a RAM section
>>   vfio: Set up nested stage mappings
>>   vfio: Pass stage 1 MSI bindings to the host
>>   vfio: Helper to get IRQ info including capabilities
>>   vfio/pci: Register handler for iommu fault
>>   vfio/pci: Set up the DMA FAULT region
>>   vfio/pci: Implement the DMA fault handler
>>   hw/arm/smmuv3: Advertise MSI_TRANSLATE attribute
>>   hw/arm/smmuv3: Store the PASID table GPA in the translation config
>>   hw/arm/smmuv3: Fill the IOTLBEntry arch_id on NH_VA invalidation
>>   hw/arm/smmuv3: Fill the IOTLBEntry leaf field on NH_VA invalidation
>>   hw/arm/smmuv3: Pass stage 1 configurations to the host
>>   hw/arm/smmuv3: Implement fault injection
>>   hw/arm/smmuv3: Allow MAP notifiers
>>
>> Liu Yi L (1):
>>   pci: introduce PCIPASIDOps to PCIDevice
>>
>>  hw/arm/smmuv3.c                 | 189 ++++++++++--
>>  hw/arm/trace-events             |   3 +-
>>  hw/pci/pci.c                    |  34 +++
>>  hw/vfio/common.c                | 506
>> +++++++++++++++++++++++++-------
>>  hw/vfio/pci.c                   | 267 ++++++++++++++++-
>>  hw/vfio/pci.h                   |   9 +
>>  hw/vfio/trace-events            |   9 +-
>>  include/exec/memory.h           |  49 +++-
>>  include/hw/arm/smmu-common.h    |   1 +
>>  include/hw/iommu/iommu.h        |  28 ++
>>  include/hw/pci/pci.h            |  11 +
>>  include/hw/vfio/vfio-common.h   |  16 +
>>  linux-headers/COPYING           |   2 +
>>  linux-headers/asm-x86/kvm.h     |   1 +
>>  linux-headers/linux/iommu.h     | 375 +++++++++++++++++++++++
>>  linux-headers/linux/vfio.h      | 109 ++++++-
>>  memory.c                        |  10 +
>>  scripts/update-linux-headers.sh |   2 +-
>>  18 files changed, 1478 insertions(+), 143 deletions(-)
>>  create mode 100644 include/hw/iommu/iommu.h
>>  create mode 100644 linux-headers/linux/iommu.h
>>
>> --
>> 2.20.1
>
Shameerali Kolothum Thodi April 3, 2020, 12:10 p.m. UTC | #8
Hi Eric,

> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: 03 April 2020 11:45
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> eric.auger.pro@gmail.com; qemu-devel@nongnu.org; qemu-arm@nongnu.org;
> peter.maydell@linaro.org; mst@redhat.com; alex.williamson@redhat.com;
> jacob.jun.pan@linux.intel.com; yi.l.liu@intel.com
> Cc: peterx@redhat.com; jean-philippe@linaro.org; will@kernel.org;
> tnowicki@marvell.com; zhangfei.gao@foxmail.com; zhangfei.gao@linaro.org;
> maz@kernel.org; bbhushan2@marvell.com
> Subject: Re: [RFC v6 00/24] vSMMUv3/pSMMUv3 2 stage VFIO integration
> 
> Hi Shameer,
> 
> On 3/25/20 12:35 PM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> >
> >> -----Original Message-----
> >> From: Eric Auger [mailto:eric.auger@redhat.com]
> >> Sent: 20 March 2020 16:58
> >> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
> >> qemu-devel@nongnu.org; qemu-arm@nongnu.org;
> peter.maydell@linaro.org;
> >> mst@redhat.com; alex.williamson@redhat.com;
> >> jacob.jun.pan@linux.intel.com; yi.l.liu@intel.com
> >> Cc: peterx@redhat.com; jean-philippe@linaro.org; will@kernel.org;
> >> tnowicki@marvell.com; Shameerali Kolothum Thodi
> >> <shameerali.kolothum.thodi@huawei.com>; zhangfei.gao@foxmail.com;
> >> zhangfei.gao@linaro.org; maz@kernel.org; bbhushan2@marvell.com
> >> Subject: [RFC v6 00/24] vSMMUv3/pSMMUv3 2 stage VFIO integration
> >>
> >> Up to now vSMMUv3 has not been integrated with VFIO. VFIO
> >> integration requires to program the physical IOMMU consistently
> >> with the guest mappings. However, as opposed to VTD, SMMUv3 has
> >> no "Caching Mode" which allows easy trapping of guest mappings.
> >> This means the vSMMUV3 cannot use the same VFIO integration as VTD.
> >>
> >> However SMMUv3 has 2 translation stages. This was devised with
> >> virtualization use case in mind where stage 1 is "owned" by the
> >> guest whereas the host uses stage 2 for VM isolation.
> >>
> >> This series sets up this nested translation stage. It only works
> >> if there is one physical SMMUv3 used along with QEMU vSMMUv3 (in
> >> other words, it does not work if there is a physical SMMUv2).
> >
> > I was testing this series on one of our hardware boards with SMMUv3. I
> > observed an issue while trying to bring up the guest with and without the
> > vSMMUv3.
> 
> I am currently investigating and up to now I fail to reproduce on my end.
> >
> > Steps are like below,
> >
> > 1. start a guest with "iommu=smmuv3" and a n/w vf device.
> >
> > 2. Exit the VM.
> How do you exit the VM?

QMP system_powerdown

> >
> > 3. start the guest again without "iommu=smmuv3"
> >
> > This time qemu crashes with,
> >
> > [ 0.447830] hns3 0000:00:01.0: enabling device (0000 -> 0002)
> >
> > /home/shameer/qemu-eric/qemu/hw/vfio/pci.c:2851:vfio_dma_fault_notifier_handler:
> > Object 0xaaaaeeb47c00 is not an instance of type
> So I think I understand the QEMU crash. At the moment vfio-pci
> registers a fault handler even if we are not in nested mode. The smmuv3
> host driver calls any registered fault handler when it encounters an
> error in !nested mode. So the eventfd is triggered to userspace, but QEMU
> does not expect that. However, the root cause is that we got some physical
> faults on the second run.

True. And QEMU works fine if I run again with the iommu=smmuv3 option.
That's why I suspect the mapping for the device in the physical SMMU
is not cleared, and the vfio-pci device-enable path then hits an error?

> > qemu:iommu-memory-region
> > ./qemu_run-vsmmu-hns: line 9: 13609 Aborted                 (core
> > dumped) ./qemu-system-aarch64-vsmmuv3v10 -machine
> > virt,kernel_irqchip=on,gic-version=3 -cpu host -smp cpus=1 -kernel
> > Image-ericv10-uacce -initrd rootfs-iperf.cpio -bios
> Just to double check with you,
> host: will-arm-smmu-updates-2stage-v10
> qemu: v4.2.0-2stage-rfcv6
> guest version?

Yes. And guest = host image.

> > QEMU_EFI_Dec2018.fd -device vfio-pci,host=0000:7d:02.1 -net none -m
> Do you assign exactly the same VF as during the 1st run?

Yes, the same. The only change is the omission of "iommu=smmuv3".

> > 4096 -nographic -D -d -enable-kvm -append "console=ttyAMA0
> > root=/dev/vda -m 4096 rw earlycon=pl011,0x9000000"
> >
> > And you can see that the host kernel receives an SMMUv3 C_BAD_STE event:
> >
> > [10499.379288] vfio-pci 0000:7d:02.1: enabling device (0000 -> 0002)
> > [10501.943881] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x04 received:
> > [10501.943884] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00007d1100000004
> > [10501.943886] arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000100800000080
> > [10501.943887] arm-smmu-v3 arm-smmu-v3.2.auto: 0x00000000fe040000
> > [10501.943889] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000000007e04c440
> I will try to prepare a kernel branch with additional traces.

OK. You can find the QEMU traces below (vfio*/smmu*) for runs with and
without iommu=smmuv3 (maybe not that useful):

https://github.com/hisilicon/qemu/tree/v4.2.0-2stage-rfcv6-eric/traces

Thanks,
Shameer

> Thanks
> 
> Eric
> >
> > So I suspect we didn't clear the nested stage configuration and that affects
> > the translation in the second run. I tried to issue (force) a
> > vfio_detach_pasid_table() but that didn't solve the problem.
> >
> > Maybe I am missing something. Could you please take a look and let me
> > know.
> >
> > Thanks,
> > Shameer
> >
> >> - We force the host to use stage 2 instead of stage 1, when we
> >>   detect a vSMMUV3 is behind a VFIO device. For a VFIO device
> >>   without any virtual IOMMU, we still use stage 1 as many existing
> >>   SMMUs expect this behavior.
> >> - We use PCIPASIDOps to propagate guest stage1 config changes on
> >>   STE (Stream Table Entry) changes.
> >> - We implement a specific UNMAP notifier that conveys guest
> >>   IOTLB invalidations to the host
> >> - We register MSI IOVA/GPA bindings to the host so that the latter
> >>   can build a nested stage translation
> >> - As the legacy MAP notifier is not called anymore, we must make
> >>   sure stage 2 mappings are set. This is achieved through another
> >>   prereg memory listener.
> >> - Physical SMMU stage 1 related faults are reported to the guest
> >>   via an eventfd mechanism and exposed through a dedicated VFIO-PCI
> >>   region. Then they are reinjected into the guest.
> >>
> >> Best Regards
> >>
> >> Eric
> >>
> >> This series can be found at:
> >> https://github.com/eauger/qemu/tree/v4.2.0-2stage-rfcv6
> >>
> >> Kernel Dependencies:
> >> [1] [PATCH v10 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> >> [2] [PATCH v10 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
> >> branch at:
> >> https://github.com/eauger/linux/tree/will-arm-smmu-updates-2stage-v10
> >>
> >> History:
> >>
> >> v5 -> v6:
> >> - just rebase work
> >>
> >> v4 -> v5:
> >> - Use PCIPASIDOps for config update notifications
> >> - removal of notification for MSI binding which is not needed
> >>   anymore
> >> - Use a single fault region
> >> - use the specific interrupt index
> >>
> >> v3 -> v4:
> >> - adapt to changes in uapi (asid cache invalidation)
> >> - check VFIO_PCI_DMA_FAULT_IRQ_INDEX is supported at kernel level
> >>   before attempting to set signaling for it.
> >> - sync on 5.2-rc1 kernel headers + Drew's patch that imports sve_context.h
> >> - fix MSI binding for MSI (not MSIX)
> >> - fix mingw compilation
> >>
> >> v2 -> v3:
> >> - rework fault handling
> >> - MSI binding registration done in vfio-pci. MSI binding tear down called
> >>   on container cleanup path
> >> - leaf parameter propagated
> >>
> >> v1 -> v2:
> >> - Fixed dual assignment (asid now correctly propagated on TLB invalidations)
> >> - Integrated fault reporting
> >>
> >>
> >> Eric Auger (23):
> >>   update-linux-headers: Import iommu.h
> >>   header update against 5.6.0-rc3 and IOMMU/VFIO nested stage APIs
> >>   memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region
> >> attribute
> >>   memory: Add IOMMU_ATTR_MSI_TRANSLATE IOMMU memory region
> >> attribute
> >>   memory: Introduce IOMMU Memory Region inject_faults API
> >>   memory: Add arch_id and leaf fields in IOTLBEntry
> >>   iommu: Introduce generic header
> >>   vfio: Force nested if iommu requires it
> >>   vfio: Introduce hostwin_from_range helper
> >>   vfio: Introduce helpers to DMA map/unmap a RAM section
> >>   vfio: Set up nested stage mappings
> >>   vfio: Pass stage 1 MSI bindings to the host
> >>   vfio: Helper to get IRQ info including capabilities
> >>   vfio/pci: Register handler for iommu fault
> >>   vfio/pci: Set up the DMA FAULT region
> >>   vfio/pci: Implement the DMA fault handler
> >>   hw/arm/smmuv3: Advertise MSI_TRANSLATE attribute
> >>   hw/arm/smmuv3: Store the PASID table GPA in the translation config
> >>   hw/arm/smmuv3: Fill the IOTLBEntry arch_id on NH_VA invalidation
> >>   hw/arm/smmuv3: Fill the IOTLBEntry leaf field on NH_VA invalidation
> >>   hw/arm/smmuv3: Pass stage 1 configurations to the host
> >>   hw/arm/smmuv3: Implement fault injection
> >>   hw/arm/smmuv3: Allow MAP notifiers
> >>
> >> Liu Yi L (1):
> >>   pci: introduce PCIPASIDOps to PCIDevice
> >>
> >>  hw/arm/smmuv3.c                 | 189 ++++++++++--
> >>  hw/arm/trace-events             |   3 +-
> >>  hw/pci/pci.c                    |  34 +++
> >>  hw/vfio/common.c                | 506
> >> +++++++++++++++++++++++++-------
> >>  hw/vfio/pci.c                   | 267 ++++++++++++++++-
> >>  hw/vfio/pci.h                   |   9 +
> >>  hw/vfio/trace-events            |   9 +-
> >>  include/exec/memory.h           |  49 +++-
> >>  include/hw/arm/smmu-common.h    |   1 +
> >>  include/hw/iommu/iommu.h        |  28 ++
> >>  include/hw/pci/pci.h            |  11 +
> >>  include/hw/vfio/vfio-common.h   |  16 +
> >>  linux-headers/COPYING           |   2 +
> >>  linux-headers/asm-x86/kvm.h     |   1 +
> >>  linux-headers/linux/iommu.h     | 375 +++++++++++++++++++++++
> >>  linux-headers/linux/vfio.h      | 109 ++++++-
> >>  memory.c                        |  10 +
> >>  scripts/update-linux-headers.sh |   2 +-
> >>  18 files changed, 1478 insertions(+), 143 deletions(-)
> >>  create mode 100644 include/hw/iommu/iommu.h
> >>  create mode 100644 linux-headers/linux/iommu.h
> >>
> >> --
> >> 2.20.1
> >