[v9,00/11] SMMUv3 Nested Stage Setup (VFIO part)
mbox series

Message ID 20190711135625.20684-1-eric.auger@redhat.com
Headers show
Series
  • SMMUv3 Nested Stage Setup (VFIO part)
Related show

Message

Auger Eric July 11, 2019, 1:56 p.m. UTC
This series brings the VFIO part of HW nested paging support
in the SMMUv3.

The series depends on:
[PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part)
(https://www.spinics.net/lists/kernel/msg3187714.html)

3 new IOCTLs are introduced that allow the userspace to
1) pass the guest stage 1 configuration
2) pass stage 1 MSI bindings
3) invalidate stage 1 related caches

They map onto the related new IOMMU API functions.

We introduce the capability to register specific interrupt
indexes (see [1]). A new DMA_FAULT interrupt index allows to register
an eventfd to be signaled whenever a stage 1 related fault
is detected at physical level. Also a specific region allows
to expose the fault records to the user space.

Best Regards

Eric

This series can be found at:
https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9

It series includes Tina's patch steming from
[1] "[RFC PATCH v2 1/3] vfio: Use capability chains to handle device
specific irq" plus patches originally contributed by Yi.

History:

v8 -> v9:
- introduce specific irq framework
- single fault region
- iommu_unregister_device_fault_handler failure case not handled
  yet.

v7 -> v8:
- rebase on top of v5.2-rc1 and especially
  8be39a1a04c1  iommu/arm-smmu-v3: Add a master->domain pointer
- dynamic alloc of s1_cfg/s2_cfg
- __arm_smmu_tlb_inv_asid/s1_range_nosync
- check there is no HW MSI regions
- asid invalidation using pasid extended struct (change in the uapi)
- add s1_live/s2_live checks
- move check about support of nested stages in domain finalise
- fixes in error reporting according to the discussion with Robin
- reordered the patches to have first iommu/smmuv3 patches and then
  VFIO patches

v6 -> v7:
- removed device handle from bind/unbind_guest_msi
- added "iommu/smmuv3: Nested mode single MSI doorbell per domain
  enforcement"
- added few uapi comments as suggested by Jean, Jacop and Alex

v5 -> v6:
- Fix compilation issue when CONFIG_IOMMU_API is unset

v4 -> v5:
- fix bug reported by Vincent: fault handler unregistration now happens in
  vfio_pci_release
- IOMMU_FAULT_PERM_* moved outside of struct definition + small
  uapi changes suggested by Kean-Philippe (except fetch_addr)
- iommu: introduce device fault report API: removed the PRI part.
- see individual logs for more details
- reset the ste abort flag on detach

v3 -> v4:
- took into account Alex, jean-Philippe and Robin's comments on v3
- rework of the smmuv3 driver integration
- add tear down ops for msi binding and PASID table binding
- fix S1 fault propagation
- put fault reporting patches at the beginning of the series following
  Jean-Philippe's request
- update of the cache invalidate and fault API uapis
- VFIO fault reporting rework with 2 separate regions and one mmappable
  segment for the fault queue
- moved to PATCH

v2 -> v3:
- When registering the S1 MSI binding we now store the device handle. This
  addresses Robin's comment about discimination of devices beonging to
  different S1 groups and using different physical MSI doorbells.
- Change the fault reporting API: use VFIO_PCI_DMA_FAULT_IRQ_INDEX to
  set the eventfd and expose the faults through an mmappable fault region

v1 -> v2:
- Added the fault reporting capability
- asid properly passed on invalidation (fix assignment of multiple
  devices)
- see individual change logs for more info


Eric Auger (8):
  vfio: VFIO_IOMMU_SET_MSI_BINDING
  vfio/pci: Add VFIO_REGION_TYPE_NESTED region type
  vfio/pci: Register an iommu fault handler
  vfio/pci: Allow to mmap the fault queue
  vfio: Add new IRQ for DMA fault reporting
  vfio/pci: Add framework for custom interrupt indices
  vfio/pci: Register and allow DMA FAULT IRQ signaling
  vfio: Document nested stage control

Liu, Yi L (2):
  vfio: VFIO_IOMMU_SET_PASID_TABLE
  vfio: VFIO_IOMMU_CACHE_INVALIDATE

Tina Zhang (1):
  vfio: Use capability chains to handle device specific irq

 Documentation/vfio.txt              |  77 ++++++++
 drivers/vfio/pci/vfio_pci.c         | 283 ++++++++++++++++++++++++++--
 drivers/vfio/pci/vfio_pci_intrs.c   |  62 ++++++
 drivers/vfio/pci/vfio_pci_private.h |  24 +++
 drivers/vfio/pci/vfio_pci_rdwr.c    |  45 +++++
 drivers/vfio/vfio_iommu_type1.c     | 166 ++++++++++++++++
 include/uapi/linux/vfio.h           | 109 ++++++++++-
 7 files changed, 747 insertions(+), 19 deletions(-)

Comments

Auger Eric Nov. 12, 2019, 11:28 a.m. UTC | #1
Hi Shameer,
On 11/12/19 12:08 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: kvmarm-bounces@lists.cs.columbia.edu
>> [mailto:kvmarm-bounces@lists.cs.columbia.edu] On Behalf Of Eric Auger
>> Sent: 11 July 2019 14:56
>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
>> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>> kvm@vger.kernel.org; kvmarm@lists.cs.columbia.edu; joro@8bytes.org;
>> alex.williamson@redhat.com; jacob.jun.pan@linux.intel.com;
>> yi.l.liu@intel.com; jean-philippe.brucker@arm.com; will.deacon@arm.com;
>> robin.murphy@arm.com
>> Cc: kevin.tian@intel.com; vincent.stehle@arm.com; ashok.raj@intel.com;
>> marc.zyngier@arm.com; tina.zhang@intel.com
>> Subject: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
>>
>> This series brings the VFIO part of HW nested paging support
>> in the SMMUv3.
>>
>> The series depends on:
>> [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part)
>> (https://www.spinics.net/lists/kernel/msg3187714.html)
>>
>> 3 new IOCTLs are introduced that allow the userspace to
>> 1) pass the guest stage 1 configuration
>> 2) pass stage 1 MSI bindings
>> 3) invalidate stage 1 related caches
>>
>> They map onto the related new IOMMU API functions.
>>
>> We introduce the capability to register specific interrupt
>> indexes (see [1]). A new DMA_FAULT interrupt index allows to register
>> an eventfd to be signaled whenever a stage 1 related fault
>> is detected at physical level. Also a specific region allows
>> to expose the fault records to the user space.
> 
> I am trying to get this running on one of our platform that has smmuv3 dual
> stage support. I am seeing some issues with this when an ixgbe vf dev is 
> made pass-through and is behind a vSMMUv3 in Guest.
> 
> Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
> Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
> 
> And this is my Qemu cmd line,
> 
> ./qemu-system-aarch64
> -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host \
> -kernel Image \
> -drive if=none,file=ubuntu,id=fs \
> -device virtio-blk-device,drive=fs \
> -device vfio-pci,host=0000:01:10.1 \
> -bios QEMU_EFI.fd \
> -net none \
> -m 4G \
> -nographic -D -d -enable-kvm \
> -append "console=ttyAMA0 root=/dev/vda rw acpi=force"
> 
> The basic ping from Guest works fine,
> root@ubuntu:~# ping 10.202.225.185
> PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data.
> 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms
> 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms
> ...
> 
> But if I increase ping packet size, 
> 
> root@ubuntu:~# ping -s 1024 10.202.225.185
> PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data.
> 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms
> 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms
> From 10.202.225.169 icmp_seq=66 Destination Host Unreachable
> From 10.202.225.169 icmp_seq=67 Destination Host Unreachable
> From 10.202.225.169 icmp_seq=68 Destination Host Unreachable
> From 10.202.225.169 icmp_seq=69 Destination Host Unreachable
> 
> And from Host kernel I get,
> [  819.970742] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets detected
> [  824.002707] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
> [  828.034683] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
> [  830.050673] ixgbe 0000:01:00.1 enp1s0f1: 4 Spoofed packets detected
> [  832.066659] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
> [  834.082640] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets detected
> 
> Also noted that iperf cannot work as it fails to establish the connection with iperf
> server. 
> 
> Please find attached the trace logs(vfio*, smmuv3*) from Qemu for your reference.
> I haven't debugged this further yet and thought of checking with you if this is
> something you have seen already or not. Or maybe I am missing something here?

Please can you try to edit and modify hw/vfio/common.c, function
vfio_iommu_unmap_notify


/*
    if (size <= 0x10000) {
        ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
        ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
        ustruct.info.addr_info.flags = IOMMU_INV_ADDR_FLAGS_ARCHID;
        if (iotlb->leaf) {
            ustruct.info.addr_info.flags |= IOMMU_INV_ADDR_FLAGS_LEAF;
        }
        ustruct.info.addr_info.archid = iotlb->arch_id;
        ustruct.info.addr_info.addr = start;
        ustruct.info.addr_info.granule_size = size;
        ustruct.info.addr_info.nb_granules = 1;
        trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size, 1,
                                        iotlb->leaf);
    } else {
*/
        ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
        ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
        ustruct.info.pasid_info.archid = iotlb->arch_id;
        ustruct.info.pasid_info.flags = IOMMU_INV_PASID_FLAGS_ARCHID;
        trace_vfio_iommu_asid_inv_iotlb(iotlb->arch_id);
//    }

This modification leads to invalidate the whole asid each time we get a
guest TLBI instead of invalidating the single IOVA (TLBI). On my end, I
saw this was the cause of such kind of issues. Please let me know if it
fixes your perf issues and then we may discuss further about the test
configuration.

Thanks

Eric



> 
> Please let me know.
> 
> Thanks,
> Shameer
> 
>> Best Regards
>>
>> Eric
>>
>> This series can be found at:
>> https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
>>
>> It series includes Tina's patch steming from
>> [1] "[RFC PATCH v2 1/3] vfio: Use capability chains to handle device
>> specific irq" plus patches originally contributed by Yi.
>>
>> History:
>>
>> v8 -> v9:
>> - introduce specific irq framework
>> - single fault region
>> - iommu_unregister_device_fault_handler failure case not handled
>>   yet.
>>
>> v7 -> v8:
>> - rebase on top of v5.2-rc1 and especially
>>   8be39a1a04c1  iommu/arm-smmu-v3: Add a master->domain pointer
>> - dynamic alloc of s1_cfg/s2_cfg
>> - __arm_smmu_tlb_inv_asid/s1_range_nosync
>> - check there is no HW MSI regions
>> - asid invalidation using pasid extended struct (change in the uapi)
>> - add s1_live/s2_live checks
>> - move check about support of nested stages in domain finalise
>> - fixes in error reporting according to the discussion with Robin
>> - reordered the patches to have first iommu/smmuv3 patches and then
>>   VFIO patches
>>
>> v6 -> v7:
>> - removed device handle from bind/unbind_guest_msi
>> - added "iommu/smmuv3: Nested mode single MSI doorbell per domain
>>   enforcement"
>> - added few uapi comments as suggested by Jean, Jacop and Alex
>>
>> v5 -> v6:
>> - Fix compilation issue when CONFIG_IOMMU_API is unset
>>
>> v4 -> v5:
>> - fix bug reported by Vincent: fault handler unregistration now happens in
>>   vfio_pci_release
>> - IOMMU_FAULT_PERM_* moved outside of struct definition + small
>>   uapi changes suggested by Kean-Philippe (except fetch_addr)
>> - iommu: introduce device fault report API: removed the PRI part.
>> - see individual logs for more details
>> - reset the ste abort flag on detach
>>
>> v3 -> v4:
>> - took into account Alex, jean-Philippe and Robin's comments on v3
>> - rework of the smmuv3 driver integration
>> - add tear down ops for msi binding and PASID table binding
>> - fix S1 fault propagation
>> - put fault reporting patches at the beginning of the series following
>>   Jean-Philippe's request
>> - update of the cache invalidate and fault API uapis
>> - VFIO fault reporting rework with 2 separate regions and one mmappable
>>   segment for the fault queue
>> - moved to PATCH
>>
>> v2 -> v3:
>> - When registering the S1 MSI binding we now store the device handle. This
>>   addresses Robin's comment about discimination of devices beonging to
>>   different S1 groups and using different physical MSI doorbells.
>> - Change the fault reporting API: use VFIO_PCI_DMA_FAULT_IRQ_INDEX to
>>   set the eventfd and expose the faults through an mmappable fault region
>>
>> v1 -> v2:
>> - Added the fault reporting capability
>> - asid properly passed on invalidation (fix assignment of multiple
>>   devices)
>> - see individual change logs for more info
>>
>>
>> Eric Auger (8):
>>   vfio: VFIO_IOMMU_SET_MSI_BINDING
>>   vfio/pci: Add VFIO_REGION_TYPE_NESTED region type
>>   vfio/pci: Register an iommu fault handler
>>   vfio/pci: Allow to mmap the fault queue
>>   vfio: Add new IRQ for DMA fault reporting
>>   vfio/pci: Add framework for custom interrupt indices
>>   vfio/pci: Register and allow DMA FAULT IRQ signaling
>>   vfio: Document nested stage control
>>
>> Liu, Yi L (2):
>>   vfio: VFIO_IOMMU_SET_PASID_TABLE
>>   vfio: VFIO_IOMMU_CACHE_INVALIDATE
>>
>> Tina Zhang (1):
>>   vfio: Use capability chains to handle device specific irq
>>
>>  Documentation/vfio.txt              |  77 ++++++++
>>  drivers/vfio/pci/vfio_pci.c         | 283 ++++++++++++++++++++++++++--
>>  drivers/vfio/pci/vfio_pci_intrs.c   |  62 ++++++
>>  drivers/vfio/pci/vfio_pci_private.h |  24 +++
>>  drivers/vfio/pci/vfio_pci_rdwr.c    |  45 +++++
>>  drivers/vfio/vfio_iommu_type1.c     | 166 ++++++++++++++++
>>  include/uapi/linux/vfio.h           | 109 ++++++++++-
>>  7 files changed, 747 insertions(+), 19 deletions(-)
>>
>> --
>> 2.20.1
>>
>> _______________________________________________
>> kvmarm mailing list
>> kvmarm@lists.cs.columbia.edu
>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Shameerali Kolothum Thodi Nov. 12, 2019, 1:06 p.m. UTC | #2
Hi Eric,

> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: 12 November 2019 11:29
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> eric.auger.pro@gmail.com; iommu@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org; kvm@vger.kernel.org;
> kvmarm@lists.cs.columbia.edu; joro@8bytes.org;
> alex.williamson@redhat.com; jacob.jun.pan@linux.intel.com;
> yi.l.liu@intel.com; jean-philippe.brucker@arm.com; will.deacon@arm.com;
> robin.murphy@arm.com
> Cc: kevin.tian@intel.com; vincent.stehle@arm.com; ashok.raj@intel.com;
> marc.zyngier@arm.com; tina.zhang@intel.com; Linuxarm
> <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>
> Subject: Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> 
> Hi Shameer,
> On 11/12/19 12:08 PM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> >
> >> -----Original Message-----
> >> From: kvmarm-bounces@lists.cs.columbia.edu
> >> [mailto:kvmarm-bounces@lists.cs.columbia.edu] On Behalf Of Eric Auger
> >> Sent: 11 July 2019 14:56
> >> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
> >> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> >> kvm@vger.kernel.org; kvmarm@lists.cs.columbia.edu; joro@8bytes.org;
> >> alex.williamson@redhat.com; jacob.jun.pan@linux.intel.com;
> >> yi.l.liu@intel.com; jean-philippe.brucker@arm.com; will.deacon@arm.com;
> >> robin.murphy@arm.com
> >> Cc: kevin.tian@intel.com; vincent.stehle@arm.com; ashok.raj@intel.com;
> >> marc.zyngier@arm.com; tina.zhang@intel.com
> >> Subject: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> >>
> >> This series brings the VFIO part of HW nested paging support
> >> in the SMMUv3.
> >>
> >> The series depends on:
> >> [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part)
> >> (https://www.spinics.net/lists/kernel/msg3187714.html)
> >>
> >> 3 new IOCTLs are introduced that allow the userspace to
> >> 1) pass the guest stage 1 configuration
> >> 2) pass stage 1 MSI bindings
> >> 3) invalidate stage 1 related caches
> >>
> >> They map onto the related new IOMMU API functions.
> >>
> >> We introduce the capability to register specific interrupt
> >> indexes (see [1]). A new DMA_FAULT interrupt index allows to register
> >> an eventfd to be signaled whenever a stage 1 related fault
> >> is detected at physical level. Also a specific region allows
> >> to expose the fault records to the user space.
> >
> > I am trying to get this running on one of our platform that has smmuv3 dual
> > stage support. I am seeing some issues with this when an ixgbe vf dev is
> > made pass-through and is behind a vSMMUv3 in Guest.
> >
> > Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
> > Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
> >
> > And this is my Qemu cmd line,
> >
> > ./qemu-system-aarch64
> > -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host \
> > -kernel Image \
> > -drive if=none,file=ubuntu,id=fs \
> > -device virtio-blk-device,drive=fs \
> > -device vfio-pci,host=0000:01:10.1 \
> > -bios QEMU_EFI.fd \
> > -net none \
> > -m 4G \
> > -nographic -D -d -enable-kvm \
> > -append "console=ttyAMA0 root=/dev/vda rw acpi=force"
> >
> > The basic ping from Guest works fine,
> > root@ubuntu:~# ping 10.202.225.185
> > PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data.
> > 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms
> > 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms
> > ...
> >
> > But if I increase ping packet size,
> >
> > root@ubuntu:~# ping -s 1024 10.202.225.185
> > PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data.
> > 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms
> > 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms
> > From 10.202.225.169 icmp_seq=66 Destination Host Unreachable
> > From 10.202.225.169 icmp_seq=67 Destination Host Unreachable
> > From 10.202.225.169 icmp_seq=68 Destination Host Unreachable
> > From 10.202.225.169 icmp_seq=69 Destination Host Unreachable
> >
> > And from Host kernel I get,
> > [  819.970742] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets detected
> > [  824.002707] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
> > [  828.034683] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
> > [  830.050673] ixgbe 0000:01:00.1 enp1s0f1: 4 Spoofed packets detected
> > [  832.066659] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
> > [  834.082640] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets detected
> >
> > Also noted that iperf cannot work as it fails to establish the connection with
> iperf
> > server.
> >
> > Please find attached the trace logs(vfio*, smmuv3*) from Qemu for your
> reference.
> > I haven't debugged this further yet and thought of checking with you if this is
> > something you have seen already or not. Or maybe I am missing something
> here?
> 
> Please can you try to edit and modify hw/vfio/common.c, function
> vfio_iommu_unmap_notify
> 
> 
> /*
>     if (size <= 0x10000) {
>         ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
>         ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
>         ustruct.info.addr_info.flags = IOMMU_INV_ADDR_FLAGS_ARCHID;
>         if (iotlb->leaf) {
>             ustruct.info.addr_info.flags |=
> IOMMU_INV_ADDR_FLAGS_LEAF;
>         }
>         ustruct.info.addr_info.archid = iotlb->arch_id;
>         ustruct.info.addr_info.addr = start;
>         ustruct.info.addr_info.granule_size = size;
>         ustruct.info.addr_info.nb_granules = 1;
>         trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size, 1,
>                                         iotlb->leaf);
>     } else {
> */
>         ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
>         ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
>         ustruct.info.pasid_info.archid = iotlb->arch_id;
>         ustruct.info.pasid_info.flags = IOMMU_INV_PASID_FLAGS_ARCHID;
>         trace_vfio_iommu_asid_inv_iotlb(iotlb->arch_id);
> //    }
> 
> This modification leads to invalidate the whole asid each time we get a
> guest TLBI instead of invalidating the single IOVA (TLBI). On my end, I
> saw this was the cause of such kind of issues. Please let me know if it
> fixes your perf issues

Yes, this seems to fix the issue.

root@ubuntu:~# iperf -c 10.202.225.185
------------------------------------------------------------
Client connecting to 10.202.225.185, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.202.225.169 port 47996 connected with 10.202.225.185 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2.27 GBytes  1.95 Gbits/sec
root@ubuntu:~#

But the performance seems to be very poor as this is a 10Gbps interface(Of course
invalidating the whole asid may not be very helpful). It is interesting that why the
single iova invalidation is not working.

 and then we may discuss further about the test
> configuration.

Sure. Please let me know.

Cheers,
Shameer 

> Thanks
> 
> Eric
> 
> 
> 
> >
> > Please let me know.
> >
> > Thanks,
> > Shameer
> >
> >> Best Regards
> >>
> >> Eric
> >>
> >> This series can be found at:
> >> https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
> >>
> >> It series includes Tina's patch steming from
> >> [1] "[RFC PATCH v2 1/3] vfio: Use capability chains to handle device
> >> specific irq" plus patches originally contributed by Yi.
> >>
> >> History:
> >>
> >> v8 -> v9:
> >> - introduce specific irq framework
> >> - single fault region
> >> - iommu_unregister_device_fault_handler failure case not handled
> >>   yet.
> >>
> >> v7 -> v8:
> >> - rebase on top of v5.2-rc1 and especially
> >>   8be39a1a04c1  iommu/arm-smmu-v3: Add a master->domain pointer
> >> - dynamic alloc of s1_cfg/s2_cfg
> >> - __arm_smmu_tlb_inv_asid/s1_range_nosync
> >> - check there is no HW MSI regions
> >> - asid invalidation using pasid extended struct (change in the uapi)
> >> - add s1_live/s2_live checks
> >> - move check about support of nested stages in domain finalise
> >> - fixes in error reporting according to the discussion with Robin
> >> - reordered the patches to have first iommu/smmuv3 patches and then
> >>   VFIO patches
> >>
> >> v6 -> v7:
> >> - removed device handle from bind/unbind_guest_msi
> >> - added "iommu/smmuv3: Nested mode single MSI doorbell per domain
> >>   enforcement"
> >> - added few uapi comments as suggested by Jean, Jacop and Alex
> >>
> >> v5 -> v6:
> >> - Fix compilation issue when CONFIG_IOMMU_API is unset
> >>
> >> v4 -> v5:
> >> - fix bug reported by Vincent: fault handler unregistration now happens in
> >>   vfio_pci_release
> >> - IOMMU_FAULT_PERM_* moved outside of struct definition + small
> >>   uapi changes suggested by Kean-Philippe (except fetch_addr)
> >> - iommu: introduce device fault report API: removed the PRI part.
> >> - see individual logs for more details
> >> - reset the ste abort flag on detach
> >>
> >> v3 -> v4:
> >> - took into account Alex, jean-Philippe and Robin's comments on v3
> >> - rework of the smmuv3 driver integration
> >> - add tear down ops for msi binding and PASID table binding
> >> - fix S1 fault propagation
> >> - put fault reporting patches at the beginning of the series following
> >>   Jean-Philippe's request
> >> - update of the cache invalidate and fault API uapis
> >> - VFIO fault reporting rework with 2 separate regions and one mmappable
> >>   segment for the fault queue
> >> - moved to PATCH
> >>
> >> v2 -> v3:
> >> - When registering the S1 MSI binding we now store the device handle. This
> >>   addresses Robin's comment about discimination of devices beonging to
> >>   different S1 groups and using different physical MSI doorbells.
> >> - Change the fault reporting API: use VFIO_PCI_DMA_FAULT_IRQ_INDEX to
> >>   set the eventfd and expose the faults through an mmappable fault region
> >>
> >> v1 -> v2:
> >> - Added the fault reporting capability
> >> - asid properly passed on invalidation (fix assignment of multiple
> >>   devices)
> >> - see individual change logs for more info
> >>
> >>
> >> Eric Auger (8):
> >>   vfio: VFIO_IOMMU_SET_MSI_BINDING
> >>   vfio/pci: Add VFIO_REGION_TYPE_NESTED region type
> >>   vfio/pci: Register an iommu fault handler
> >>   vfio/pci: Allow to mmap the fault queue
> >>   vfio: Add new IRQ for DMA fault reporting
> >>   vfio/pci: Add framework for custom interrupt indices
> >>   vfio/pci: Register and allow DMA FAULT IRQ signaling
> >>   vfio: Document nested stage control
> >>
> >> Liu, Yi L (2):
> >>   vfio: VFIO_IOMMU_SET_PASID_TABLE
> >>   vfio: VFIO_IOMMU_CACHE_INVALIDATE
> >>
> >> Tina Zhang (1):
> >>   vfio: Use capability chains to handle device specific irq
> >>
> >>  Documentation/vfio.txt              |  77 ++++++++
> >>  drivers/vfio/pci/vfio_pci.c         | 283
> ++++++++++++++++++++++++++--
> >>  drivers/vfio/pci/vfio_pci_intrs.c   |  62 ++++++
> >>  drivers/vfio/pci/vfio_pci_private.h |  24 +++
> >>  drivers/vfio/pci/vfio_pci_rdwr.c    |  45 +++++
> >>  drivers/vfio/vfio_iommu_type1.c     | 166 ++++++++++++++++
> >>  include/uapi/linux/vfio.h           | 109 ++++++++++-
> >>  7 files changed, 747 insertions(+), 19 deletions(-)
> >>
> >> --
> >> 2.20.1
> >>
> >> _______________________________________________
> >> kvmarm mailing list
> >> kvmarm@lists.cs.columbia.edu
> >> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Auger Eric Nov. 12, 2019, 1:21 p.m. UTC | #3
Hi Shameer,

On 11/12/19 2:06 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.auger@redhat.com]
>> Sent: 12 November 2019 11:29
>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
>> eric.auger.pro@gmail.com; iommu@lists.linux-foundation.org;
>> linux-kernel@vger.kernel.org; kvm@vger.kernel.org;
>> kvmarm@lists.cs.columbia.edu; joro@8bytes.org;
>> alex.williamson@redhat.com; jacob.jun.pan@linux.intel.com;
>> yi.l.liu@intel.com; jean-philippe.brucker@arm.com; will.deacon@arm.com;
>> robin.murphy@arm.com
>> Cc: kevin.tian@intel.com; vincent.stehle@arm.com; ashok.raj@intel.com;
>> marc.zyngier@arm.com; tina.zhang@intel.com; Linuxarm
>> <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>
>> Subject: Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
>>
>> Hi Shameer,
>> On 11/12/19 12:08 PM, Shameerali Kolothum Thodi wrote:
>>> Hi Eric,
>>>
>>>> -----Original Message-----
>>>> From: kvmarm-bounces@lists.cs.columbia.edu
>>>> [mailto:kvmarm-bounces@lists.cs.columbia.edu] On Behalf Of Eric Auger
>>>> Sent: 11 July 2019 14:56
>>>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
>>>> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>>>> kvm@vger.kernel.org; kvmarm@lists.cs.columbia.edu; joro@8bytes.org;
>>>> alex.williamson@redhat.com; jacob.jun.pan@linux.intel.com;
>>>> yi.l.liu@intel.com; jean-philippe.brucker@arm.com; will.deacon@arm.com;
>>>> robin.murphy@arm.com
>>>> Cc: kevin.tian@intel.com; vincent.stehle@arm.com; ashok.raj@intel.com;
>>>> marc.zyngier@arm.com; tina.zhang@intel.com
>>>> Subject: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
>>>>
>>>> This series brings the VFIO part of HW nested paging support
>>>> in the SMMUv3.
>>>>
>>>> The series depends on:
>>>> [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part)
>>>> (https://www.spinics.net/lists/kernel/msg3187714.html)
>>>>
>>>> 3 new IOCTLs are introduced that allow the userspace to
>>>> 1) pass the guest stage 1 configuration
>>>> 2) pass stage 1 MSI bindings
>>>> 3) invalidate stage 1 related caches
>>>>
>>>> They map onto the related new IOMMU API functions.
>>>>
>>>> We introduce the capability to register specific interrupt
>>>> indexes (see [1]). A new DMA_FAULT interrupt index allows to register
>>>> an eventfd to be signaled whenever a stage 1 related fault
>>>> is detected at physical level. Also a specific region allows
>>>> to expose the fault records to the user space.
>>>
>>> I am trying to get this running on one of our platform that has smmuv3 dual
>>> stage support. I am seeing some issues with this when an ixgbe vf dev is
>>> made pass-through and is behind a vSMMUv3 in Guest.
>>>
>>> Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
>>> Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
>>>
>>> And this is my Qemu cmd line,
>>>
>>> ./qemu-system-aarch64
>>> -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host \
>>> -kernel Image \
>>> -drive if=none,file=ubuntu,id=fs \
>>> -device virtio-blk-device,drive=fs \
>>> -device vfio-pci,host=0000:01:10.1 \
>>> -bios QEMU_EFI.fd \
>>> -net none \
>>> -m 4G \
>>> -nographic -D -d -enable-kvm \
>>> -append "console=ttyAMA0 root=/dev/vda rw acpi=force"
>>>
>>> The basic ping from Guest works fine,
>>> root@ubuntu:~# ping 10.202.225.185
>>> PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data.
>>> 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms
>>> 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms
>>> ...
>>>
>>> But if I increase ping packet size,
>>>
>>> root@ubuntu:~# ping -s 1024 10.202.225.185
>>> PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data.
>>> 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms
>>> 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms
>>> From 10.202.225.169 icmp_seq=66 Destination Host Unreachable
>>> From 10.202.225.169 icmp_seq=67 Destination Host Unreachable
>>> From 10.202.225.169 icmp_seq=68 Destination Host Unreachable
>>> From 10.202.225.169 icmp_seq=69 Destination Host Unreachable
>>>
>>> And from Host kernel I get,
>>> [  819.970742] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets detected
>>> [  824.002707] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
>>> [  828.034683] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
>>> [  830.050673] ixgbe 0000:01:00.1 enp1s0f1: 4 Spoofed packets detected
>>> [  832.066659] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
>>> [  834.082640] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets detected
>>>
>>> Also noted that iperf cannot work as it fails to establish the connection with
>> iperf
>>> server.
>>>
>>> Please find attached the trace logs(vfio*, smmuv3*) from Qemu for your
>> reference.
>>> I haven't debugged this further yet and thought of checking with you if this is
>>> something you have seen already or not. Or maybe I am missing something
>> here?
>>
>> Please can you try to edit and modify hw/vfio/common.c, function
>> vfio_iommu_unmap_notify
>>
>>
>> /*
>>     if (size <= 0x10000) {
>>         ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
>>         ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
>>         ustruct.info.addr_info.flags = IOMMU_INV_ADDR_FLAGS_ARCHID;
>>         if (iotlb->leaf) {
>>             ustruct.info.addr_info.flags |=
>> IOMMU_INV_ADDR_FLAGS_LEAF;
>>         }
>>         ustruct.info.addr_info.archid = iotlb->arch_id;
>>         ustruct.info.addr_info.addr = start;
>>         ustruct.info.addr_info.granule_size = size;
>>         ustruct.info.addr_info.nb_granules = 1;
>>         trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size, 1,
>>                                         iotlb->leaf);
>>     } else {
>> */
>>         ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
>>         ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
>>         ustruct.info.pasid_info.archid = iotlb->arch_id;
>>         ustruct.info.pasid_info.flags = IOMMU_INV_PASID_FLAGS_ARCHID;
>>         trace_vfio_iommu_asid_inv_iotlb(iotlb->arch_id);
>> //    }
>>
>> This modification leads to invalidate the whole asid each time we get a
>> guest TLBI instead of invalidating the single IOVA (TLBI). On my end, I
>> saw this was the cause of such kind of issues. Please let me know if it
>> fixes your perf issues
> 
> Yes, this seems to fix the issue.
> 
> root@ubuntu:~# iperf -c 10.202.225.185
> ------------------------------------------------------------
> Client connecting to 10.202.225.185, TCP port 5001
> TCP window size: 85.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 10.202.225.169 port 47996 connected with 10.202.225.185 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  2.27 GBytes  1.95 Gbits/sec
> root@ubuntu:~#
> 
> But the performance seems to be very poor as this is a 10Gbps interface(Of course
> invalidating the whole asid may not be very helpful). It is interesting that why the
> single iova invalidation is not working.
> 
>  and then we may discuss further about the test
>> configuration.
> 
> Sure. Please let me know.

I reported that issue earlier on the ML. I have not been able to find
any integration issue in the kernel/qemu code but maybe I am too blind
now as I wrote it ;-) When I get a guest stage1 TLBI I cascade it down
to the physical IOMMU. I also pass the LEAF flag.

As you are an expert of the SMMUv3 PMU, if your implementation has any
and you have cycles to look at this, it would be helpful to run it and
see if something weird gets highlighted.

Thanks

Eric
> 
> Cheers,
> Shameer 
> 
>> Thanks
>>
>> Eric
>>
>>
>>
>>>
>>> Please let me know.
>>>
>>> Thanks,
>>> Shameer
>>>
>>>> Best Regards
>>>>
>>>> Eric
>>>>
>>>> This series can be found at:
>>>> https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
>>>>
>>>> It series includes Tina's patch steming from
>>>> [1] "[RFC PATCH v2 1/3] vfio: Use capability chains to handle device
>>>> specific irq" plus patches originally contributed by Yi.
>>>>
>>>> History:
>>>>
>>>> v8 -> v9:
>>>> - introduce specific irq framework
>>>> - single fault region
>>>> - iommu_unregister_device_fault_handler failure case not handled
>>>>   yet.
>>>>
>>>> v7 -> v8:
>>>> - rebase on top of v5.2-rc1 and especially
>>>>   8be39a1a04c1  iommu/arm-smmu-v3: Add a master->domain pointer
>>>> - dynamic alloc of s1_cfg/s2_cfg
>>>> - __arm_smmu_tlb_inv_asid/s1_range_nosync
>>>> - check there is no HW MSI regions
>>>> - asid invalidation using pasid extended struct (change in the uapi)
>>>> - add s1_live/s2_live checks
>>>> - move check about support of nested stages in domain finalise
>>>> - fixes in error reporting according to the discussion with Robin
>>>> - reordered the patches to have first iommu/smmuv3 patches and then
>>>>   VFIO patches
>>>>
>>>> v6 -> v7:
>>>> - removed device handle from bind/unbind_guest_msi
>>>> - added "iommu/smmuv3: Nested mode single MSI doorbell per domain
>>>>   enforcement"
>>>> - added few uapi comments as suggested by Jean, Jacop and Alex
>>>>
>>>> v5 -> v6:
>>>> - Fix compilation issue when CONFIG_IOMMU_API is unset
>>>>
>>>> v4 -> v5:
>>>> - fix bug reported by Vincent: fault handler unregistration now happens in
>>>>   vfio_pci_release
>>>> - IOMMU_FAULT_PERM_* moved outside of struct definition + small
>>>>   uapi changes suggested by Kean-Philippe (except fetch_addr)
>>>> - iommu: introduce device fault report API: removed the PRI part.
>>>> - see individual logs for more details
>>>> - reset the ste abort flag on detach
>>>>
>>>> v3 -> v4:
>>>> - took into account Alex, jean-Philippe and Robin's comments on v3
>>>> - rework of the smmuv3 driver integration
>>>> - add tear down ops for msi binding and PASID table binding
>>>> - fix S1 fault propagation
>>>> - put fault reporting patches at the beginning of the series following
>>>>   Jean-Philippe's request
>>>> - update of the cache invalidate and fault API uapis
>>>> - VFIO fault reporting rework with 2 separate regions and one mmappable
>>>>   segment for the fault queue
>>>> - moved to PATCH
>>>>
>>>> v2 -> v3:
>>>> - When registering the S1 MSI binding we now store the device handle. This
>>>>   addresses Robin's comment about discimination of devices beonging to
>>>>   different S1 groups and using different physical MSI doorbells.
>>>> - Change the fault reporting API: use VFIO_PCI_DMA_FAULT_IRQ_INDEX to
>>>>   set the eventfd and expose the faults through an mmappable fault region
>>>>
>>>> v1 -> v2:
>>>> - Added the fault reporting capability
>>>> - asid properly passed on invalidation (fix assignment of multiple
>>>>   devices)
>>>> - see individual change logs for more info
>>>>
>>>>
>>>> Eric Auger (8):
>>>>   vfio: VFIO_IOMMU_SET_MSI_BINDING
>>>>   vfio/pci: Add VFIO_REGION_TYPE_NESTED region type
>>>>   vfio/pci: Register an iommu fault handler
>>>>   vfio/pci: Allow to mmap the fault queue
>>>>   vfio: Add new IRQ for DMA fault reporting
>>>>   vfio/pci: Add framework for custom interrupt indices
>>>>   vfio/pci: Register and allow DMA FAULT IRQ signaling
>>>>   vfio: Document nested stage control
>>>>
>>>> Liu, Yi L (2):
>>>>   vfio: VFIO_IOMMU_SET_PASID_TABLE
>>>>   vfio: VFIO_IOMMU_CACHE_INVALIDATE
>>>>
>>>> Tina Zhang (1):
>>>>   vfio: Use capability chains to handle device specific irq
>>>>
>>>>  Documentation/vfio.txt              |  77 ++++++++
>>>>  drivers/vfio/pci/vfio_pci.c         | 283
>> ++++++++++++++++++++++++++--
>>>>  drivers/vfio/pci/vfio_pci_intrs.c   |  62 ++++++
>>>>  drivers/vfio/pci/vfio_pci_private.h |  24 +++
>>>>  drivers/vfio/pci/vfio_pci_rdwr.c    |  45 +++++
>>>>  drivers/vfio/vfio_iommu_type1.c     | 166 ++++++++++++++++
>>>>  include/uapi/linux/vfio.h           | 109 ++++++++++-
>>>>  7 files changed, 747 insertions(+), 19 deletions(-)
>>>>
>>>> --
>>>> 2.20.1
>>>>
>>>> _______________________________________________
>>>> kvmarm mailing list
>>>> kvmarm@lists.cs.columbia.edu
>>>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>
Shameerali Kolothum Thodi Nov. 12, 2019, 2:21 p.m. UTC | #4
> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: 12 November 2019 13:22
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> eric.auger.pro@gmail.com; iommu@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org; kvm@vger.kernel.org;
> kvmarm@lists.cs.columbia.edu; joro@8bytes.org;
> alex.williamson@redhat.com; jacob.jun.pan@linux.intel.com;
> yi.l.liu@intel.com; jean-philippe.brucker@arm.com; will.deacon@arm.com;
> robin.murphy@arm.com
> Cc: kevin.tian@intel.com; vincent.stehle@arm.com; ashok.raj@intel.com;
> marc.zyngier@arm.com; tina.zhang@intel.com; Linuxarm
> <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>
> Subject: Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> 
> Hi Shameer,
> 
> On 11/12/19 2:06 PM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> >
> >> -----Original Message-----
> >> From: Auger Eric [mailto:eric.auger@redhat.com]
> >> Sent: 12 November 2019 11:29
> >> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> >> eric.auger.pro@gmail.com; iommu@lists.linux-foundation.org;
> >> linux-kernel@vger.kernel.org; kvm@vger.kernel.org;
> >> kvmarm@lists.cs.columbia.edu; joro@8bytes.org;
> >> alex.williamson@redhat.com; jacob.jun.pan@linux.intel.com;
> >> yi.l.liu@intel.com; jean-philippe.brucker@arm.com; will.deacon@arm.com;
> >> robin.murphy@arm.com
> >> Cc: kevin.tian@intel.com; vincent.stehle@arm.com; ashok.raj@intel.com;
> >> marc.zyngier@arm.com; tina.zhang@intel.com; Linuxarm
> >> <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>
> >> Subject: Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> >>
> >> Hi Shameer,
> >> On 11/12/19 12:08 PM, Shameerali Kolothum Thodi wrote:
> >>> Hi Eric,
> >>>
> >>>> -----Original Message-----
> >>>> From: kvmarm-bounces@lists.cs.columbia.edu
> >>>> [mailto:kvmarm-bounces@lists.cs.columbia.edu] On Behalf Of Eric Auger
> >>>> Sent: 11 July 2019 14:56
> >>>> To: eric.auger.pro@gmail.com; eric.auger@redhat.com;
> >>>> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> >>>> kvm@vger.kernel.org; kvmarm@lists.cs.columbia.edu; joro@8bytes.org;
> >>>> alex.williamson@redhat.com; jacob.jun.pan@linux.intel.com;
> >>>> yi.l.liu@intel.com; jean-philippe.brucker@arm.com;
> will.deacon@arm.com;
> >>>> robin.murphy@arm.com
> >>>> Cc: kevin.tian@intel.com; vincent.stehle@arm.com;
> ashok.raj@intel.com;
> >>>> marc.zyngier@arm.com; tina.zhang@intel.com
> >>>> Subject: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> >>>>
> >>>> This series brings the VFIO part of HW nested paging support
> >>>> in the SMMUv3.
> >>>>
> >>>> The series depends on:
> >>>> [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part)
> >>>> (https://www.spinics.net/lists/kernel/msg3187714.html)
> >>>>
> >>>> 3 new IOCTLs are introduced that allow the userspace to
> >>>> 1) pass the guest stage 1 configuration
> >>>> 2) pass stage 1 MSI bindings
> >>>> 3) invalidate stage 1 related caches
> >>>>
> >>>> They map onto the related new IOMMU API functions.
> >>>>
> >>>> We introduce the capability to register specific interrupt
> >>>> indexes (see [1]). A new DMA_FAULT interrupt index allows to register
> >>>> an eventfd to be signaled whenever a stage 1 related fault
> >>>> is detected at physical level. Also a specific region allows
> >>>> to expose the fault records to the user space.
> >>>
> >>> I am trying to get this running on one of our platform that has smmuv3 dual
> >>> stage support. I am seeing some issues with this when an ixgbe vf dev is
> >>> made pass-through and is behind a vSMMUv3 in Guest.
> >>>
> >>> Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
> >>> Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
> >>>
> >>> And this is my Qemu cmd line,
> >>>
> >>> ./qemu-system-aarch64
> >>> -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host \
> >>> -kernel Image \
> >>> -drive if=none,file=ubuntu,id=fs \
> >>> -device virtio-blk-device,drive=fs \
> >>> -device vfio-pci,host=0000:01:10.1 \
> >>> -bios QEMU_EFI.fd \
> >>> -net none \
> >>> -m 4G \
> >>> -nographic -D -d -enable-kvm \
> >>> -append "console=ttyAMA0 root=/dev/vda rw acpi=force"
> >>>
> >>> The basic ping from Guest works fine,
> >>> root@ubuntu:~# ping 10.202.225.185
> >>> PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data.
> >>> 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms
> >>> 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms
> >>> ...
> >>>
> >>> But if I increase ping packet size,
> >>>
> >>> root@ubuntu:~# ping -s 1024 10.202.225.185
> >>> PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data.
> >>> 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms
> >>> 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms
> >>> From 10.202.225.169 icmp_seq=66 Destination Host Unreachable
> >>> From 10.202.225.169 icmp_seq=67 Destination Host Unreachable
> >>> From 10.202.225.169 icmp_seq=68 Destination Host Unreachable
> >>> From 10.202.225.169 icmp_seq=69 Destination Host Unreachable
> >>>
> >>> And from Host kernel I get,
> >>> [  819.970742] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets detected
> >>> [  824.002707] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
> >>> [  828.034683] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
> >>> [  830.050673] ixgbe 0000:01:00.1 enp1s0f1: 4 Spoofed packets detected
> >>> [  832.066659] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets detected
> >>> [  834.082640] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets detected
> >>>
> >>> Also noted that iperf cannot work as it fails to establish the connection
> with
> >> iperf
> >>> server.
> >>>
> >>> Please find attached the trace logs(vfio*, smmuv3*) from Qemu for your
> >> reference.
> >>> I haven't debugged this further yet and thought of checking with you if this
> is
> >>> something you have seen already or not. Or maybe I am missing something
> >> here?
> >>
> >> Please can you try to edit and modify hw/vfio/common.c, function
> >> vfio_iommu_unmap_notify
> >>
> >>
> >> /*
> >>     if (size <= 0x10000) {
> >>         ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
> >>         ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
> >>         ustruct.info.addr_info.flags =
> IOMMU_INV_ADDR_FLAGS_ARCHID;
> >>         if (iotlb->leaf) {
> >>             ustruct.info.addr_info.flags |=
> >> IOMMU_INV_ADDR_FLAGS_LEAF;
> >>         }
> >>         ustruct.info.addr_info.archid = iotlb->arch_id;
> >>         ustruct.info.addr_info.addr = start;
> >>         ustruct.info.addr_info.granule_size = size;
> >>         ustruct.info.addr_info.nb_granules = 1;
> >>         trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size, 1,
> >>                                         iotlb->leaf);
> >>     } else {
> >> */
> >>         ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
> >>         ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
> >>         ustruct.info.pasid_info.archid = iotlb->arch_id;
> >>         ustruct.info.pasid_info.flags =
> IOMMU_INV_PASID_FLAGS_ARCHID;
> >>         trace_vfio_iommu_asid_inv_iotlb(iotlb->arch_id);
> >> //    }
> >>
> >> This modification leads to invalidate the whole asid each time we get a
> >> guest TLBI instead of invalidating the single IOVA (TLBI). On my end, I
> >> saw this was the cause of such kind of issues. Please let me know if it
> >> fixes your perf issues
> >
> > Yes, this seems to fix the issue.
> >
> > root@ubuntu:~# iperf -c 10.202.225.185
> > ------------------------------------------------------------
> > Client connecting to 10.202.225.185, TCP port 5001
> > TCP window size: 85.0 KByte (default)
> > ------------------------------------------------------------
> > [  3] local 10.202.225.169 port 47996 connected with 10.202.225.185 port
> 5001
> > [ ID] Interval       Transfer     Bandwidth
> > [  3]  0.0-10.0 sec  2.27 GBytes  1.95 Gbits/sec
> > root@ubuntu:~#
> >
> > But the performance seems to be very poor as this is a 10Gbps interface(Of
> course
> > invalidating the whole asid may not be very helpful). It is interesting that why
> the
> > single iova invalidation is not working.
> >
> >  and then we may discuss further about the test
> >> configuration.
> >
> > Sure. Please let me know.
> 
> I reported that issue earlier on the ML. I have not been able to find
> any integration issue in the kernel/qemu code but maybe I am too blind
> now as I wrote it ;-) When I get a guest stage1 TLBI I cascade it down
> to the physical IOMMU. I also pass the LEAF flag.

Ok.

> As you are an expert of the SMMUv3 PMU, if your implementation has any
> and you have cycles to look at this, it would be helpful to run it and
> see if something weird gets highlighted.

:). Sure. I will give it a try and report back if anything suspicious.

Thanks,
Shameer

 
> Thanks
> 
> Eric
> >
> > Cheers,
> > Shameer
> >
> >> Thanks
> >>
> >> Eric
> >>
> >>
> >>
> >>>
> >>> Please let me know.
> >>>
> >>> Thanks,
> >>> Shameer
> >>>
> >>>> Best Regards
> >>>>
> >>>> Eric
> >>>>
> >>>> This series can be found at:
> >>>> https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
> >>>>
> >>>> It series includes Tina's patch steming from
> >>>> [1] "[RFC PATCH v2 1/3] vfio: Use capability chains to handle device
> >>>> specific irq" plus patches originally contributed by Yi.
> >>>>
> >>>> History:
> >>>>
> >>>> v8 -> v9:
> >>>> - introduce specific irq framework
> >>>> - single fault region
> >>>> - iommu_unregister_device_fault_handler failure case not handled
> >>>>   yet.
> >>>>
> >>>> v7 -> v8:
> >>>> - rebase on top of v5.2-rc1 and especially
> >>>>   8be39a1a04c1  iommu/arm-smmu-v3: Add a master->domain pointer
> >>>> - dynamic alloc of s1_cfg/s2_cfg
> >>>> - __arm_smmu_tlb_inv_asid/s1_range_nosync
> >>>> - check there is no HW MSI regions
> >>>> - asid invalidation using pasid extended struct (change in the uapi)
> >>>> - add s1_live/s2_live checks
> >>>> - move check about support of nested stages in domain finalise
> >>>> - fixes in error reporting according to the discussion with Robin
> >>>> - reordered the patches to have first iommu/smmuv3 patches and then
> >>>>   VFIO patches
> >>>>
> >>>> v6 -> v7:
> >>>> - removed device handle from bind/unbind_guest_msi
> >>>> - added "iommu/smmuv3: Nested mode single MSI doorbell per domain
> >>>>   enforcement"
> >>>> - added few uapi comments as suggested by Jean, Jacop and Alex
> >>>>
> >>>> v5 -> v6:
> >>>> - Fix compilation issue when CONFIG_IOMMU_API is unset
> >>>>
> >>>> v4 -> v5:
> >>>> - fix bug reported by Vincent: fault handler unregistration now happens in
> >>>>   vfio_pci_release
> >>>> - IOMMU_FAULT_PERM_* moved outside of struct definition + small
> >>>>   uapi changes suggested by Kean-Philippe (except fetch_addr)
> >>>> - iommu: introduce device fault report API: removed the PRI part.
> >>>> - see individual logs for more details
> >>>> - reset the ste abort flag on detach
> >>>>
> >>>> v3 -> v4:
> >>>> - took into account Alex, jean-Philippe and Robin's comments on v3
> >>>> - rework of the smmuv3 driver integration
> >>>> - add tear down ops for msi binding and PASID table binding
> >>>> - fix S1 fault propagation
> >>>> - put fault reporting patches at the beginning of the series following
> >>>>   Jean-Philippe's request
> >>>> - update of the cache invalidate and fault API uapis
> >>>> - VFIO fault reporting rework with 2 separate regions and one mmappable
> >>>>   segment for the fault queue
> >>>> - moved to PATCH
> >>>>
> >>>> v2 -> v3:
> >>>> - When registering the S1 MSI binding we now store the device handle.
> This
> >>>>   addresses Robin's comment about discimination of devices beonging
> to
> >>>>   different S1 groups and using different physical MSI doorbells.
> >>>> - Change the fault reporting API: use VFIO_PCI_DMA_FAULT_IRQ_INDEX
> to
> >>>>   set the eventfd and expose the faults through an mmappable fault
> region
> >>>>
> >>>> v1 -> v2:
> >>>> - Added the fault reporting capability
> >>>> - asid properly passed on invalidation (fix assignment of multiple
> >>>>   devices)
> >>>> - see individual change logs for more info
> >>>>
> >>>>
> >>>> Eric Auger (8):
> >>>>   vfio: VFIO_IOMMU_SET_MSI_BINDING
> >>>>   vfio/pci: Add VFIO_REGION_TYPE_NESTED region type
> >>>>   vfio/pci: Register an iommu fault handler
> >>>>   vfio/pci: Allow to mmap the fault queue
> >>>>   vfio: Add new IRQ for DMA fault reporting
> >>>>   vfio/pci: Add framework for custom interrupt indices
> >>>>   vfio/pci: Register and allow DMA FAULT IRQ signaling
> >>>>   vfio: Document nested stage control
> >>>>
> >>>> Liu, Yi L (2):
> >>>>   vfio: VFIO_IOMMU_SET_PASID_TABLE
> >>>>   vfio: VFIO_IOMMU_CACHE_INVALIDATE
> >>>>
> >>>> Tina Zhang (1):
> >>>>   vfio: Use capability chains to handle device specific irq
> >>>>
> >>>>  Documentation/vfio.txt              |  77 ++++++++
> >>>>  drivers/vfio/pci/vfio_pci.c         | 283
> >> ++++++++++++++++++++++++++--
> >>>>  drivers/vfio/pci/vfio_pci_intrs.c   |  62 ++++++
> >>>>  drivers/vfio/pci/vfio_pci_private.h |  24 +++
> >>>>  drivers/vfio/pci/vfio_pci_rdwr.c    |  45 +++++
> >>>>  drivers/vfio/vfio_iommu_type1.c     | 166 ++++++++++++++++
> >>>>  include/uapi/linux/vfio.h           | 109 ++++++++++-
> >>>>  7 files changed, 747 insertions(+), 19 deletions(-)
> >>>>
> >>>> --
> >>>> 2.20.1
> >>>>
> >>>> _______________________________________________
> >>>> kvmarm mailing list
> >>>> kvmarm@lists.cs.columbia.edu
> >>>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> >
Shameerali Kolothum Thodi Nov. 12, 2019, 5:56 p.m. UTC | #5
Hi Eric,

> -----Original Message-----
> From: Shameerali Kolothum Thodi
> Sent: 12 November 2019 14:21
> To: 'Auger Eric' <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> kvm@vger.kernel.org; kvmarm@lists.cs.columbia.edu; joro@8bytes.org;
> alex.williamson@redhat.com; jacob.jun.pan@linux.intel.com;
> yi.l.liu@intel.com; jean-philippe.brucker@arm.com; will.deacon@arm.com;
> robin.murphy@arm.com
> Cc: kevin.tian@intel.com; vincent.stehle@arm.com; ashok.raj@intel.com;
> marc.zyngier@arm.com; tina.zhang@intel.com; Linuxarm
> <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>
> Subject: RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> 
[...]
> > >>> I am trying to get this running on one of our platform that has smmuv3
> dual
> > >>> stage support. I am seeing some issues with this when an ixgbe vf dev is
> > >>> made pass-through and is behind a vSMMUv3 in Guest.
> > >>>
> > >>> Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
> > >>> Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
> > >>>
> > >>> And this is my Qemu cmd line,
> > >>>
> > >>> ./qemu-system-aarch64
> > >>> -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host
> \
> > >>> -kernel Image \
> > >>> -drive if=none,file=ubuntu,id=fs \
> > >>> -device virtio-blk-device,drive=fs \
> > >>> -device vfio-pci,host=0000:01:10.1 \
> > >>> -bios QEMU_EFI.fd \
> > >>> -net none \
> > >>> -m 4G \
> > >>> -nographic -D -d -enable-kvm \
> > >>> -append "console=ttyAMA0 root=/dev/vda rw acpi=force"
> > >>>
> > >>> The basic ping from Guest works fine,
> > >>> root@ubuntu:~# ping 10.202.225.185
> > >>> PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data.
> > >>> 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms
> > >>> 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms
> > >>> ...
> > >>>
> > >>> But if I increase ping packet size,
> > >>>
> > >>> root@ubuntu:~# ping -s 1024 10.202.225.185
> > >>> PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data.
> > >>> 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms
> > >>> 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms
> > >>> From 10.202.225.169 icmp_seq=66 Destination Host Unreachable
> > >>> From 10.202.225.169 icmp_seq=67 Destination Host Unreachable
> > >>> From 10.202.225.169 icmp_seq=68 Destination Host Unreachable
> > >>> From 10.202.225.169 icmp_seq=69 Destination Host Unreachable
> > >>>
> > >>> And from Host kernel I get,
> > >>> [  819.970742] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets
> detected
> > >>> [  824.002707] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets
> detected
> > >>> [  828.034683] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets
> detected
> > >>> [  830.050673] ixgbe 0000:01:00.1 enp1s0f1: 4 Spoofed packets
> detected
> > >>> [  832.066659] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets
> detected
> > >>> [  834.082640] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets
> detected
> > >>>
> > >>> Also noted that iperf cannot work as it fails to establish the connection
> > with
> > >> iperf
> > >>> server.
> > >>>
> > >>> Please find attached the trace logs(vfio*, smmuv3*) from Qemu for your
> > >> reference.
> > >>> I haven't debugged this further yet and thought of checking with you if
> this
> > is
> > >>> something you have seen already or not. Or maybe I am missing
> something
> > >> here?
> > >>
> > >> Please can you try to edit and modify hw/vfio/common.c, function
> > >> vfio_iommu_unmap_notify
> > >>
> > >>
> > >> /*
> > >>     if (size <= 0x10000) {
> > >>         ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
> > >>         ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
> > >>         ustruct.info.addr_info.flags =
> > IOMMU_INV_ADDR_FLAGS_ARCHID;
> > >>         if (iotlb->leaf) {
> > >>             ustruct.info.addr_info.flags |=
> > >> IOMMU_INV_ADDR_FLAGS_LEAF;
> > >>         }
> > >>         ustruct.info.addr_info.archid = iotlb->arch_id;
> > >>         ustruct.info.addr_info.addr = start;
> > >>         ustruct.info.addr_info.granule_size = size;
> > >>         ustruct.info.addr_info.nb_granules = 1;
> > >>         trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size, 1,
> > >>                                         iotlb->leaf);
> > >>     } else {
> > >> */
> > >>         ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
> > >>         ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
> > >>         ustruct.info.pasid_info.archid = iotlb->arch_id;
> > >>         ustruct.info.pasid_info.flags =
> > IOMMU_INV_PASID_FLAGS_ARCHID;
> > >>         trace_vfio_iommu_asid_inv_iotlb(iotlb->arch_id);
> > >> //    }
> > >>
> > >> This modification leads to invalidate the whole asid each time we get a
> > >> guest TLBI instead of invalidating the single IOVA (TLBI). On my end, I
> > >> saw this was the cause of such kind of issues. Please let me know if it
> > >> fixes your perf issues
> > >
> > > Yes, this seems to fix the issue.
> > >
> > > root@ubuntu:~# iperf -c 10.202.225.185
> > > ------------------------------------------------------------
> > > Client connecting to 10.202.225.185, TCP port 5001
> > > TCP window size: 85.0 KByte (default)
> > > ------------------------------------------------------------
> > > [  3] local 10.202.225.169 port 47996 connected with 10.202.225.185 port
> > 5001
> > > [ ID] Interval       Transfer     Bandwidth
> > > [  3]  0.0-10.0 sec  2.27 GBytes  1.95 Gbits/sec
> > > root@ubuntu:~#
> > >
> > > But the performance seems to be very poor as this is a 10Gbps interface(Of
> > course
> > > invalidating the whole asid may not be very helpful). It is interesting that
> why
> > the
> > > single iova invalidation is not working.
> > >
> > >  and then we may discuss further about the test
> > >> configuration.
> > >
> > > Sure. Please let me know.
> >
> > I reported that issue earlier on the ML. I have not been able to find
> > any integration issue in the kernel/qemu code but maybe I am too blind
> > now as I wrote it ;-) When I get a guest stage1 TLBI I cascade it down
> > to the physical IOMMU. I also pass the LEAF flag.
> 
> Ok.
> 
> > As you are an expert of the SMMUv3 PMU, if your implementation has any
> > and you have cycles to look at this, it would be helpful to run it and
> > see if something weird gets highlighted.
> 
> :). Sure. I will give it a try and report back if anything suspicious.

I just noted that CMDQ_OP_TLBI_NH_VA is missing the vmid filed which seems
to be the cause for single IOVA TLBI not working properly.

I had this fix in arm-smmuv3.c,

@@ -947,6 +947,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
		break;
	case CMDQ_OP_TLBI_NH_VA:
+		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
		cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf);
		cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK;


With this, your original qemu branch is working. 

root@ubuntu:~# iperf -c 10.202.225.185
------------------------------------------------------------
Client connecting to 10.202.225.185, TCP port 5001 TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.202.225.169 port 44894 connected with 10.202.225.185 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  3.21 GBytes  2.76 Gbits/sec

Could you please check this...

I also have a rebase of your patches on top of 5.4-rc5. This has some optimizations
From Will such as batched TLBI inv. Please find it here,

https://github.com/hisilicon/kernel-dev/tree/private-vSMMUv3-v9-v5.4-rc5

This gives me a better performance with iperf,

root@ubuntu:~# iperf -c 10.202.225.185
------------------------------------------------------------
Client connecting to 10.202.225.185, TCP port 5001 TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.202.225.169 port 55450 connected with 10.202.225.185 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  4.91 GBytes  4.22 Gbits/sec root@ubuntu:~#

If possible please check this branch as well.

Thanks,
Shameer

> Thanks,
> Shameer
> 
> 
> > Thanks
> >
> > Eric
> > >
> > > Cheers,
> > > Shameer
> > >
> > >> Thanks
> > >>
> > >> Eric
> > >>
> > >>
> > >>
> > >>>
> > >>> Please let me know.
> > >>>
> > >>> Thanks,
> > >>> Shameer
> > >>>
> > >>>> Best Regards
> > >>>>
> > >>>> Eric
> > >>>>
> > >>>> This series can be found at:
> > >>>> https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
> > >>>>
> > >>>> It series includes Tina's patch steming from
> > >>>> [1] "[RFC PATCH v2 1/3] vfio: Use capability chains to handle device
> > >>>> specific irq" plus patches originally contributed by Yi.
> > >>>>
> > >>>> History:
> > >>>>
> > >>>> v8 -> v9:
> > >>>> - introduce specific irq framework
> > >>>> - single fault region
> > >>>> - iommu_unregister_device_fault_handler failure case not handled
> > >>>>   yet.
> > >>>>
> > >>>> v7 -> v8:
> > >>>> - rebase on top of v5.2-rc1 and especially
> > >>>>   8be39a1a04c1  iommu/arm-smmu-v3: Add a master->domain
> pointer
> > >>>> - dynamic alloc of s1_cfg/s2_cfg
> > >>>> - __arm_smmu_tlb_inv_asid/s1_range_nosync
> > >>>> - check there is no HW MSI regions
> > >>>> - asid invalidation using pasid extended struct (change in the uapi)
> > >>>> - add s1_live/s2_live checks
> > >>>> - move check about support of nested stages in domain finalise
> > >>>> - fixes in error reporting according to the discussion with Robin
> > >>>> - reordered the patches to have first iommu/smmuv3 patches and then
> > >>>>   VFIO patches
> > >>>>
> > >>>> v6 -> v7:
> > >>>> - removed device handle from bind/unbind_guest_msi
> > >>>> - added "iommu/smmuv3: Nested mode single MSI doorbell per domain
> > >>>>   enforcement"
> > >>>> - added few uapi comments as suggested by Jean, Jacop and Alex
> > >>>>
> > >>>> v5 -> v6:
> > >>>> - Fix compilation issue when CONFIG_IOMMU_API is unset
> > >>>>
> > >>>> v4 -> v5:
> > >>>> - fix bug reported by Vincent: fault handler unregistration now happens
> in
> > >>>>   vfio_pci_release
> > >>>> - IOMMU_FAULT_PERM_* moved outside of struct definition + small
> > >>>>   uapi changes suggested by Kean-Philippe (except fetch_addr)
> > >>>> - iommu: introduce device fault report API: removed the PRI part.
> > >>>> - see individual logs for more details
> > >>>> - reset the ste abort flag on detach
> > >>>>
> > >>>> v3 -> v4:
> > >>>> - took into account Alex, jean-Philippe and Robin's comments on v3
> > >>>> - rework of the smmuv3 driver integration
> > >>>> - add tear down ops for msi binding and PASID table binding
> > >>>> - fix S1 fault propagation
> > >>>> - put fault reporting patches at the beginning of the series following
> > >>>>   Jean-Philippe's request
> > >>>> - update of the cache invalidate and fault API uapis
> > >>>> - VFIO fault reporting rework with 2 separate regions and one
> mmappable
> > >>>>   segment for the fault queue
> > >>>> - moved to PATCH
> > >>>>
> > >>>> v2 -> v3:
> > >>>> - When registering the S1 MSI binding we now store the device handle.
> > This
> > >>>>   addresses Robin's comment about discimination of devices beonging
> > to
> > >>>>   different S1 groups and using different physical MSI doorbells.
> > >>>> - Change the fault reporting API: use
> VFIO_PCI_DMA_FAULT_IRQ_INDEX
> > to
> > >>>>   set the eventfd and expose the faults through an mmappable fault
> > region
> > >>>>
> > >>>> v1 -> v2:
> > >>>> - Added the fault reporting capability
> > >>>> - asid properly passed on invalidation (fix assignment of multiple
> > >>>>   devices)
> > >>>> - see individual change logs for more info
> > >>>>
> > >>>>
> > >>>> Eric Auger (8):
> > >>>>   vfio: VFIO_IOMMU_SET_MSI_BINDING
> > >>>>   vfio/pci: Add VFIO_REGION_TYPE_NESTED region type
> > >>>>   vfio/pci: Register an iommu fault handler
> > >>>>   vfio/pci: Allow to mmap the fault queue
> > >>>>   vfio: Add new IRQ for DMA fault reporting
> > >>>>   vfio/pci: Add framework for custom interrupt indices
> > >>>>   vfio/pci: Register and allow DMA FAULT IRQ signaling
> > >>>>   vfio: Document nested stage control
> > >>>>
> > >>>> Liu, Yi L (2):
> > >>>>   vfio: VFIO_IOMMU_SET_PASID_TABLE
> > >>>>   vfio: VFIO_IOMMU_CACHE_INVALIDATE
> > >>>>
> > >>>> Tina Zhang (1):
> > >>>>   vfio: Use capability chains to handle device specific irq
> > >>>>
> > >>>>  Documentation/vfio.txt              |  77 ++++++++
> > >>>>  drivers/vfio/pci/vfio_pci.c         | 283
> > >> ++++++++++++++++++++++++++--
> > >>>>  drivers/vfio/pci/vfio_pci_intrs.c   |  62 ++++++
> > >>>>  drivers/vfio/pci/vfio_pci_private.h |  24 +++
> > >>>>  drivers/vfio/pci/vfio_pci_rdwr.c    |  45 +++++
> > >>>>  drivers/vfio/vfio_iommu_type1.c     | 166 ++++++++++++++++
> > >>>>  include/uapi/linux/vfio.h           | 109 ++++++++++-
> > >>>>  7 files changed, 747 insertions(+), 19 deletions(-)
> > >>>>
> > >>>> --
> > >>>> 2.20.1
> > >>>>
> > >>>> _______________________________________________
> > >>>> kvmarm mailing list
> > >>>> kvmarm@lists.cs.columbia.edu
> > >>>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> > >
Auger Eric Nov. 12, 2019, 8:34 p.m. UTC | #6
Hi Shameer,

On 11/12/19 6:56 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Shameerali Kolothum Thodi
>> Sent: 12 November 2019 14:21
>> To: 'Auger Eric' <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
>> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>> kvm@vger.kernel.org; kvmarm@lists.cs.columbia.edu; joro@8bytes.org;
>> alex.williamson@redhat.com; jacob.jun.pan@linux.intel.com;
>> yi.l.liu@intel.com; jean-philippe.brucker@arm.com; will.deacon@arm.com;
>> robin.murphy@arm.com
>> Cc: kevin.tian@intel.com; vincent.stehle@arm.com; ashok.raj@intel.com;
>> marc.zyngier@arm.com; tina.zhang@intel.com; Linuxarm
>> <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>
>> Subject: RE: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
>>
> [...]
>>>>>> I am trying to get this running on one of our platform that has smmuv3
>> dual
>>>>>> stage support. I am seeing some issues with this when an ixgbe vf dev is
>>>>>> made pass-through and is behind a vSMMUv3 in Guest.
>>>>>>
>>>>>> Kernel used : https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
>>>>>> Qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
>>>>>>
>>>>>> And this is my Qemu cmd line,
>>>>>>
>>>>>> ./qemu-system-aarch64
>>>>>> -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 -cpu host
>> \
>>>>>> -kernel Image \
>>>>>> -drive if=none,file=ubuntu,id=fs \
>>>>>> -device virtio-blk-device,drive=fs \
>>>>>> -device vfio-pci,host=0000:01:10.1 \
>>>>>> -bios QEMU_EFI.fd \
>>>>>> -net none \
>>>>>> -m 4G \
>>>>>> -nographic -D -d -enable-kvm \
>>>>>> -append "console=ttyAMA0 root=/dev/vda rw acpi=force"
>>>>>>
>>>>>> The basic ping from Guest works fine,
>>>>>> root@ubuntu:~# ping 10.202.225.185
>>>>>> PING 10.202.225.185 (10.202.225.185) 56(84) bytes of data.
>>>>>> 64 bytes from 10.202.225.185: icmp_seq=2 ttl=64 time=0.207 ms
>>>>>> 64 bytes from 10.202.225.185: icmp_seq=3 ttl=64 time=0.203 ms
>>>>>> ...
>>>>>>
>>>>>> But if I increase ping packet size,
>>>>>>
>>>>>> root@ubuntu:~# ping -s 1024 10.202.225.185
>>>>>> PING 10.202.225.185 (10.202.225.185) 1024(1052) bytes of data.
>>>>>> 1032 bytes from 10.202.225.185: icmp_seq=22 ttl=64 time=0.292 ms
>>>>>> 1032 bytes from 10.202.225.185: icmp_seq=23 ttl=64 time=0.207 ms
>>>>>> From 10.202.225.169 icmp_seq=66 Destination Host Unreachable
>>>>>> From 10.202.225.169 icmp_seq=67 Destination Host Unreachable
>>>>>> From 10.202.225.169 icmp_seq=68 Destination Host Unreachable
>>>>>> From 10.202.225.169 icmp_seq=69 Destination Host Unreachable
>>>>>>
>>>>>> And from Host kernel I get,
>>>>>> [  819.970742] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets
>> detected
>>>>>> [  824.002707] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets
>> detected
>>>>>> [  828.034683] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets
>> detected
>>>>>> [  830.050673] ixgbe 0000:01:00.1 enp1s0f1: 4 Spoofed packets
>> detected
>>>>>> [  832.066659] ixgbe 0000:01:00.1 enp1s0f1: 1 Spoofed packets
>> detected
>>>>>> [  834.082640] ixgbe 0000:01:00.1 enp1s0f1: 3 Spoofed packets
>> detected
>>>>>>
>>>>>> Also noted that iperf cannot work as it fails to establish the connection
>>> with
>>>>> iperf
>>>>>> server.
>>>>>>
>>>>>> Please find attached the trace logs(vfio*, smmuv3*) from Qemu for your
>>>>> reference.
>>>>>> I haven't debugged this further yet and thought of checking with you if
>> this
>>> is
>>>>>> something you have seen already or not. Or maybe I am missing
>> something
>>>>> here?
>>>>>
>>>>> Please can you try to edit and modify hw/vfio/common.c, function
>>>>> vfio_iommu_unmap_notify
>>>>>
>>>>>
>>>>> /*
>>>>>     if (size <= 0x10000) {
>>>>>         ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
>>>>>         ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
>>>>>         ustruct.info.addr_info.flags =
>>> IOMMU_INV_ADDR_FLAGS_ARCHID;
>>>>>         if (iotlb->leaf) {
>>>>>             ustruct.info.addr_info.flags |=
>>>>> IOMMU_INV_ADDR_FLAGS_LEAF;
>>>>>         }
>>>>>         ustruct.info.addr_info.archid = iotlb->arch_id;
>>>>>         ustruct.info.addr_info.addr = start;
>>>>>         ustruct.info.addr_info.granule_size = size;
>>>>>         ustruct.info.addr_info.nb_granules = 1;
>>>>>         trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size, 1,
>>>>>                                         iotlb->leaf);
>>>>>     } else {
>>>>> */
>>>>>         ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
>>>>>         ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
>>>>>         ustruct.info.pasid_info.archid = iotlb->arch_id;
>>>>>         ustruct.info.pasid_info.flags =
>>> IOMMU_INV_PASID_FLAGS_ARCHID;
>>>>>         trace_vfio_iommu_asid_inv_iotlb(iotlb->arch_id);
>>>>> //    }
>>>>>
>>>>> This modification leads to invalidate the whole asid each time we get a
>>>>> guest TLBI instead of invalidating the single IOVA (TLBI). On my end, I
>>>>> saw this was the cause of such kind of issues. Please let me know if it
>>>>> fixes your perf issues
>>>>
>>>> Yes, this seems to fix the issue.
>>>>
>>>> root@ubuntu:~# iperf -c 10.202.225.185
>>>> ------------------------------------------------------------
>>>> Client connecting to 10.202.225.185, TCP port 5001
>>>> TCP window size: 85.0 KByte (default)
>>>> ------------------------------------------------------------
>>>> [  3] local 10.202.225.169 port 47996 connected with 10.202.225.185 port
>>> 5001
>>>> [ ID] Interval       Transfer     Bandwidth
>>>> [  3]  0.0-10.0 sec  2.27 GBytes  1.95 Gbits/sec
>>>> root@ubuntu:~#
>>>>
>>>> But the performance seems to be very poor as this is a 10Gbps interface(Of
>>> course
>>>> invalidating the whole asid may not be very helpful). It is interesting that
>> why
>>> the
>>>> single iova invalidation is not working.
>>>>
>>>>  and then we may discuss further about the test
>>>>> configuration.
>>>>
>>>> Sure. Please let me know.
>>>
>>> I reported that issue earlier on the ML. I have not been able to find
>>> any integration issue in the kernel/qemu code but maybe I am too blind
>>> now as I wrote it ;-) When I get a guest stage1 TLBI I cascade it down
>>> to the physical IOMMU. I also pass the LEAF flag.
>>
>> Ok.
>>
>>> As you are an expert of the SMMUv3 PMU, if your implementation has any
>>> and you have cycles to look at this, it would be helpful to run it and
>>> see if something weird gets highlighted.
>>
>> :). Sure. I will give it a try and report back if anything suspicious.
> 
> I just noted that CMDQ_OP_TLBI_NH_VA is missing the vmid filed which seems
> to be the cause for single IOVA TLBI not working properly.
> 
> I had this fix in arm-smmuv3.c,
> 
> @@ -947,6 +947,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> 		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
> 		break;
> 	case CMDQ_OP_TLBI_NH_VA:
> +		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
Damn, I did not see that! That's it. ASID invalidation fills this field
indeed. You may post an independent patch for that.> 		cmd[0] |=
FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
> 		cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf);
> 		cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK;
> 
> 
> With this, your original qemu branch is working. 
> 
> root@ubuntu:~# iperf -c 10.202.225.185
> ------------------------------------------------------------
> Client connecting to 10.202.225.185, TCP port 5001 TCP window size: 85.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 10.202.225.169 port 44894 connected with 10.202.225.185 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  3.21 GBytes  2.76 Gbits/sec
> 
> Could you please check this...
> 
> I also have a rebase of your patches on top of 5.4-rc5. This has some optimizations
> From Will such as batched TLBI inv. Please find it here,
> 
> https://github.com/hisilicon/kernel-dev/tree/private-vSMMUv3-v9-v5.4-rc5
> 
> This gives me a better performance with iperf,
> 
> root@ubuntu:~# iperf -c 10.202.225.185
> ------------------------------------------------------------
> Client connecting to 10.202.225.185, TCP port 5001 TCP window size: 85.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 10.202.225.169 port 55450 connected with 10.202.225.185 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  4.91 GBytes  4.22 Gbits/sec root@ubuntu:~#
> 
> If possible please check this branch as well.

To be honest I don't really know what to do with this work. Despite the
efforts, this has suffered from a lack of traction in the community. My
last attempt to explain the use cases, upon Will's request at Plumber,
has not received any comment (https://lkml.org/lkml/2019/9/20/104).

I think I will post a rebased version with your fix, as a matter to get
a clean snapshot. If you think this work is useful for your projects,
please let it know on the ML.

Thank you again!

Eric
> 
> Thanks,
> Shameer
> 
>> Thanks,
>> Shameer
>>
>>
>>> Thanks
>>>
>>> Eric
>>>>
>>>> Cheers,
>>>> Shameer
>>>>
>>>>> Thanks
>>>>>
>>>>> Eric
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Please let me know.
>>>>>>
>>>>>> Thanks,
>>>>>> Shameer
>>>>>>
>>>>>>> Best Regards
>>>>>>>
>>>>>>> Eric
>>>>>>>
>>>>>>> This series can be found at:
>>>>>>> https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
>>>>>>>
>>>>>>> It series includes Tina's patch steming from
>>>>>>> [1] "[RFC PATCH v2 1/3] vfio: Use capability chains to handle device
>>>>>>> specific irq" plus patches originally contributed by Yi.
>>>>>>>
>>>>>>> History:
>>>>>>>
>>>>>>> v8 -> v9:
>>>>>>> - introduce specific irq framework
>>>>>>> - single fault region
>>>>>>> - iommu_unregister_device_fault_handler failure case not handled
>>>>>>>   yet.
>>>>>>>
>>>>>>> v7 -> v8:
>>>>>>> - rebase on top of v5.2-rc1 and especially
>>>>>>>   8be39a1a04c1  iommu/arm-smmu-v3: Add a master->domain
>> pointer
>>>>>>> - dynamic alloc of s1_cfg/s2_cfg
>>>>>>> - __arm_smmu_tlb_inv_asid/s1_range_nosync
>>>>>>> - check there is no HW MSI regions
>>>>>>> - asid invalidation using pasid extended struct (change in the uapi)
>>>>>>> - add s1_live/s2_live checks
>>>>>>> - move check about support of nested stages in domain finalise
>>>>>>> - fixes in error reporting according to the discussion with Robin
>>>>>>> - reordered the patches to have first iommu/smmuv3 patches and then
>>>>>>>   VFIO patches
>>>>>>>
>>>>>>> v6 -> v7:
>>>>>>> - removed device handle from bind/unbind_guest_msi
>>>>>>> - added "iommu/smmuv3: Nested mode single MSI doorbell per domain
>>>>>>>   enforcement"
>>>>>>> - added few uapi comments as suggested by Jean, Jacop and Alex
>>>>>>>
>>>>>>> v5 -> v6:
>>>>>>> - Fix compilation issue when CONFIG_IOMMU_API is unset
>>>>>>>
>>>>>>> v4 -> v5:
>>>>>>> - fix bug reported by Vincent: fault handler unregistration now happens
>> in
>>>>>>>   vfio_pci_release
>>>>>>> - IOMMU_FAULT_PERM_* moved outside of struct definition + small
>>>>>>>   uapi changes suggested by Kean-Philippe (except fetch_addr)
>>>>>>> - iommu: introduce device fault report API: removed the PRI part.
>>>>>>> - see individual logs for more details
>>>>>>> - reset the ste abort flag on detach
>>>>>>>
>>>>>>> v3 -> v4:
>>>>>>> - took into account Alex, jean-Philippe and Robin's comments on v3
>>>>>>> - rework of the smmuv3 driver integration
>>>>>>> - add tear down ops for msi binding and PASID table binding
>>>>>>> - fix S1 fault propagation
>>>>>>> - put fault reporting patches at the beginning of the series following
>>>>>>>   Jean-Philippe's request
>>>>>>> - update of the cache invalidate and fault API uapis
>>>>>>> - VFIO fault reporting rework with 2 separate regions and one
>> mmappable
>>>>>>>   segment for the fault queue
>>>>>>> - moved to PATCH
>>>>>>>
>>>>>>> v2 -> v3:
>>>>>>> - When registering the S1 MSI binding we now store the device handle.
>>> This
>>>>>>>   addresses Robin's comment about discimination of devices beonging
>>> to
>>>>>>>   different S1 groups and using different physical MSI doorbells.
>>>>>>> - Change the fault reporting API: use
>> VFIO_PCI_DMA_FAULT_IRQ_INDEX
>>> to
>>>>>>>   set the eventfd and expose the faults through an mmappable fault
>>> region
>>>>>>>
>>>>>>> v1 -> v2:
>>>>>>> - Added the fault reporting capability
>>>>>>> - asid properly passed on invalidation (fix assignment of multiple
>>>>>>>   devices)
>>>>>>> - see individual change logs for more info
>>>>>>>
>>>>>>>
>>>>>>> Eric Auger (8):
>>>>>>>   vfio: VFIO_IOMMU_SET_MSI_BINDING
>>>>>>>   vfio/pci: Add VFIO_REGION_TYPE_NESTED region type
>>>>>>>   vfio/pci: Register an iommu fault handler
>>>>>>>   vfio/pci: Allow to mmap the fault queue
>>>>>>>   vfio: Add new IRQ for DMA fault reporting
>>>>>>>   vfio/pci: Add framework for custom interrupt indices
>>>>>>>   vfio/pci: Register and allow DMA FAULT IRQ signaling
>>>>>>>   vfio: Document nested stage control
>>>>>>>
>>>>>>> Liu, Yi L (2):
>>>>>>>   vfio: VFIO_IOMMU_SET_PASID_TABLE
>>>>>>>   vfio: VFIO_IOMMU_CACHE_INVALIDATE
>>>>>>>
>>>>>>> Tina Zhang (1):
>>>>>>>   vfio: Use capability chains to handle device specific irq
>>>>>>>
>>>>>>>  Documentation/vfio.txt              |  77 ++++++++
>>>>>>>  drivers/vfio/pci/vfio_pci.c         | 283
>>>>> ++++++++++++++++++++++++++--
>>>>>>>  drivers/vfio/pci/vfio_pci_intrs.c   |  62 ++++++
>>>>>>>  drivers/vfio/pci/vfio_pci_private.h |  24 +++
>>>>>>>  drivers/vfio/pci/vfio_pci_rdwr.c    |  45 +++++
>>>>>>>  drivers/vfio/vfio_iommu_type1.c     | 166 ++++++++++++++++
>>>>>>>  include/uapi/linux/vfio.h           | 109 ++++++++++-
>>>>>>>  7 files changed, 747 insertions(+), 19 deletions(-)
>>>>>>>
>>>>>>> --
>>>>>>> 2.20.1
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> kvmarm mailing list
>>>>>>> kvmarm@lists.cs.columbia.edu
>>>>>>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>>>>
>
Shameerali Kolothum Thodi Nov. 13, 2019, 4:24 p.m. UTC | #7
Hi Eric,

> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: 12 November 2019 20:35
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> eric.auger.pro@gmail.com; iommu@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org; kvm@vger.kernel.org;
> kvmarm@lists.cs.columbia.edu; joro@8bytes.org;
> alex.williamson@redhat.com; jacob.jun.pan@linux.intel.com;
> yi.l.liu@intel.com; jean-philippe.brucker@arm.com; will.deacon@arm.com;
> robin.murphy@arm.com
> Cc: kevin.tian@intel.com; vincent.stehle@arm.com; ashok.raj@intel.com;
> marc.zyngier@arm.com; tina.zhang@intel.com; Linuxarm
> <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>
> Subject: Re: [PATCH v9 00/11] SMMUv3 Nested Stage Setup (VFIO part)
> 
> Hi Shameer,
> 

[..]

> >
> > I just noted that CMDQ_OP_TLBI_NH_VA is missing the vmid filed which
> seems
> > to be the cause for single IOVA TLBI not working properly.
> >
> > I had this fix in arm-smmuv3.c,
> >
> > @@ -947,6 +947,7 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd,
> struct arm_smmu_cmdq_ent *ent)
> > 		cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31);
> > 		break;
> > 	case CMDQ_OP_TLBI_NH_VA:
> > +		cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
> Damn, I did not see that! That's it. ASID invalidation fills this field
> indeed. You may post an independent patch for that.

Sure. Just did that.
" iommu/arm-smmu-v3: Populate VMID field for CMDQ_OP_TLBI_NH_VA"

> 		cmd[0] |=
> FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid);
> > 		cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf);
> > 		cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK;
> >
> >
> > With this, your original qemu branch is working.
> >
> > root@ubuntu:~# iperf -c 10.202.225.185
> > ------------------------------------------------------------
> > Client connecting to 10.202.225.185, TCP port 5001 TCP window size: 85.0
> KByte (default)
> > ------------------------------------------------------------
> > [  3] local 10.202.225.169 port 44894 connected with 10.202.225.185 port
> 5001
> > [ ID] Interval       Transfer     Bandwidth
> > [  3]  0.0-10.0 sec  3.21 GBytes  2.76 Gbits/sec
> >
> > Could you please check this...
> >
> > I also have a rebase of your patches on top of 5.4-rc5. This has some
> optimizations
> > From Will such as batched TLBI inv. Please find it here,
> >
> > https://github.com/hisilicon/kernel-dev/tree/private-vSMMUv3-v9-v5.4-rc5
> >
> > This gives me a better performance with iperf,
> >
> > root@ubuntu:~# iperf -c 10.202.225.185
> > ------------------------------------------------------------
> > Client connecting to 10.202.225.185, TCP port 5001 TCP window size: 85.0
> KByte (default)
> > ------------------------------------------------------------
> > [  3] local 10.202.225.169 port 55450 connected with 10.202.225.185 port
> 5001
> > [ ID] Interval       Transfer     Bandwidth
> > [  3]  0.0-10.0 sec  4.91 GBytes  4.22 Gbits/sec root@ubuntu:~#
> >
> > If possible please check this branch as well.
> 
> To be honest I don't really know what to do with this work. Despite the
> efforts, this has suffered from a lack of traction in the community. My
> last attempt to explain the use cases, upon Will's request at Plumber,
> has not received any comment (https://lkml.org/lkml/2019/9/20/104).
> 
> I think I will post a rebased version with your fix, as a matter to get
> a clean snapshot.

Thanks. That makes sense.

 If you think this work is useful for your projects,
> please let it know on the ML.

Right. While SVA use case is definitely the one we are very much interested, I will
check within our team the priority for use case 1(native drivers in Guest) you
mentioned in the above link. 

Cheers,
Shameer

> Thank you again!
> 
> Eric
> >
> > Thanks,
> > Shameer
> >
> >> Thanks,
> >> Shameer
> >>
> >>
> >>> Thanks
> >>>
> >>> Eric
> >>>>
> >>>> Cheers,
> >>>> Shameer
> >>>>
> >>>>> Thanks
> >>>>>
> >>>>> Eric
> >>>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Please let me know.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Shameer
> >>>>>>
> >>>>>>> Best Regards
> >>>>>>>
> >>>>>>> Eric
> >>>>>>>
> >>>>>>> This series can be found at:
> >>>>>>> https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
> >>>>>>>
> >>>>>>> It series includes Tina's patch steming from
> >>>>>>> [1] "[RFC PATCH v2 1/3] vfio: Use capability chains to handle device
> >>>>>>> specific irq" plus patches originally contributed by Yi.
> >>>>>>>
> >>>>>>> History:
> >>>>>>>
> >>>>>>> v8 -> v9:
> >>>>>>> - introduce specific irq framework
> >>>>>>> - single fault region
> >>>>>>> - iommu_unregister_device_fault_handler failure case not handled
> >>>>>>>   yet.
> >>>>>>>
> >>>>>>> v7 -> v8:
> >>>>>>> - rebase on top of v5.2-rc1 and especially
> >>>>>>>   8be39a1a04c1  iommu/arm-smmu-v3: Add a master->domain
> >> pointer
> >>>>>>> - dynamic alloc of s1_cfg/s2_cfg
> >>>>>>> - __arm_smmu_tlb_inv_asid/s1_range_nosync
> >>>>>>> - check there is no HW MSI regions
> >>>>>>> - asid invalidation using pasid extended struct (change in the uapi)
> >>>>>>> - add s1_live/s2_live checks
> >>>>>>> - move check about support of nested stages in domain finalise
> >>>>>>> - fixes in error reporting according to the discussion with Robin
> >>>>>>> - reordered the patches to have first iommu/smmuv3 patches and
> then
> >>>>>>>   VFIO patches
> >>>>>>>
> >>>>>>> v6 -> v7:
> >>>>>>> - removed device handle from bind/unbind_guest_msi
> >>>>>>> - added "iommu/smmuv3: Nested mode single MSI doorbell per
> domain
> >>>>>>>   enforcement"
> >>>>>>> - added few uapi comments as suggested by Jean, Jacop and Alex
> >>>>>>>
> >>>>>>> v5 -> v6:
> >>>>>>> - Fix compilation issue when CONFIG_IOMMU_API is unset
> >>>>>>>
> >>>>>>> v4 -> v5:
> >>>>>>> - fix bug reported by Vincent: fault handler unregistration now
> happens
> >> in
> >>>>>>>   vfio_pci_release
> >>>>>>> - IOMMU_FAULT_PERM_* moved outside of struct definition + small
> >>>>>>>   uapi changes suggested by Kean-Philippe (except fetch_addr)
> >>>>>>> - iommu: introduce device fault report API: removed the PRI part.
> >>>>>>> - see individual logs for more details
> >>>>>>> - reset the ste abort flag on detach
> >>>>>>>
> >>>>>>> v3 -> v4:
> >>>>>>> - took into account Alex, jean-Philippe and Robin's comments on v3
> >>>>>>> - rework of the smmuv3 driver integration
> >>>>>>> - add tear down ops for msi binding and PASID table binding
> >>>>>>> - fix S1 fault propagation
> >>>>>>> - put fault reporting patches at the beginning of the series following
> >>>>>>>   Jean-Philippe's request
> >>>>>>> - update of the cache invalidate and fault API uapis
> >>>>>>> - VFIO fault reporting rework with 2 separate regions and one
> >> mmappable
> >>>>>>>   segment for the fault queue
> >>>>>>> - moved to PATCH
> >>>>>>>
> >>>>>>> v2 -> v3:
> >>>>>>> - When registering the S1 MSI binding we now store the device
> handle.
> >>> This
> >>>>>>>   addresses Robin's comment about discimination of devices
> beonging
> >>> to
> >>>>>>>   different S1 groups and using different physical MSI doorbells.
> >>>>>>> - Change the fault reporting API: use
> >> VFIO_PCI_DMA_FAULT_IRQ_INDEX
> >>> to
> >>>>>>>   set the eventfd and expose the faults through an mmappable fault
> >>> region
> >>>>>>>
> >>>>>>> v1 -> v2:
> >>>>>>> - Added the fault reporting capability
> >>>>>>> - asid properly passed on invalidation (fix assignment of multiple
> >>>>>>>   devices)
> >>>>>>> - see individual change logs for more info
> >>>>>>>
> >>>>>>>
> >>>>>>> Eric Auger (8):
> >>>>>>>   vfio: VFIO_IOMMU_SET_MSI_BINDING
> >>>>>>>   vfio/pci: Add VFIO_REGION_TYPE_NESTED region type
> >>>>>>>   vfio/pci: Register an iommu fault handler
> >>>>>>>   vfio/pci: Allow to mmap the fault queue
> >>>>>>>   vfio: Add new IRQ for DMA fault reporting
> >>>>>>>   vfio/pci: Add framework for custom interrupt indices
> >>>>>>>   vfio/pci: Register and allow DMA FAULT IRQ signaling
> >>>>>>>   vfio: Document nested stage control
> >>>>>>>
> >>>>>>> Liu, Yi L (2):
> >>>>>>>   vfio: VFIO_IOMMU_SET_PASID_TABLE
> >>>>>>>   vfio: VFIO_IOMMU_CACHE_INVALIDATE
> >>>>>>>
> >>>>>>> Tina Zhang (1):
> >>>>>>>   vfio: Use capability chains to handle device specific irq
> >>>>>>>
> >>>>>>>  Documentation/vfio.txt              |  77 ++++++++
> >>>>>>>  drivers/vfio/pci/vfio_pci.c         | 283
> >>>>> ++++++++++++++++++++++++++--
> >>>>>>>  drivers/vfio/pci/vfio_pci_intrs.c   |  62 ++++++
> >>>>>>>  drivers/vfio/pci/vfio_pci_private.h |  24 +++
> >>>>>>>  drivers/vfio/pci/vfio_pci_rdwr.c    |  45 +++++
> >>>>>>>  drivers/vfio/vfio_iommu_type1.c     | 166 ++++++++++++++++
> >>>>>>>  include/uapi/linux/vfio.h           | 109 ++++++++++-
> >>>>>>>  7 files changed, 747 insertions(+), 19 deletions(-)
> >>>>>>>
> >>>>>>> --
> >>>>>>> 2.20.1
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> kvmarm mailing list
> >>>>>>> kvmarm@lists.cs.columbia.edu
> >>>>>>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> >>>>
> >
Tomasz Nowicki Nov. 20, 2019, 8:15 a.m. UTC | #8
Hi Eric,

On 11.07.2019 15:56, Eric Auger wrote:
> This series brings the VFIO part of HW nested paging support
> in the SMMUv3.
> 
> The series depends on:
> [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part)
> (https://www.spinics.net/lists/kernel/msg3187714.html)
> 
> 3 new IOCTLs are introduced that allow the userspace to
> 1) pass the guest stage 1 configuration
> 2) pass stage 1 MSI bindings
> 3) invalidate stage 1 related caches
> 
> They map onto the related new IOMMU API functions.
> 
> We introduce the capability to register specific interrupt
> indexes (see [1]). A new DMA_FAULT interrupt index allows to register
> an eventfd to be signaled whenever a stage 1 related fault
> is detected at physical level. Also a specific region allows
> to expose the fault records to the user space.
> 
> Best Regards
> 
> Eric
> 
> This series can be found at:
> https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9

I think you have already tested on ThunderX2, but as a formality, for 
the whole series:

Tested-by: Tomasz Nowicki <tnowicki@marvell.com>
qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
kernel: https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9 + 
Shameer's fix patch

In my test I assigned Intel 82574L NIC and perform iperf tests.

Other folks from Marvell claimed this to be important feature so I asked 
them to review and speak up on mailing list.

Thanks,
Tomasz
Auger Eric Nov. 20, 2019, 10:18 a.m. UTC | #9
Hi Tomasz,

On 11/20/19 9:15 AM, Tomasz Nowicki wrote:
> Hi Eric,
> 
> On 11.07.2019 15:56, Eric Auger wrote:
>> This series brings the VFIO part of HW nested paging support
>> in the SMMUv3.
>>
>> The series depends on:
>> [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part)
>> (https://www.spinics.net/lists/kernel/msg3187714.html)
>>
>> 3 new IOCTLs are introduced that allow the userspace to
>> 1) pass the guest stage 1 configuration
>> 2) pass stage 1 MSI bindings
>> 3) invalidate stage 1 related caches
>>
>> They map onto the related new IOMMU API functions.
>>
>> We introduce the capability to register specific interrupt
>> indexes (see [1]). A new DMA_FAULT interrupt index allows to register
>> an eventfd to be signaled whenever a stage 1 related fault
>> is detected at physical level. Also a specific region allows
>> to expose the fault records to the user space.
>>
>> Best Regards
>>
>> Eric
>>
>> This series can be found at:
>> https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
> 
> I think you have already tested on ThunderX2, but as a formality, for 
> the whole series:
> 
> Tested-by: Tomasz Nowicki <tnowicki@marvell.com>
> qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
> kernel: https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9 + 
> Shameer's fix patch
> 
> In my test I assigned Intel 82574L NIC and perform iperf tests.

Thank you for your testing efforts.
> 
> Other folks from Marvell claimed this to be important feature so I asked 
> them to review and speak up on mailing list.

That's nice to read that!  So it is time for me to rebase both the iommu
and vfio parts. I will submit something quickly. Then I would encourage
the review efforts to focus first on the iommu part.

Thanks

Eric
> 
> Thanks,
> Tomasz
>
Zhangfei Gao March 3, 2020, 12:57 p.m. UTC | #10
Hi, Eric

On 2019/11/20 下午6:18, Auger Eric wrote:
>
>>> This series brings the VFIO part of HW nested paging support
>>> in the SMMUv3.
>>>
>>> The series depends on:
>>> [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part)
>>> (https://www.spinics.net/lists/kernel/msg3187714.html)
>>>
>>> 3 new IOCTLs are introduced that allow the userspace to
>>> 1) pass the guest stage 1 configuration
>>> 2) pass stage 1 MSI bindings
>>> 3) invalidate stage 1 related caches
>>>
>>> They map onto the related new IOMMU API functions.
>>>
>>> We introduce the capability to register specific interrupt
>>> indexes (see [1]). A new DMA_FAULT interrupt index allows to register
>>> an eventfd to be signaled whenever a stage 1 related fault
>>> is detected at physical level. Also a specific region allows
>>> to expose the fault records to the user space.
>>>
>>> Best Regards
>>>
>>> Eric
>>>
>>> This series can be found at:
>>> https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
>> I think you have already tested on ThunderX2, but as a formality, for
>> the whole series:
>>
>> Tested-by: Tomasz Nowicki <tnowicki@marvell.com>
>> qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
>> kernel: https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9 +
>> Shameer's fix patch
>>
>> In my test I assigned Intel 82574L NIC and perform iperf tests.
> Thank you for your testing efforts.
>> Other folks from Marvell claimed this to be important feature so I asked
>> them to review and speak up on mailing list.
> That's nice to read that!  So it is time for me to rebase both the iommu
> and vfio parts. I will submit something quickly. Then I would encourage
> the review efforts to focus first on the iommu part.
>
>
vSVA feature is also very important to us, it will be great if vSVA can 
be supported in guest world.

We just submitted uacce for accelerator, which will be supporting SVA on 
host, thanks to Jean's effort.

https://lkml.org/lkml/2020/2/11/54


However, supporting vSVA in guest is also a key component for accelerator.

Looking forward this going to be happen.


Any respin, I will be very happy to test.


Thanks
Auger Eric March 3, 2020, 1:14 p.m. UTC | #11
Hi Zhangfei,

On 3/3/20 1:57 PM, zhangfei wrote:
> Hi, Eric
> 
> On 2019/11/20 下午6:18, Auger Eric wrote:
>>
>>>> This series brings the VFIO part of HW nested paging support
>>>> in the SMMUv3.
>>>>
>>>> The series depends on:
>>>> [PATCH v9 00/14] SMMUv3 Nested Stage Setup (IOMMU part)
>>>> (https://www.spinics.net/lists/kernel/msg3187714.html)
>>>>
>>>> 3 new IOCTLs are introduced that allow the userspace to
>>>> 1) pass the guest stage 1 configuration
>>>> 2) pass stage 1 MSI bindings
>>>> 3) invalidate stage 1 related caches
>>>>
>>>> They map onto the related new IOMMU API functions.
>>>>
>>>> We introduce the capability to register specific interrupt
>>>> indexes (see [1]). A new DMA_FAULT interrupt index allows to register
>>>> an eventfd to be signaled whenever a stage 1 related fault
>>>> is detected at physical level. Also a specific region allows
>>>> to expose the fault records to the user space.
>>>>
>>>> Best Regards
>>>>
>>>> Eric
>>>>
>>>> This series can be found at:
>>>> https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
>>> I think you have already tested on ThunderX2, but as a formality, for
>>> the whole series:
>>>
>>> Tested-by: Tomasz Nowicki <tnowicki@marvell.com>
>>> qemu: https://github.com/eauger/qemu/tree/v4.1.0-rc0-2stage-rfcv5
>>> kernel: https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9 +
>>> Shameer's fix patch
>>>
>>> In my test I assigned Intel 82574L NIC and perform iperf tests.
>> Thank you for your testing efforts.
>>> Other folks from Marvell claimed this to be important feature so I asked
>>> them to review and speak up on mailing list.
>> That's nice to read that!  So it is time for me to rebase both the iommu
>> and vfio parts. I will submit something quickly. Then I would encourage
>> the review efforts to focus first on the iommu part.
>>
>>
> vSVA feature is also very important to us, it will be great if vSVA can
> be supported in guest world.
> 
> We just submitted uacce for accelerator, which will be supporting SVA on
> host, thanks to Jean's effort.
> 
> https://lkml.org/lkml/2020/2/11/54
> 
> 
> However, supporting vSVA in guest is also a key component for accelerator.
> 
> Looking forward this going to be happen.
> 
> 
> Any respin, I will be very happy to test.

OK. Based on your interest and Marvell's interest too, I will respin
both iommu & vfio series.

Thanks

Eric
> 
> 
> Thanks
> 
> 
> 
>