mbox series

[v15,00/12] SMMUv3 Nested Stage Setup (IOMMU part)

Message ID 20210411111228.14386-1-eric.auger@redhat.com (mailing list archive)
Headers show
Series SMMUv3 Nested Stage Setup (IOMMU part) | expand

Message

Eric Auger April 11, 2021, 11:12 a.m. UTC
SMMUv3 Nested Stage Setup (IOMMU part)

This series brings the IOMMU part of HW nested paging support
in the SMMUv3. The VFIO part is submitted separately.

This is based on Jean-Philippe's
[PATCH v14 00/10] iommu: I/O page faults for SMMUv3
https://www.spinics.net/lists/arm-kernel/msg886518.html
(including the patches that were not pulled for 5.13)

The IOMMU API is extended to support 2 new API functionalities:
1) pass the guest stage 1 configuration
2) pass stage 1 MSI bindings

Then those capabilities gets implemented in the SMMUv3 driver.

The virtualizer passes information through the VFIO user API
which cascades them to the iommu subsystem. This allows the guest
to own stage 1 tables and context descriptors (so-called PASID
table) while the host owns stage 2 tables and main configuration
structures (STE).

Best Regards

Eric

This series can be found at:
v5.12-rc6-jean-iopf-14-2stage-v15
(including the VFIO part in its last version: v13)

The VFIO series is sent separately.

History:

Previous version:
https://github.com/eauger/linux/tree/v5.11-stallv12-2stage-v14

v14 -> v15:
- on S1 invalidation, always use CMDQ_OP_TLBI_NH_VA
  independently on host ARM_SMMU_FEAT_E2H support (Zenghui)
- remove  iommu/smmuv3: Accept configs with more than one
  context descriptor
- Remove spurious arm_smmu_cmdq_issue_sync in
  IOMMU_INV_GRANU_ADDR cache invalidation (Zenghui)
- dma-iommu.c changes induced by Zenghui's comments
  including the locking rework
- fix cache invalidation when guest uses RIL
  and host does not support it (Chenxiang)
- removed iommu/smmuv3: Accept configs with more than one
  context descriptor (Zenghui, Shameer)
- At this point I have kept the MSI binding API.

v13 -> v14:
- Took into account all received comments I think. Great
  thanks to all the testers for their effort and sometimes
  tentative fixes. I am really grateful to you!
- numerous fixes including guest running in
  noiommu, iommu.strict=0, iommu.passthrough=on,
  enable_unsafe_noiommu_mode

v12 -> v13:
- fixed compilation issue with CONFIG_ARM_SMMU_V3_SVA
  reported by Shameer. This urged me to revisit patch 4 into
  iommu/smmuv3: Allow s1 and s2 configs to coexist where
  s1_cfg and s2_cfg are not dynamically allocated anymore.
  Instead I use a new set field in existing structs
- fixed 2 others config checks
- Updated "iommu/arm-smmu-v3: Maintain a SID->device structure"
  according to the last version

v11 -> v12:
- rebase on top of v5.10-rc4


Eric Auger (12):
  iommu: Introduce attach/detach_pasid_table API
  iommu: Introduce bind/unbind_guest_msi
  iommu/smmuv3: Allow s1 and s2 configs to coexist
  iommu/smmuv3: Get prepared for nested stage support
  iommu/smmuv3: Implement attach/detach_pasid_table
  iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
  iommu/smmuv3: Implement cache_invalidate
  dma-iommu: Implement NESTED_MSI cookie
  iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
  iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
    regions
  iommu/smmuv3: Implement bind/unbind_guest_msi
  iommu/smmuv3: report additional recoverable faults

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 463 ++++++++++++++++++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  14 +-
 drivers/iommu/dma-iommu.c                   | 180 +++++++-
 drivers/iommu/iommu.c                       | 106 +++++
 include/linux/dma-iommu.h                   |  16 +
 include/linux/iommu.h                       |  47 ++
 include/uapi/linux/iommu.h                  |  54 +++
 7 files changed, 838 insertions(+), 42 deletions(-)

Comments

Wang Xingang April 14, 2021, 2:36 a.m. UTC | #1
Hi Eric, Jean-Philippe

On 2021/4/11 19:12, Eric Auger wrote:
> SMMUv3 Nested Stage Setup (IOMMU part)
> 
> This series brings the IOMMU part of HW nested paging support
> in the SMMUv3. The VFIO part is submitted separately.
> 
> This is based on Jean-Philippe's
> [PATCH v14 00/10] iommu: I/O page faults for SMMUv3
> https://www.spinics.net/lists/arm-kernel/msg886518.html
> (including the patches that were not pulled for 5.13)
> 
> The IOMMU API is extended to support 2 new API functionalities:
> 1) pass the guest stage 1 configuration
> 2) pass stage 1 MSI bindings
> 
> Then those capabilities gets implemented in the SMMUv3 driver.
> 
> The virtualizer passes information through the VFIO user API
> which cascades them to the iommu subsystem. This allows the guest
> to own stage 1 tables and context descriptors (so-called PASID
> table) while the host owns stage 2 tables and main configuration
> structures (STE).
> 
> Best Regards
> 
> Eric
> 
> This series can be found at:
> v5.12-rc6-jean-iopf-14-2stage-v15
> (including the VFIO part in its last version: v13)
> 

I am testing the performance of an accelerator with/without SVA/vSVA,
and found there might be some potential performance loss risk for SVA/vSVA.

I use a Network and computing encryption device (SEC), and send 1MB 
request for 10000 times.

I trigger mm fault before I send the request, so there should be no iopf.

Here's what I got:

physical scenario:
performance:		SVA:9MB/s  	NOSVA:9MB/s
tlb_miss: 		SVA:302,651	NOSVA:1,223
trans_table_walk_access:SVA:302,276	NOSVA:1,237

VM scenario:
performance:		vSVA:9MB/s  	NOvSVA:6MB/s  about 30~40% loss
tlb_miss: 		vSVA:4,423,897	NOvSVA:1,907
trans_table_walk_access:vSVA:61,928,430	NOvSVA:21,948

In physical scenario, there's almost no performance loss, but the 
tlb_miss and trans_table_walk_access of stage 1 for SVA is quite high, 
comparing to NOSVA.

In VM scenario, there's about 30~40% performance loss, this is because 
the two stage tlb_miss and trans_table_walk_access is even higher, and 
impact the performance.

I compare the procedure of building page table of SVA and NOSVA, and 
found that NOSVA uses 2MB mapping as far as possible, while SVA uses 
only 4KB.

I retest with huge page, and huge page could solve this problem, the 
performance of SVA/vSVA is almost the same as NOSVA.

I am wondering do you have any other solution for the performance loss 
of vSVA, or any other method to reduce the tlb_miss/trans_table_walk.

Thanks

Xingang

.
Shameer Kolothum April 14, 2021, 6:56 a.m. UTC | #2
> -----Original Message-----
> From: wangxingang
> Sent: 14 April 2021 03:36
> To: Eric Auger <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
> jean-philippe@linaro.org; iommu@lists.linux-foundation.org;
> linux-kernel@vger.kernel.org; kvm@vger.kernel.org;
> kvmarm@lists.cs.columbia.edu; will@kernel.org; maz@kernel.org;
> robin.murphy@arm.com; joro@8bytes.org; alex.williamson@redhat.com;
> tn@semihalf.com; zhukeqian <zhukeqian1@huawei.com>
> Cc: jacob.jun.pan@linux.intel.com; yi.l.liu@intel.com; zhangfei.gao@linaro.org;
> zhangfei.gao@gmail.com; vivek.gautam@arm.com; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi@huawei.com>; yuzenghui
> <yuzenghui@huawei.com>; nicoleotsuka@gmail.com; lushenming
> <lushenming@huawei.com>; vsethi@nvidia.com; chenxiang (M)
> <chenxiang66@hisilicon.com>; vdumpa@nvidia.com; jiangkunkun
> <jiangkunkun@huawei.com>
> Subject: Re: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)
> 
> Hi Eric, Jean-Philippe
> 
> On 2021/4/11 19:12, Eric Auger wrote:
> > SMMUv3 Nested Stage Setup (IOMMU part)
> >
> > This series brings the IOMMU part of HW nested paging support
> > in the SMMUv3. The VFIO part is submitted separately.
> >
> > This is based on Jean-Philippe's
> > [PATCH v14 00/10] iommu: I/O page faults for SMMUv3
> > https://www.spinics.net/lists/arm-kernel/msg886518.html
> > (including the patches that were not pulled for 5.13)
> >
> > The IOMMU API is extended to support 2 new API functionalities:
> > 1) pass the guest stage 1 configuration
> > 2) pass stage 1 MSI bindings
> >
> > Then those capabilities gets implemented in the SMMUv3 driver.
> >
> > The virtualizer passes information through the VFIO user API
> > which cascades them to the iommu subsystem. This allows the guest
> > to own stage 1 tables and context descriptors (so-called PASID
> > table) while the host owns stage 2 tables and main configuration
> > structures (STE).
> >
> > Best Regards
> >
> > Eric
> >
> > This series can be found at:
> > v5.12-rc6-jean-iopf-14-2stage-v15
> > (including the VFIO part in its last version: v13)
> >
> 
> I am testing the performance of an accelerator with/without SVA/vSVA,
> and found there might be some potential performance loss risk for SVA/vSVA.
> 
> I use a Network and computing encryption device (SEC), and send 1MB
> request for 10000 times.
> 
> I trigger mm fault before I send the request, so there should be no iopf.
> 
> Here's what I got:
> 
> physical scenario:
> performance:		SVA:9MB/s  	NOSVA:9MB/s
> tlb_miss: 		SVA:302,651	NOSVA:1,223
> trans_table_walk_access:SVA:302,276	NOSVA:1,237
> 
> VM scenario:
> performance:		vSVA:9MB/s  	NOvSVA:6MB/s  about 30~40% loss
> tlb_miss: 		vSVA:4,423,897	NOvSVA:1,907
> trans_table_walk_access:vSVA:61,928,430	NOvSVA:21,948
> 
> In physical scenario, there's almost no performance loss, but the
> tlb_miss and trans_table_walk_access of stage 1 for SVA is quite high,
> comparing to NOSVA.
> 
> In VM scenario, there's about 30~40% performance loss, this is because
> the two stage tlb_miss and trans_table_walk_access is even higher, and
> impact the performance.
> 
> I compare the procedure of building page table of SVA and NOSVA, and
> found that NOSVA uses 2MB mapping as far as possible, while SVA uses
> only 4KB.
> 
> I retest with huge page, and huge page could solve this problem, the
> performance of SVA/vSVA is almost the same as NOSVA.
> 
> I am wondering do you have any other solution for the performance loss
> of vSVA, or any other method to reduce the tlb_miss/trans_table_walk.

Hi Xingang,

Just curious, do you have DVM enabled on this board or does it use explicit
SMMU TLB invalidations?

Thanks,
Shameer
Wang Xingang April 14, 2021, 7:08 a.m. UTC | #3
Hi Shameer,

On 2021/4/14 14:56, Shameerali Kolothum Thodi wrote:
> 
> 
>> -----Original Message-----
>> From: wangxingang
>> Sent: 14 April 2021 03:36
>> To: Eric Auger <eric.auger@redhat.com>; eric.auger.pro@gmail.com;
>> jean-philippe@linaro.org; iommu@lists.linux-foundation.org;
>> linux-kernel@vger.kernel.org; kvm@vger.kernel.org;
>> kvmarm@lists.cs.columbia.edu; will@kernel.org; maz@kernel.org;
>> robin.murphy@arm.com; joro@8bytes.org; alex.williamson@redhat.com;
>> tn@semihalf.com; zhukeqian <zhukeqian1@huawei.com>
>> Cc: jacob.jun.pan@linux.intel.com; yi.l.liu@intel.com; zhangfei.gao@linaro.org;
>> zhangfei.gao@gmail.com; vivek.gautam@arm.com; Shameerali Kolothum
>> Thodi <shameerali.kolothum.thodi@huawei.com>; yuzenghui
>> <yuzenghui@huawei.com>; nicoleotsuka@gmail.com; lushenming
>> <lushenming@huawei.com>; vsethi@nvidia.com; chenxiang (M)
>> <chenxiang66@hisilicon.com>; vdumpa@nvidia.com; jiangkunkun
>> <jiangkunkun@huawei.com>
>> Subject: Re: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)
>>
>> Hi Eric, Jean-Philippe
>>
>> On 2021/4/11 19:12, Eric Auger wrote:
>>> SMMUv3 Nested Stage Setup (IOMMU part)
>>>
>>> This series brings the IOMMU part of HW nested paging support
>>> in the SMMUv3. The VFIO part is submitted separately.
>>>
>>> This is based on Jean-Philippe's
>>> [PATCH v14 00/10] iommu: I/O page faults for SMMUv3
>>> https://www.spinics.net/lists/arm-kernel/msg886518.html
>>> (including the patches that were not pulled for 5.13)
>>>
>>> The IOMMU API is extended to support 2 new API functionalities:
>>> 1) pass the guest stage 1 configuration
>>> 2) pass stage 1 MSI bindings
>>>
>>> Then those capabilities gets implemented in the SMMUv3 driver.
>>>
>>> The virtualizer passes information through the VFIO user API
>>> which cascades them to the iommu subsystem. This allows the guest
>>> to own stage 1 tables and context descriptors (so-called PASID
>>> table) while the host owns stage 2 tables and main configuration
>>> structures (STE).
>>>
>>> Best Regards
>>>
>>> Eric
>>>
>>> This series can be found at:
>>> v5.12-rc6-jean-iopf-14-2stage-v15
>>> (including the VFIO part in its last version: v13)
>>>
>>
>> I am testing the performance of an accelerator with/without SVA/vSVA,
>> and found there might be some potential performance loss risk for SVA/vSVA.
>>
>> I use a Network and computing encryption device (SEC), and send 1MB
>> request for 10000 times.
>>
>> I trigger mm fault before I send the request, so there should be no iopf.
>>
>> Here's what I got:
>>
>> physical scenario:
>> performance:		SVA:9MB/s  	NOSVA:9MB/s
>> tlb_miss: 		SVA:302,651	NOSVA:1,223
>> trans_table_walk_access:SVA:302,276	NOSVA:1,237
>>
>> VM scenario:
>> performance:		vSVA:9MB/s  	NOvSVA:6MB/s  about 30~40% loss
>> tlb_miss: 		vSVA:4,423,897	NOvSVA:1,907
>> trans_table_walk_access:vSVA:61,928,430	NOvSVA:21,948
>>
>> In physical scenario, there's almost no performance loss, but the
>> tlb_miss and trans_table_walk_access of stage 1 for SVA is quite high,
>> comparing to NOSVA.
>>
>> In VM scenario, there's about 30~40% performance loss, this is because
>> the two stage tlb_miss and trans_table_walk_access is even higher, and
>> impact the performance.
>>
>> I compare the procedure of building page table of SVA and NOSVA, and
>> found that NOSVA uses 2MB mapping as far as possible, while SVA uses
>> only 4KB.
>>
>> I retest with huge page, and huge page could solve this problem, the
>> performance of SVA/vSVA is almost the same as NOSVA.
>>
>> I am wondering do you have any other solution for the performance loss
>> of vSVA, or any other method to reduce the tlb_miss/trans_table_walk.
> 
> Hi Xingang,
> 
> Just curious, do you have DVM enabled on this board or does it use explicit
> SMMU TLB invalidations?
> 
> Thanks,
> Shameer
> 

For now, DVM is enabled and TLBI is not explicit used.

And by the way the performance data above is
performance:	vSVA:9GB/s(not 9MB/s)  NOvSVA:6GB/s(not 6GB/s)

Thanks

Xingang

.
Vivek Kumar Gautam April 21, 2021, 11:28 a.m. UTC | #4
Hi Eric,

On 4/11/21 4:42 PM, Eric Auger wrote:
> SMMUv3 Nested Stage Setup (IOMMU part)
>

[snip]

>
> Eric Auger (12):
>    iommu: Introduce attach/detach_pasid_table API
>    iommu: Introduce bind/unbind_guest_msi
>    iommu/smmuv3: Allow s1 and s2 configs to coexist
>    iommu/smmuv3: Get prepared for nested stage support
>    iommu/smmuv3: Implement attach/detach_pasid_table
>    iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
>    iommu/smmuv3: Implement cache_invalidate
>    dma-iommu: Implement NESTED_MSI cookie
>    iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
>    iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
>      regions
>    iommu/smmuv3: Implement bind/unbind_guest_msi
>    iommu/smmuv3: report additional recoverable faults

[snip]

I noticed that the patch[1]:
[PATCH v13 15/15] iommu/smmuv3: Add PASID cache invalidation per PASID
has been dropped in the v14 and v15 of
  this series.

Is this planned to be part of any future series, or did I miss a
discussion about dropping the patch? :-)


[1]
https://patchwork.kernel.org/project/kvm/patch/20201118112151.25412-16-eric.auger@redhat.com/


Best regards
Vivek
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Sumit Gupta April 23, 2021, 6:21 p.m. UTC | #5
Hi Eric,

I have validated v15 of the patch series from your branch "v5.12-rc6-jean-iopf-14-2stage-v15"
on top of Jean's current sva patches with Kernel-5.12.0-rc8.
Verfied nested translations with NVMe PCI device assigned to Guest VM.

Tested-by: Sumit Gupta <sumitg@nvidia.com>
Krishna Reddy Sept. 27, 2021, 9:17 p.m. UTC | #6
Hi Eric,
> This is based on Jean-Philippe's
> [PATCH v14 00/10] iommu: I/O page faults for SMMUv3
> https://www.spinics.net/lists/arm-kernel/msg886518.html
> (including the patches that were not pulled for 5.13)
>

Jean's patches have been merged to v5.14.
Do you anticipate IOMMU/VFIO part patches getting into upstream kernel soon?

-KR
Eric Auger Sept. 28, 2021, 6:25 a.m. UTC | #7
Hi Krishna,

On 9/27/21 11:17 PM, Krishna Reddy wrote:
> Hi Eric,
>> This is based on Jean-Philippe's
>> [PATCH v14 00/10] iommu: I/O page faults for SMMUv3
>> https://www.spinics.net/lists/arm-kernel/msg886518.html
>> (including the patches that were not pulled for 5.13)
>>
> Jean's patches have been merged to v5.14.
> Do you anticipate IOMMU/VFIO part patches getting into upstream kernel soon?

I am going to respin the smmu part rebased on v5.15. As for the VFIO
part, this needs to be totally redesigned based on /dev/iommu (see
[RFC 00/20] Introduce /dev/iommu for userspace I/O address space
management).

I will provide some updated kernel and qemu branches for testing purpose
only.

Thanks

Eric
>
> -KR
>