[v3,00/19] Update SMMUv3 to the modern iommu API (part 1/3)

Message ID 0-v3-d794f8d934da+411a-smmuv3_newapi_p1_jgg@nvidia.com (mailing list archive)

Jason Gunthorpe Dec. 5, 2023, 7:14 p.m. UTC
The SMMUv3 driver was originally written in 2015 when the iommu driver
facing API looked quite different. The API has evolved, especially lately,
and the driver has fallen behind.

This work aims to make the SMMUv3 driver the best IOMMU driver with
the most comprehensive implementation of the API. After all parts it
addresses:

 - Global static BLOCKED and IDENTITY domains with 'never fail' attach
   semantics. BLOCKED is desired for efficient VFIO.

 - Support map before attach for PAGING iommu_domains.

 - attach_dev failure does not change the HW configuration.

 - Fully hitless transitions between IDENTITY -> DMA -> IDENTITY.
   The API has IOMMU_RESV_DIRECT which is expected to be
   continuously translating.

 - Safe transitions between PAGING -> BLOCKED, do not ever temporarily
   do IDENTITY. This is required for iommufd security.

 - Full PASID API support including:
    - S1/SVA domains attached to PASIDs
    - IDENTITY/BLOCKED/S1 attached to RID
    - Change of the RID domain while PASIDs are attached

 - Streamlined SVA support using the core infrastructure

 - Hitless, whenever possible, change between two domains

 - iommufd IOMMU_GET_HW_INFO, IOMMU_HWPT_ALLOC_NEST_PARENT, and
   IOMMU_DOMAIN_NESTED support

Overall these things are going to become more accessible to iommufd, and
exposed to VMs, so it is important for the driver to have a robust
implementation of the API.
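
As a rough illustration of the "attach_dev failure does not change the HW
configuration" item above, the usual shape is to do everything that can
fail first and only then commit. The sketch below is a user-space model,
not code from the series: struct hw_entry, struct paging_domain, the
compatible flag and attach_paging() are all hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one device's live hardware configuration. */
struct hw_entry {
	uint64_t cfg;
};

/* Hypothetical model of a PAGING domain. */
struct paging_domain {
	bool compatible;	/* assumption: can this device use the domain? */
	uint64_t cfg;		/* value to program once attached */
};

static int attach_paging(struct hw_entry *live, const struct paging_domain *d)
{
	uint64_t target;

	assert(live && d);

	/* Phase 1: every step that can fail runs before *live is touched. */
	if (!d->compatible)
		return -EINVAL;
	target = d->cfg;

	/* Phase 2: commit. Nothing below this point can fail. */
	live->cfg = target;
	return 0;
}
```

With this ordering a failed attach returns an error and leaves live->cfg
exactly as it was, so the caller has nothing to unwind.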

The work is split into three parts, with this part largely focusing on the
STE and building up to the BLOCKED & IDENTITY global static domains.

The second part largely focuses on the CD and builds up to having a common
PASID infrastructure that SVA and S1 domains equally use.

The third part has some random cleanups and the iommufd related parts.

Overall this takes the approach of turning the STE/CD programming upside
down where the CD/STE value is computed right at a driver callback
function and then pushed down into programming logic. The programming
logic hides the details of the required CD/STE tear-less update. This
makes the CD/STE functions independent of the arm_smmu_domain which makes
it fairly straightforward to untangle all the different call chains, and
add new ones.
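
To make the "compute the value, then push it into programming logic" idea
concrete, here is a minimal user-space sketch. It is not code from the
series: struct ste, make_abort_ste(), make_translating_ste(), write_ste()
and hw_sync() are hypothetical names, the entry is modelled as eight 64-bit
words with the valid bit in bit 0 of word 0, and the writer uses a
deliberately simplified rule (a single changed word is stored directly,
anything else goes through an invalid entry) rather than the logic in
arm_smmu_write_entry_step().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define STE_WORDS	8		/* model: 8 x 64-bit words per entry */
#define STE_V		(1ULL << 0)	/* valid bit, word 0 */
#define STE_CFG(x)	((uint64_t)(x) << 1)
#define CFG_ABORT	0
#define CFG_S2_TRANS	6

struct ste {
	uint64_t data[STE_WORDS];
};

/* Stand-in for the invalidate + sync a real driver issues; counts calls. */
static unsigned int sync_count;

static void hw_sync(void)
{
	sync_count++;
}

/* "Make" functions only compute a value; they never touch the live entry. */
static void make_abort_ste(struct ste *target)
{
	memset(target, 0, sizeof(*target));
	target->data[0] = STE_V | STE_CFG(CFG_ABORT);
}

/* Assumption of this model: the table address lives in word 3. */
static void make_translating_ste(struct ste *target, uint64_t table_addr)
{
	memset(target, 0, sizeof(*target));
	target->data[0] = STE_V | STE_CFG(CFG_S2_TRANS);
	target->data[3] = table_addr;
}

/*
 * Program *live to equal *target without exposing a half-written entry.
 * Returns true if the update was hitless (entry stayed valid throughout).
 * Plain stores are used here; a real driver would need WRITE_ONCE-style
 * 64-bit stores.
 */
static bool write_ste(struct ste *live, const struct ste *target)
{
	unsigned int i, ndiff = 0, last = 0;

	assert(live && target);
	for (i = 0; i < STE_WORDS; i++) {
		if (live->data[i] != target->data[i]) {
			ndiff++;
			last = i;
		}
	}
	if (ndiff == 0)
		return true;
	if (ndiff == 1) {
		/* One 64-bit word changes: a single store cannot tear. */
		live->data[last] = target->data[last];
		hw_sync();
		return true;
	}
	/* Several words change: invalidate, fill in the body, revalidate. */
	live->data[0] &= ~STE_V;
	hw_sync();
	for (i = 1; i < STE_WORDS; i++)
		live->data[i] = target->data[i];
	hw_sync();
	live->data[0] = target->data[0];
	hw_sync();
	return false;
}
```

The point of the split is that callers such as an attach path only ever
call a make function and then write_ste(); they do not need to know what
the entry currently contains.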

Further, this frees the arm_smmu_domain related logic from keeping track
of what state the STE/CD is currently in so it can carefully sequence the
correct update. There are many new update pairs that are subtly introduced
as the work progresses.

The locking to support BTM via arm_smmu_asid_lock is a bit subtle right
now and patches throughout this work adjust and tighten this so that it is
clearer and doesn't get broken.

Once the lower STE layers no longer need to touch arm_smmu_domain we can
isolate struct arm_smmu_domain to be only used for PAGING domains, audit
all the to_smmu_domain() calls to be only in PAGING domain ops, and
introduce the normal global static BLOCKED/IDENTITY domains using the new
STE infrastructure. Part 2 will ultimately migrate SVA over to use
arm_smmu_domain as well.
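
A small user-space model of what "global static" domains with 'never fail'
attach look like is sketched below. It is an illustration under
assumptions, not the driver's code: struct domain, struct device_model,
attach_domain() and the other names are hypothetical, and only word 0 of
the STE is modelled (valid bit in bit 0, config field starting at bit 1).

```c
#include <assert.h>
#include <stdint.h>

#define STE_V		1ULL
#define STE_CFG(x)	((uint64_t)(x) << 1)
#define CFG_ABORT	0
#define CFG_BYPASS	4

enum domain_type { DOMAIN_BLOCKED, DOMAIN_IDENTITY };

/* Hypothetical per-device state: word 0 of the live STE. */
struct device_model {
	uint64_t ste0;
};

struct domain;

struct domain_ops {
	int (*attach)(const struct domain *d, struct device_model *dev);
};

struct domain {
	enum domain_type type;
	const struct domain_ops *ops;
};

/* No allocation and no per-domain state, so there is nothing that can fail. */
static int attach_blocked(const struct domain *d, struct device_model *dev)
{
	(void)d;
	dev->ste0 = STE_V | STE_CFG(CFG_ABORT);
	return 0;
}

static int attach_identity(const struct domain *d, struct device_model *dev)
{
	(void)d;
	dev->ste0 = STE_V | STE_CFG(CFG_BYPASS);
	return 0;
}

static const struct domain_ops blocked_ops = { .attach = attach_blocked };
static const struct domain_ops identity_ops = { .attach = attach_identity };

/* One shared, statically allocated instance of each domain. */
static const struct domain blocked_domain = {
	.type = DOMAIN_BLOCKED,
	.ops = &blocked_ops,
};

static const struct domain identity_domain = {
	.type = DOMAIN_IDENTITY,
	.ops = &identity_ops,
};

static int attach_domain(const struct domain *d, struct device_model *dev)
{
	assert(d && d->ops && d->ops->attach && dev);
	return d->ops->attach(d, dev);
}
```

Because neither attach function reads any per-domain data, these domains
never need anything like to_smmu_domain(), which matches the goal of
keeping struct arm_smmu_domain for PAGING domains only.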

All parts are on github:

 https://github.com/jgunthorpe/linux/commits/smmuv3_newapi

v3:
 - Use some local variables in arm_smmu_get_step_for_sid() for clarity
 - White space and spelling changes
 - Commit message updates
 - Keep master->domain_head initialized to avoid a list_del corruption
v2: https://lore.kernel.org/r/0-v2-de8b10590bf5+400-smmuv3_newapi_p1_jgg@nvidia.com
 - Rebased on v6.7-rc1
 - Improve the comment for arm_smmu_write_entry_step()
 - Fix the botched memcmp
 - Document the spec justification for the SHCFG exclusion in used
 - Include STRTAB_STE_1_SHCFG for STRTAB_STE_0_CFG_S2_TRANS in used
 - WARN_ON for unknown STEs in used
 - Fix error unwind in arm_smmu_attach_dev()
 - Whitespace, spelling, and checkpatch related items
v1: https://lore.kernel.org/r/0-v1-e289ca9121be+2be-smmuv3_newapi_p1_jgg@nvidia.com

Jason Gunthorpe (19):
  iommu/arm-smmu-v3: Add a type for the STE
  iommu/arm-smmu-v3: Master cannot be NULL in
    arm_smmu_write_strtab_ent()
  iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED
  iommu/arm-smmu-v3: Make STE programming independent of the callers
  iommu/arm-smmu-v3: Consolidate the STE generation for abort/bypass
  iommu/arm-smmu-v3: Move arm_smmu_rmr_install_bypass_ste()
  iommu/arm-smmu-v3: Move the STE generation for S1 and S2 domains into
    functions
  iommu/arm-smmu-v3: Build the whole STE in
    arm_smmu_make_s2_domain_ste()
  iommu/arm-smmu-v3: Hold arm_smmu_asid_lock during all of attach_dev
  iommu/arm-smmu-v3: Compute the STE only once for each master
  iommu/arm-smmu-v3: Do not change the STE twice during
    arm_smmu_attach_dev()
  iommu/arm-smmu-v3: Put writing the context descriptor in the right
    order
  iommu/arm-smmu-v3: Pass smmu_domain to arm_enable/disable_ats()
  iommu/arm-smmu-v3: Remove arm_smmu_master->domain
  iommu/arm-smmu-v3: Add a global static IDENTITY domain
  iommu/arm-smmu-v3: Add a global static BLOCKED domain
  iommu/arm-smmu-v3: Use the identity/blocked domain during release
  iommu/arm-smmu-v3: Pass arm_smmu_domain and arm_smmu_device to
    finalize
  iommu/arm-smmu-v3: Convert to domain_alloc_paging()

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 729 +++++++++++++-------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  12 +-
 2 files changed, 477 insertions(+), 264 deletions(-)


base-commit: ca7fcaff577c92d85f0e05cc7be79759155fe328

Comments

Moritz Fischer Dec. 6, 2023, 1:53 a.m. UTC | #1
Hi Jason,

just got back to actually having access to my machine...

For whole series:
Tested-by: Moritz Fischer <moritzf@google.com>

Cheers,
Moritz

Jason Gunthorpe Dec. 11, 2023, 6:03 p.m. UTC | #2
On Tue, Dec 05, 2023 at 03:14:32PM -0400, Jason Gunthorpe wrote:
> All parts are on github:
> 
>  https://github.com/jgunthorpe/linux/commits/smmuv3_newapi
> 
> v3:
>  - Use some local variables in arm_smmu_get_step_for_sid() for clarity
>  - White space and spelling changes
>  - Commit message updates
>  - Keep master->domain_head initialized to avoid a list_del corruption
> v2: https://lore.kernel.org/r/0-v2-de8b10590bf5+400-smmuv3_newapi_p1_jgg@nvidia.com
>  - Rebased on v6.7-rc1
>  - Improve the comment for arm_smmu_write_entry_step()
>  - Fix the botched memcmp
>  - Document the spec justification for the SHCFG exclusion in used
>  - Include STRTAB_STE_1_SHCFG for STRTAB_STE_0_CFG_S2_TRANS in used
>  - WARN_ON for unknown STEs in used
>  - Fix error unwind in arm_smmu_attach_dev()
>  - Whitespace, spelling, and checkpatch related items
> v1: https://lore.kernel.org/r/0-v1-e289ca9121be+2be-smmuv3_newapi_p1_jgg@nvidia.com

This hasn't changed significantly in the last three months, so I feel
done now. I think Eric may still have a formal Tested-by for his
Fujitsu system to record the run he did.

Will, we are waiting for you to say something so we can shift review
and testing focus to part 2, ideally in January. Many people are
waiting for this.

Thanks,
Jason

Will Deacon Dec. 11, 2023, 6:15 p.m. UTC | #3
On Mon, Dec 11, 2023 at 02:03:24PM -0400, Jason Gunthorpe wrote:
> On Tue, Dec 05, 2023 at 03:14:32PM -0400, Jason Gunthorpe wrote:
> > All parts are on github:
> > 
> >  https://github.com/jgunthorpe/linux/commits/smmuv3_newapi
> > 
> > v3:
> >  - Use some local variables in arm_smmu_get_step_for_sid() for clarity
> >  - White space and spelling changes
> >  - Commit message updates
> >  - Keep master->domain_head initialized to avoid a list_del corruption
> > v2: https://lore.kernel.org/r/0-v2-de8b10590bf5+400-smmuv3_newapi_p1_jgg@nvidia.com
> >  - Rebased on v6.7-rc1
> >  - Improve the comment for arm_smmu_write_entry_step()
> >  - Fix the botched memcmp
> >  - Document the spec justification for the SHCFG exclusion in used
> >  - Include STRTAB_STE_1_SHCFG for STRTAB_STE_0_CFG_S2_TRANS in used
> >  - WARN_ON for unknown STEs in used
> >  - Fix error unwind in arm_smmu_attach_dev()
> >  - Whitespace, spelling, and checkpatch related items
> > v1: https://lore.kernel.org/r/0-v1-e289ca9121be+2be-smmuv3_newapi_p1_jgg@nvidia.com
> 
> This hasn't changed significantly in the last three months, so I feel
> done now. I think Eric may still have a formal Tested-by for his
> Fujitsu system to record the run he did.
> 
> Will, we are waiting for you to say something so we can shift review
> and testing focus to part 2, ideally in January. Many people are
> waiting for this.

I'm sorry that you're waiting for me, but I'm snowed under with other
changes and the arm64 tree is my priority at the moment. This series _is_ on
my list and I appreciate that you've got some review, however the fact that
you seem to be lacking any comments from the usual SMMU folks such as Robin
and Jean-Philippe does make me worry about this series to the point that I'm
not prepared just to pick it up without a thorough look.

It sucks, but I don't know what else to tell you.

Will

Mostafa Saleh Jan. 29, 2024, 7:13 p.m. UTC | #4
Hi Jason,

On Tue, Dec 05, 2023 at 03:14:32PM -0400, Jason Gunthorpe wrote:
> All parts are on github:
> 
>  https://github.com/jgunthorpe/linux/commits/smmuv3_newapi
I added some comments/questions for this series, but didn’t review it
thoroughly as I see the code on github is quite different from these patches,
and it seems to be targeted for v4. Do you have any plans to send it soon?

Thanks,
Mostafa

Jason Gunthorpe Jan. 29, 2024, 7:42 p.m. UTC | #5
On Mon, Jan 29, 2024 at 07:13:13PM +0000, Mostafa Saleh wrote:

> > All parts are on github:
> > 
> >  https://github.com/jgunthorpe/linux/commits/smmuv3_newapi
> I added some comments/questions for this series, but didn’t review it
> thoroughly as I see the code on github is quite different from these patches,
> and it seems to be targeted for v4. Do you have any plans to send it soon?

Part 1 didn't change too much, aside from patch 4, but v4 is already posted:

https://lore.kernel.org/linux-iommu/0-v4-c93b774edcc4+42d2b-smmuv3_newapi_p1_jgg@nvidia.com/

Thanks,
Jason

Mostafa Saleh Jan. 29, 2024, 8:45 p.m. UTC | #6
On Mon, Jan 29, 2024 at 03:42:45PM -0400, Jason Gunthorpe wrote:
> On Mon, Jan 29, 2024 at 07:13:13PM +0000, Mostafa Saleh wrote:
> 
> > > All parts are on github:
> > > 
> > >  https://github.com/jgunthorpe/linux/commits/smmuv3_newapi
> > I added some comments/questions for this series, but didn’t review it
> > thoroughly as I see the code on github is quite different from these patches,
> > and it seems to be targeted for v4. Do you have any plans to send it soon?
> 
> Part 1 didn't change too much, aside from patch 4, but v4 is already posted:
> 
> https://lore.kernel.org/linux-iommu/0-v4-c93b774edcc4+42d2b-smmuv3_newapi_p1_jgg@nvidia.com/

Oh, I missed it, thanks, I will move to reviewing v4.
> Thanks,
> Jason