
[v9,0/9] Make the SMMUv3 CD logic match the new STE design (part 2a/3)

Message ID 0-v9-5040dc602008+177d7-smmuv3_newapi_p2_jgg@nvidia.com (mailing list archive)

Message

Jason Gunthorpe April 30, 2024, 5:21 p.m. UTC
This is split out from the larger part two, which aims to rework the PASID
related code.

No new functionality is introduced in these commits; they just reorganize
the CD logic to follow the same design as the new STE logic, using make
functions and a single programming flow without leaking details to
callers.

CD does not have as strong a need for this as STE, but all the code exists
and continuing with the same pattern makes for fewer things to understand
inside the driver.

The following PASID code makes use of this to rethread how the CD
programming works: the caller creates a struct arm_smmu_cd and the write
logic installs whatever it contains into the live CD entry. This allows
the actual PASID and CD logic to be general, so the PAGING and SVA domain
types can sit on top of it.

There are four kinds of CDs:
 - Blocking (ie cleared)
 - S1 PAGING
 - SVA
 - SVA with a released MM (all fault)

The last two have to transition hitlessly.
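
Roughly, the pattern this enables looks like the sketch below (not
verbatim from the patches; arm_smmu_get_cd_ptr() and the trailing
arguments of arm_smmu_write_cd_entry() are assumed here for
illustration):

/*
 * Rough sketch only: the caller composes the whole CD value first, then
 * a single writer installs it into the live entry.
 */
#include "arm-smmu-v3.h"

static void example_install_sva_cd(struct arm_smmu_master *master,
				   struct mm_struct *mm, int ssid, u16 asid)
{
	struct arm_smmu_cd target = {};
	struct arm_smmu_cd *cdptr;

	/* Build the target CD on the stack; no live entry is touched yet */
	arm_smmu_make_sva_cd(&target, master, mm, asid);

	/* The table slot is allocated in advance (see patch 6/9) */
	cdptr = arm_smmu_get_cd_ptr(master, ssid);
	if (!cdptr)
		return;

	/*
	 * One programming flow: the writer compares used bits and updates
	 * hitlessly when possible, otherwise falls back to a V=0
	 * break-before-make style sequence.
	 */
	arm_smmu_write_cd_entry(master, ssid, cdptr, &target);
}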

v9:
 - Remove arm_smmu_clean_cd_entry() and instead zero the required CD bits
   inside arm_smmu_write_ctx_desc()
v8: https://lore.kernel.org/r/0-v8-4c4298c63951+13484-smmuv3_newapi_p2_jgg@nvidia.com
 - Remove ops->v_bit and just use entry[0] = 0
 - Revise commit messages
 - Revise comment in arm_smmu_get_cd_used()
 - Move hunks removing code from arm_smmu_write_ctx_desc() earlier for
   clarity
 - Consistently avoid CTXDESC_SPLIT and remove the macro
 - Don't check master->stall_enabled for the STALL_FORCE mm release
 - Fix up randconfig failures in the kunit test
 - Use VISIBLE_IF_KUNIT
 - Move more mock structs to .data to avoid stack frame size warnings
v7: https://lore.kernel.org/r/0-v7-cb149db3a320+3b5-smmuv3_newapi_p2_jgg@nvidia.com
 - Rebase on Will's for-next & v6.9-rc2
 - Split series in half
 - Include the kunit test
 - Update comments to refer to the STE & CD in the writer logic
v6: https://lore.kernel.org/r/0-v6-228e7adf25eb+4155-smmuv3_newapi_p2_jgg@nvidia.com

Jason Gunthorpe (9):
  iommu/arm-smmu-v3: Add an ops indirection to the STE code
  iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
  iommu/arm-smmu-v3: Move the CD generation for S1 domains into a
    function
  iommu/arm-smmu-v3: Consolidate clearing a CD table entry
  iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr()
  iommu/arm-smmu-v3: Allocate the CD table entry in advance
  iommu/arm-smmu-v3: Move the CD generation for SVA into a function
  iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd()
  iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry

 drivers/iommu/Kconfig                         |  13 +-
 drivers/iommu/arm/arm-smmu-v3/Makefile        |   1 +
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 164 ++++--
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c  | 465 ++++++++++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 504 ++++++++++--------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  53 +-
 6 files changed, 903 insertions(+), 297 deletions(-)
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c


base-commit: e8e4398d53f98be7ac48e0bda9ea6e26df24136d

Comments

Nicolin Chen April 30, 2024, 9:11 p.m. UTC | #1
On Tue, Apr 30, 2024 at 02:21:32PM -0300, Jason Gunthorpe wrote:
> v9:
>  - Remove arm_smmu_clean_cd_entry() and instead zero the required CD bits
>    inside arm_smmu_write_ctx_desc()

SVA sanity testing passes with v9, same as my previous test of v7.

Thanks
Nicolin
Jason Gunthorpe May 1, 2024, 12:02 a.m. UTC | #2
On Tue, Apr 30, 2024 at 02:11:35PM -0700, Nicolin Chen wrote:
> On Tue, Apr 30, 2024 at 02:21:32PM -0300, Jason Gunthorpe wrote:
> > v9:
> >  - Remove arm_smmu_clean_cd_entry() and instead zero the required CD bits
> >    inside arm_smmu_write_ctx_desc()
> 
> SVA sanity testing passes with v9, same as my previous test of v7.

Yeah, very little changed in the main code after everything is
applied. Most of the edits were adjustments to the interior patches.

Here is the diff from v7 to v9.

Thanks,
Jason

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 2e597102baf6e5..f872aeccd82041 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -411,8 +411,9 @@ config ARM_SMMU_V3_SVA
 	  and PRI.
 
 config ARM_SMMU_V3_KUNIT_TEST
-	tristate "KUnit tests for arm-smmu-v3 driver"  if !KUNIT_ALL_TESTS
+	bool "KUnit tests for arm-smmu-v3 driver"  if !KUNIT_ALL_TESTS
 	depends on KUNIT
+	depends on ARM_SMMU_V3_SVA
 	default KUNIT_ALL_TESTS
 	help
 	  Enable this option to unit-test arm-smmu-v3 driver functions.
diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
index 014a997753a8a2..0b97054b3929b7 100644
--- a/drivers/iommu/arm/arm-smmu-v3/Makefile
+++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
@@ -2,6 +2,5 @@
 obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o
 arm_smmu_v3-objs-y += arm-smmu-v3.o
 arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
+arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o
 arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
-
-obj-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index f56a2d38012b5c..34a977a0767d46 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -8,6 +8,7 @@
 #include <linux/mmu_notifier.h>
 #include <linux/sched/mm.h>
 #include <linux/slab.h>
+#include <kunit/visibility.h>
 
 #include "arm-smmu-v3.h"
 #include "../../io-pgtable-arm.h"
@@ -120,6 +121,7 @@ static u64 page_size_to_cd(void)
 	return ARM_LPAE_TCR_TG0_4K;
 }
 
+VISIBLE_IF_KUNIT
 void arm_smmu_make_sva_cd(struct arm_smmu_cd *target,
 			  struct arm_smmu_master *master, struct mm_struct *mm,
 			  u16 asid)
@@ -172,8 +174,7 @@ void arm_smmu_make_sva_cd(struct arm_smmu_cd *target,
 		 * disable is permitted. This speeds up cleanup for an unclean
 		 * exit if the device is still doing a lot of DMA.
 		 */
-		if (master->stall_enabled &&
-		    !(master->smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
+		if (!(master->smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
 			target->data[0] &=
 				cpu_to_le64(~(CTXDESC_CD_0_S | CTXDESC_CD_0_R));
 	}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
index 14c8e40712a70e..417804392ff089 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
@@ -26,6 +26,9 @@ static struct arm_smmu_ste abort_ste;
 static struct arm_smmu_device smmu = {
 	.features = ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_ATTR_TYPES_OVR
 };
+static struct mm_struct sva_mm = {
+	.pgd = (void *)0xdaedbeefdeadbeefULL,
+};
 
 static bool arm_smmu_entry_differs_in_used_bits(const __le64 *entry,
 						const __le64 *used_bits,
@@ -61,7 +64,7 @@ arm_smmu_test_writer_record_syncs(struct arm_smmu_entry_writer *writer)
 			     false);
 
 	test_writer->num_syncs += 1;
-	if (!(test_writer->entry[0] & writer->ops->v_bit)) {
+	if (!test_writer->entry[0]) {
 		test_writer->invalid_entry_written = true;
 	} else {
 		/*
@@ -96,13 +99,11 @@ arm_smmu_v3_test_debug_print_used_bits(struct arm_smmu_entry_writer *writer,
 }
 
 static const struct arm_smmu_entry_writer_ops test_ste_ops = {
-	.v_bit = cpu_to_le64(STRTAB_STE_0_V),
 	.sync = arm_smmu_test_writer_record_syncs,
 	.get_used = arm_smmu_get_ste_used,
 };
 
 static const struct arm_smmu_entry_writer_ops test_cd_ops = {
-	.v_bit = cpu_to_le64(CTXDESC_CD_0_V),
 	.sync = arm_smmu_test_writer_record_syncs,
 	.get_used = arm_smmu_get_cd_used,
 };
@@ -392,11 +393,8 @@ static void arm_smmu_test_make_sva_cd(struct arm_smmu_cd *cd, unsigned int asid)
 	struct arm_smmu_master master = {
 		.smmu = &smmu,
 	};
-	struct mm_struct mm = {
-		.pgd = (void *)0xdaedbeefdeadbeefULL,
-	};
 
-	arm_smmu_make_sva_cd(cd, &master, &mm, asid);
+	arm_smmu_make_sva_cd(cd, &master, &sva_mm, asid);
 }
 
 static void arm_smmu_test_make_sva_release_cd(struct arm_smmu_cd *cd,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 3ffaa3b34b44bf..15bad76cf84a61 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -26,6 +26,7 @@
 #include <linux/pci.h>
 #include <linux/pci-ats.h>
 #include <linux/platform_device.h>
+#include <kunit/visibility.h>
 
 #include "arm-smmu-v3.h"
 #include "../../dma-iommu.h"
@@ -968,6 +969,7 @@ void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid)
  * would be nice if this was complete according to the spec, but minimally it
  * has to capture the bits this driver uses.
  */
+VISIBLE_IF_KUNIT
 void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
 {
 	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
@@ -1090,6 +1092,7 @@ static bool entry_set(struct arm_smmu_entry_writer *writer, __le64 *entry,
  * V=0 process. This relies on the IGNORED behavior described in the
  * specification.
  */
+VISIBLE_IF_KUNIT
 void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *entry,
 			  const __le64 *target)
 {
@@ -1124,7 +1127,7 @@ void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *entry,
 		 * requires a breaking update, zero the V bit, write all qwords
 		 * but 0, then set qword 0
 		 */
-		unused_update[0] = entry[0] & (~writer->ops->v_bit);
+		unused_update[0] = 0;
 		entry_set(writer, entry, unused_update, 0, 1);
 		entry_set(writer, entry, target, 1, NUM_ENTRY_QWORDS - 1);
 		entry_set(writer, entry, target, 0, 1);
@@ -1221,7 +1224,7 @@ struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
 	}
 
 	if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_64K_L2) {
-		unsigned int idx = ssid >> CTXDESC_SPLIT;
+		unsigned int idx = ssid / CTXDESC_L2_ENTRIES;
 		struct arm_smmu_l1_ctx_desc *l1_desc;
 
 		l1_desc = &cd_table->l1_desc[idx];
@@ -1245,6 +1248,7 @@ struct arm_smmu_cd_writer {
 	unsigned int ssid;
 };
 
+VISIBLE_IF_KUNIT
 void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
 {
 	used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V);
@@ -1252,7 +1256,10 @@ void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
 		return;
 	memset(used_bits, 0xFF, sizeof(struct arm_smmu_cd));
 
-	/* EPD0 means T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED */
+	/*
+	 * If EPD0 is set by the make function it means
+	 * T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED
+	 */
 	if (ent[0] & cpu_to_le64(CTXDESC_CD_0_TCR_EPD0)) {
 		used_bits[0] &= ~cpu_to_le64(
 			CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 |
@@ -1273,7 +1280,6 @@ static void arm_smmu_cd_writer_sync_entry(struct arm_smmu_entry_writer *writer)
 static const struct arm_smmu_entry_writer_ops arm_smmu_cd_writer_ops = {
 	.sync = arm_smmu_cd_writer_sync_entry,
 	.get_used = arm_smmu_get_cd_used,
-	.v_bit = cpu_to_le64(CTXDESC_CD_0_V),
 };
 
 void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid,
@@ -1472,7 +1478,6 @@ static void arm_smmu_ste_writer_sync_entry(struct arm_smmu_entry_writer *writer)
 static const struct arm_smmu_entry_writer_ops arm_smmu_ste_writer_ops = {
 	.sync = arm_smmu_ste_writer_sync_entry,
 	.get_used = arm_smmu_get_ste_used,
-	.v_bit = cpu_to_le64(STRTAB_STE_0_V),
 };
 
 static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
@@ -1502,6 +1507,7 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
 	}
 }
 
+VISIBLE_IF_KUNIT
 void arm_smmu_make_abort_ste(struct arm_smmu_ste *target)
 {
 	memset(target, 0, sizeof(*target));
@@ -1510,6 +1516,7 @@ void arm_smmu_make_abort_ste(struct arm_smmu_ste *target)
 		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT));
 }
 
+VISIBLE_IF_KUNIT
 void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu,
 			      struct arm_smmu_ste *target)
 {
@@ -1523,6 +1530,7 @@ void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu,
 							 STRTAB_STE_1_SHCFG_INCOMING));
 }
 
+VISIBLE_IF_KUNIT
 void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
 			       struct arm_smmu_master *master)
 {
@@ -1573,6 +1581,7 @@ void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
 	}
 }
 
+VISIBLE_IF_KUNIT
 void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
 				 struct arm_smmu_master *master,
 				 struct arm_smmu_domain *smmu_domain)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 0455498d24c730..1242a086c9f948 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -275,8 +275,7 @@ struct arm_smmu_ste {
  * 2lvl: at most 1024 L1 entries,
  *       1024 lazy entries per table.
  */
-#define CTXDESC_SPLIT			10
-#define CTXDESC_L2_ENTRIES		(1 << CTXDESC_SPLIT)
+#define CTXDESC_L2_ENTRIES		1024
 
 #define CTXDESC_L1_DESC_DWORDS		1
 #define CTXDESC_L1_DESC_V		(1UL << 0)
@@ -745,16 +744,15 @@ struct arm_smmu_entry_writer {
 };
 
 struct arm_smmu_entry_writer_ops {
-	__le64 v_bit;
 	void (*get_used)(const __le64 *entry, __le64 *used);
 	void (*sync)(struct arm_smmu_entry_writer *writer);
 };
 
+#if IS_ENABLED(CONFIG_KUNIT)
 void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits);
-void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits);
 void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *cur,
 			  const __le64 *target);
-
+void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits);
 void arm_smmu_make_abort_ste(struct arm_smmu_ste *target);
 void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu,
 			      struct arm_smmu_ste *target);
@@ -766,6 +764,7 @@ void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
 void arm_smmu_make_sva_cd(struct arm_smmu_cd *target,
 			  struct arm_smmu_master *master, struct mm_struct *mm,
 			  u16 asid);
+#endif
 
 static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
 {
Will Deacon May 1, 2024, 4:20 p.m. UTC | #3
On Tue, 30 Apr 2024 14:21:32 -0300, Jason Gunthorpe wrote:
> This is split out from the larger part two, which aims to rework the PASID
> related code.
> 
> No new functionality is introduced in these commits; they just reorganize
> the CD logic to follow the same design as the new STE logic, using make
> functions and a single programming flow without leaking details to
> callers.
> 
> [...]

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/9] iommu/arm-smmu-v3: Add an ops indirection to the STE code
	https://git.kernel.org/will/c/de31c3555412
[2/9] iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
	https://git.kernel.org/will/c/78a5fbe8395b
[3/9] iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function
	https://git.kernel.org/will/c/e9d1e4ff74b9
[4/9] iommu/arm-smmu-v3: Consolidate clearing a CD table entry
	https://git.kernel.org/will/c/af8f0b83ea2b
[5/9] iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr()
	https://git.kernel.org/will/c/b2f4c0fcf094
[6/9] iommu/arm-smmu-v3: Allocate the CD table entry in advance
	https://git.kernel.org/will/c/13abe4faac43
[7/9] iommu/arm-smmu-v3: Move the CD generation for SVA into a function
	https://git.kernel.org/will/c/7b87c93c8b86
[8/9] iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd()
	https://git.kernel.org/will/c/04905c17f648
[9/9] iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry
	https://git.kernel.org/will/c/56e1a4cc2588

Cheers,
Jason Gunthorpe May 1, 2024, 4:50 p.m. UTC | #4
On Wed, May 01, 2024 at 05:20:29PM +0100, Will Deacon wrote:
> On Tue, 30 Apr 2024 14:21:32 -0300, Jason Gunthorpe wrote:
> > This is split out from the larger part two, which aims to rework the PASID
> > related code.
> > 
> > No new functionality is introduced in these commits; they just reorganize
> > the CD logic to follow the same design as the new STE logic, using make
> > functions and a single programming flow without leaking details to
> > callers.
> > 
> > [...]
> 
> Applied to will (for-joerg/arm-smmu/updates), thanks!
> 
> [1/9] iommu/arm-smmu-v3: Add an ops indirection to the STE code
> 	https://git.kernel.org/will/c/de31c3555412
> [2/9] iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
> 	https://git.kernel.org/will/c/78a5fbe8395b
> [3/9] iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function
> 	https://git.kernel.org/will/c/e9d1e4ff74b9
> [4/9] iommu/arm-smmu-v3: Consolidate clearing a CD table entry
> 	https://git.kernel.org/will/c/af8f0b83ea2b
> [5/9] iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr()
> 	https://git.kernel.org/will/c/b2f4c0fcf094
> [6/9] iommu/arm-smmu-v3: Allocate the CD table entry in advance
> 	https://git.kernel.org/will/c/13abe4faac43
> [7/9] iommu/arm-smmu-v3: Move the CD generation for SVA into a function
> 	https://git.kernel.org/will/c/7b87c93c8b86
> [8/9] iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd()
> 	https://git.kernel.org/will/c/04905c17f648
> [9/9] iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry
> 	https://git.kernel.org/will/c/56e1a4cc2588

Thanks Will, I will repost the rebased 2b part once Joerg picks this
up.

Jason