Message ID: 11-v2-de8b10590bf5+400-smmuv3_newapi_p1_jgg@nvidia.com (mailing list archive)
State:      New, archived
Series:     Update SMMUv3 to the modern iommu API (part 1/3)
On Tue, Nov 14, 2023 at 1:53 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> This was needed because the STE code required the STE to be in
> ABORT/BYPASS in order to program a cdtable or S2 STE. Now that the STE
> code can automatically handle all transitions we can remove this step
> from the attach_dev flow.
>
> A few small bugs exist because of this:
>
> 1) If the core code does BLOCKED -> UNMANAGED with disable_bypass=false
>    then there will be a moment where the STE points at BYPASS. Since
>    this can be done by VFIO/IOMMUFD it is a small security race.
>
> 2) If the core code does IDENTITY -> DMA then any IOMMU_RESV_DIRECT
>    regions will temporarily become BLOCKED. We'd like drivers to
>    work in a way that allows IOMMU_RESV_DIRECT to be continuously
>    functional during these transitions.
>
> Make arm_smmu_release_device() put the STE back to the correct
> ABORT/BYPASS setting. Fix a bug where an IOMMU_RESV_DIRECT was ignored
> on this path.
>
> Notice this subtly depends on the prior arm_smmu_asid_lock change as
> the STE must be put to non-paging before removing the device from the
> linked list to avoid races with arm_smmu_share_asid().

I'm a little confused by this comment. Is this suggesting that
arm_smmu_detach_dev had a race condition before the arm_smmu_asid_lock
changes, since it deletes the list entry before deactivating the STE
that uses the domain and without grabbing the asid_lock, thus allowing
a gap where the ASID might be re-acquired by an SVA domain while an
STE with that ASID is still live on this device? Wouldn't that belong
on the asid_lock patch instead if so?

> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 4b157c2ddf9a80..f70862806211de 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2482,7 +2482,6 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
>  static void arm_smmu_detach_dev(struct arm_smmu_master *master)
>  {
>  	unsigned long flags;
> -	struct arm_smmu_ste target;
>  	struct arm_smmu_domain *smmu_domain = master->domain;
>
>  	if (!smmu_domain)
> @@ -2496,11 +2495,6 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
>
>  	master->domain = NULL;
>  	master->ats_enabled = false;
> -	if (disable_bypass)
> -		arm_smmu_make_abort_ste(&target);
> -	else
> -		arm_smmu_make_bypass_ste(&target);
> -	arm_smmu_install_ste_for_dev(master, &target);
>  	/*
>  	 * Clearing the CD entry isn't strictly required to detach the domain
>  	 * since the table is uninstalled anyway, but it helps avoid confusion
> @@ -2852,9 +2846,18 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>  static void arm_smmu_release_device(struct device *dev)
>  {
>  	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> +	struct arm_smmu_ste target;
>
>  	if (WARN_ON(arm_smmu_master_sva_enabled(master)))
>  		iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
> +
> +	/* Put the STE back to what arm_smmu_init_strtab() sets */

Hmmmm, it seems like checking iommu->require_direct may put STEs in
bypass in scenarios where arm_smmu_init_strtab() wouldn't have.
arm_smmu_init_strtab is calling iort_get_rmr_sids to pick streams to
put into bypass, but IIUC iommu->require_direct also applies to
dts-based reserved-memory regions, not just iort.

I'm not very familiar with the history behind disable_bypass; why is
putting an entire stream into bypass the correct behavior if a
reserved-memory region (which may cover only a small, finite range)
exists?

> +	if (disable_bypass && !dev->iommu->require_direct)
> +		arm_smmu_make_abort_ste(&target);
> +	else
> +		arm_smmu_make_bypass_ste(&target);
> +	arm_smmu_install_ste_for_dev(master, &target);
> +
>  	arm_smmu_detach_dev(master);
>  	arm_smmu_disable_pasid(master);
>  	arm_smmu_remove_master(master);
> --
> 2.42.0
>
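(A simplified sketch of the probe-time path being compared against,
paraphrased from the driver's arm_smmu_rmr_install_bypass_ste() /
iort_get_rmr_sids() flow around this series' baseline - not verbatim -
showing that only IORT-provided RMR stream IDs receive a bypass STE at
boot:)

	/*
	 * Sketch, not verbatim: at probe time the driver asks the IORT
	 * code for RMR stream IDs and installs bypass STEs for them.
	 * DT-based reserved-memory regions never reach this path, which
	 * is the asymmetry noted above.
	 */
	struct list_head rmr_list;
	struct iommu_resv_region *e;

	INIT_LIST_HEAD(&rmr_list);
	iort_get_rmr_sids(dev_fwnode(smmu->dev), &rmr_list);

	list_for_each_entry(e, &rmr_list, list) {
		/* ... for each SID of the entry: initialize the STE slot
		 * and write a bypass STE so the region keeps translating ... */
	}

	iort_put_rmr_sids(dev_fwnode(smmu->dev), &rmr_list);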
On Wed, Nov 15, 2023 at 11:15:23PM +0800, Michael Shavit wrote:
> On Tue, Nov 14, 2023 at 1:53 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > This was needed because the STE code required the STE to be in
> > ABORT/BYPASS in order to program a cdtable or S2 STE. Now that the
> > STE code can automatically handle all transitions we can remove this
> > step from the attach_dev flow.
> >
> > A few small bugs exist because of this:
> >
> > 1) If the core code does BLOCKED -> UNMANAGED with disable_bypass=false
> >    then there will be a moment where the STE points at BYPASS. Since
> >    this can be done by VFIO/IOMMUFD it is a small security race.
> >
> > 2) If the core code does IDENTITY -> DMA then any IOMMU_RESV_DIRECT
> >    regions will temporarily become BLOCKED. We'd like drivers to
> >    work in a way that allows IOMMU_RESV_DIRECT to be continuously
> >    functional during these transitions.
> >
> > Make arm_smmu_release_device() put the STE back to the correct
> > ABORT/BYPASS setting. Fix a bug where an IOMMU_RESV_DIRECT was
> > ignored on this path.
> >
> > Notice this subtly depends on the prior arm_smmu_asid_lock change as
> > the STE must be put to non-paging before removing the device from the
> > linked list to avoid races with arm_smmu_share_asid().
>
> I'm a little confused by this comment. Is this suggesting that
> arm_smmu_detach_dev had a race condition before the arm_smmu_asid_lock
> changes, since it deletes the list entry before deactivating the STE
> that uses the domain and without grabbing the asid_lock, thus allowing
> a gap where the ASID might be re-acquired by an SVA domain while an
> STE with that ASID is still live on this device? Wouldn't that belong
> on the asid_lock patch instead if so?

I wasn't intending to say there is an existing bug; this was more to
point out why it was organized like this, and why it is OK to remove
the detach manipulation of the STE considering races with share_asid.

However, I agree that the code in rc1 is troubled and fixed in the
prior patch:

	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
	list_del(&master->domain_head);
	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);

^^^^ Prevents arm_smmu_update_ctx_desc_devices() from storing to the
STE. However the STE is still pointing at the ASID.

	master->domain = NULL;
	master->ats_enabled = false;
	arm_smmu_install_ste_for_dev(master);

^^^^ Now the STE is gone, so the CD becomes unreferenced.

	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 && master->cd_table.cdtab)
		arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL);

^^^^ Now the CD is non-valid.

I was primarily concerned with corrupting the CD, ie that share_asid
would race and un-clear the write_ctx_desc(). That is prevented by the
ordering above.

However, I agree the above is still problematic because there is a
short time window where the ASID can be installed in two CDs with two
different translations. I suppose there is a security issue where this
could corrupt the IOTLB.

This is all fixed in this series too by having more robust locking. So
this does deserve a note in the commit message for the earlier patch
about this issue.
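A minimal sketch of the ordering that closes this window, assuming
arm_smmu_asid_lock is held across the whole teardown (illustrative
only; the actual code in this series differs in detail):

	mutex_lock(&arm_smmu_asid_lock);	/* blocks arm_smmu_share_asid() */

	/* Make the STE non-paging first, so it no longer references the
	 * CD/ASID */
	arm_smmu_make_abort_ste(&target);	/* or bypass, as appropriate */
	arm_smmu_install_ste_for_dev(master, &target);

	/* Now the CD can be cleared without a live STE pointing at it */
	if (master->cd_table.cdtab)
		arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL);

	/* Only then drop the master from the domain's device list */
	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
	list_del(&master->domain_head);
	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);

	mutex_unlock(&arm_smmu_asid_lock);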
> > @@ -2852,9 +2846,18 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
> >  static void arm_smmu_release_device(struct device *dev)
> >  {
> >  	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> > +	struct arm_smmu_ste target;
> >
> >  	if (WARN_ON(arm_smmu_master_sva_enabled(master)))
> >  		iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
> > +
> > +	/* Put the STE back to what arm_smmu_init_strtab() sets */
>
> Hmmmm, it seems like checking iommu->require_direct may put STEs in
> bypass in scenarios where arm_smmu_init_strtab() wouldn't have.
> arm_smmu_init_strtab is calling iort_get_rmr_sids to pick streams to
> put into bypass, but IIUC iommu->require_direct also applies to
> dts-based reserved-memory regions, not just iort.

Indeed, that actually looks like a little bug, as the DT should
technically have the same behavior as the iort. I'm going to ignore
it :)

> I'm not very familiar with the history behind disable_bypass; why is
> putting an entire stream into bypass the correct behavior if a
> reserved-memory region (which may cover only a small, finite range)
> exists?

This specific reserved memory region is requesting a 1:1 translation
for a chunk of IOVA. This translation is being used by some agent
outside Linux's knowledge and the desire is for the translation to
always be in effect.

So, if we put the STE to ABORT then the translation will stop working
with unknown side effects. This is also why we install the translation
in the DMA domain and block use of VFIO if these are set - to ensure
the 1:1 translation is always there.

Thanks,
Jason
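For reference, the core-side flag discussed here is set roughly like
this (simplified from the iommu core's direct-mapping setup, e.g.
iommu_create_device_direct_mappings(); not verbatim). Any
IOMMU_RESV_DIRECT region - whether it came from an IORT RMR or a DT
reserved-memory node - marks the device and gets mapped 1:1 into the
default DMA domain:

	list_for_each_entry(entry, &mappings, list) {
		/* Any direct region, IORT- or DT-sourced, sets the flag */
		if (entry->type == IOMMU_RESV_DIRECT)
			dev->iommu->require_direct = 1;

		if ((entry->type != IOMMU_RESV_DIRECT &&
		     entry->type != IOMMU_RESV_DIRECT_RELAXABLE) ||
		    !iommu_is_dma_domain(domain))
			continue;

		/* ... iommu_map() the region 1:1 into the domain ... */
	}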
On Mon, Nov 13, 2023 at 01:53:18PM -0400, Jason Gunthorpe wrote:
> This was needed because the STE code required the STE to be in
> ABORT/BYPASS in order to program a cdtable or S2 STE. Now that the STE
> code can automatically handle all transitions we can remove this step
> from the attach_dev flow.
>
> A few small bugs exist because of this:
>
> 1) If the core code does BLOCKED -> UNMANAGED with disable_bypass=false
>    then there will be a moment where the STE points at BYPASS. Since
>    this can be done by VFIO/IOMMUFD it is a small security race.
>
> 2) If the core code does IDENTITY -> DMA then any IOMMU_RESV_DIRECT
>    regions will temporarily become BLOCKED. We'd like drivers to
>    work in a way that allows IOMMU_RESV_DIRECT to be continuously
>    functional during these transitions.
>
> Make arm_smmu_release_device() put the STE back to the correct
> ABORT/BYPASS setting. Fix a bug where an IOMMU_RESV_DIRECT was ignored
> on this path.
>
> Notice this subtly depends on the prior arm_smmu_asid_lock change as
> the STE must be put to non-paging before removing the device from the
> linked list to avoid races with arm_smmu_share_asid().
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 4b157c2ddf9a80..f70862806211de 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2482,7 +2482,6 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
 static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 {
 	unsigned long flags;
-	struct arm_smmu_ste target;
 	struct arm_smmu_domain *smmu_domain = master->domain;
 
 	if (!smmu_domain)
@@ -2496,11 +2495,6 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 
 	master->domain = NULL;
 	master->ats_enabled = false;
-	if (disable_bypass)
-		arm_smmu_make_abort_ste(&target);
-	else
-		arm_smmu_make_bypass_ste(&target);
-	arm_smmu_install_ste_for_dev(master, &target);
 	/*
 	 * Clearing the CD entry isn't strictly required to detach the domain
 	 * since the table is uninstalled anyway, but it helps avoid confusion
@@ -2852,9 +2846,18 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 static void arm_smmu_release_device(struct device *dev)
 {
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+	struct arm_smmu_ste target;
 
 	if (WARN_ON(arm_smmu_master_sva_enabled(master)))
 		iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
+
+	/* Put the STE back to what arm_smmu_init_strtab() sets */
+	if (disable_bypass && !dev->iommu->require_direct)
+		arm_smmu_make_abort_ste(&target);
+	else
+		arm_smmu_make_bypass_ste(&target);
+	arm_smmu_install_ste_for_dev(master, &target);
+
 	arm_smmu_detach_dev(master);
 	arm_smmu_disable_pasid(master);
 	arm_smmu_remove_master(master);
This was needed because the STE code required the STE to be in
ABORT/BYPASS in order to program a cdtable or S2 STE. Now that the STE
code can automatically handle all transitions we can remove this step
from the attach_dev flow.

A few small bugs exist because of this:

1) If the core code does BLOCKED -> UNMANAGED with disable_bypass=false
   then there will be a moment where the STE points at BYPASS. Since
   this can be done by VFIO/IOMMUFD it is a small security race.

2) If the core code does IDENTITY -> DMA then any IOMMU_RESV_DIRECT
   regions will temporarily become BLOCKED. We'd like drivers to
   work in a way that allows IOMMU_RESV_DIRECT to be continuously
   functional during these transitions.

Make arm_smmu_release_device() put the STE back to the correct
ABORT/BYPASS setting. Fix a bug where an IOMMU_RESV_DIRECT was ignored
on this path.

Notice this subtly depends on the prior arm_smmu_asid_lock change as
the STE must be put to non-paging before removing the device from the
linked list to avoid races with arm_smmu_share_asid().

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)
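The key decision in the diff above, annotated (the code lines are from
the patch; the comments are added here for illustration):

	/* Restore the boot-time STE: abort by default, but never break a
	 * required 1:1 region by blocking its stream. */
	if (disable_bypass && !dev->iommu->require_direct)
		arm_smmu_make_abort_ste(&target);	/* block all DMA */
	else
		arm_smmu_make_bypass_ste(&target);	/* keep 1:1 regions live */
	arm_smmu_install_ste_for_dev(master, &target);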