[v2,11/19] iommu/arm-smmu-v3: Do not change the STE twice during arm_smmu_attach_dev()

Message ID 11-v2-de8b10590bf5+400-smmuv3_newapi_p1_jgg@nvidia.com (mailing list archive)
State New, archived
Series Update SMMUv3 to the modern iommu API (part 1/3)

Commit Message

Jason Gunthorpe Nov. 13, 2023, 5:53 p.m. UTC
This was needed because the STE code required the STE to be in
ABORT/BYPASS in order to program a CD table or S2 STE. Now that the STE
code can automatically handle all transitions we can remove this step
from the attach_dev flow.

A few small bugs exist because of this:

1) If the core code does BLOCKED -> UNMANAGED with disable_bypass=false
   then there will be a moment where the STE points at BYPASS. Since
   this can be done by VFIO/IOMMUFD it is a small security race.

2) If the core code does IDENTITY -> DMA then any IOMMU_RESV_DIRECT
   regions will temporarily become BLOCKED. We'd like drivers to
   work in a way that allows IOMMU_RESV_DIRECT to be continuously
   functional during these transitions.

Make arm_smmu_release_device() put the STE back to the correct
ABORT/BYPASS setting. Fix a bug where an IOMMU_RESV_DIRECT region was
ignored on this path.

Notice this subtly depends on the prior arm_smmu_asid_lock change, as the
STE must be put to non-paging before removing the device from the linked
list to avoid races with arm_smmu_share_asid().

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

Comments

Michael Shavit Nov. 15, 2023, 3:15 p.m. UTC | #1
On Tue, Nov 14, 2023 at 1:53 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> This was needed because the STE code required the STE to be in
> ABORT/BYPASS in order to program a CD table or S2 STE. Now that the STE
> code can automatically handle all transitions we can remove this step
> from the attach_dev flow.
>
> A few small bugs exist because of this:
>
> 1) If the core code does BLOCKED -> UNMANAGED with disable_bypass=false
>    then there will be a moment where the STE points at BYPASS. Since
>    this can be done by VFIO/IOMMUFD it is a small security race.
>
> 2) If the core code does IDENTITY -> DMA then any IOMMU_RESV_DIRECT
>    regions will temporarily become BLOCKED. We'd like drivers to
>    work in a way that allows IOMMU_RESV_DIRECT to be continuously
>    functional during these transitions.
>
> Make arm_smmu_release_device() put the STE back to the correct
> ABORT/BYPASS setting. Fix a bug where an IOMMU_RESV_DIRECT region was
> ignored on this path.
>
> Notice this subtly depends on the prior arm_smmu_asid_lock change, as the
> STE must be put to non-paging before removing the device from the linked
> list to avoid races with arm_smmu_share_asid().

I'm a little confused by this comment. Is this suggesting that
arm_smmu_detach_dev had a race condition before the arm_smmu_asid_lock
changes? It deletes the list entry before deactivating the STE that
uses the domain, and without grabbing the asid_lock, which allows a
gap where the ASID might be re-acquired by an SVA domain while an STE
with that ASID is still live on this device. Wouldn't that belong in
the asid_lock patch instead, if so?

>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 4b157c2ddf9a80..f70862806211de 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2482,7 +2482,6 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
>  static void arm_smmu_detach_dev(struct arm_smmu_master *master)
>  {
>         unsigned long flags;
> -       struct arm_smmu_ste target;
>         struct arm_smmu_domain *smmu_domain = master->domain;
>
>         if (!smmu_domain)
> @@ -2496,11 +2495,6 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
>
>         master->domain = NULL;
>         master->ats_enabled = false;
> -       if (disable_bypass)
> -               arm_smmu_make_abort_ste(&target);
> -       else
> -               arm_smmu_make_bypass_ste(&target);
> -       arm_smmu_install_ste_for_dev(master, &target);
>         /*
>          * Clearing the CD entry isn't strictly required to detach the domain
>          * since the table is uninstalled anyway, but it helps avoid confusion
> @@ -2852,9 +2846,18 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>  static void arm_smmu_release_device(struct device *dev)
>  {
>         struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> +       struct arm_smmu_ste target;
>
>         if (WARN_ON(arm_smmu_master_sva_enabled(master)))
>                 iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
> +
> +       /* Put the STE back to what arm_smmu_init_strtab() sets */

Hmmmm, it seems like checking iommu->require_direct may put STEs in
bypass in scenarios where arm_smmu_init_strtab() wouldn't have.
arm_smmu_init_strtab() calls iort_get_rmr_sids() to pick streams to
put into bypass, but IIUC iommu->require_direct also applies to
DT-based reserved-memory regions, not just IORT.

I'm not very familiar with the history behind disable_bypass; why is
putting an entire stream into bypass the correct behavior if a
reserved-memory region (which may cover only a small finite range) exists?

> +       if (disable_bypass && !dev->iommu->require_direct)
> +               arm_smmu_make_abort_ste(&target);
> +       else
> +               arm_smmu_make_bypass_ste(&target);
> +       arm_smmu_install_ste_for_dev(master, &target);
> +
>         arm_smmu_detach_dev(master);
>         arm_smmu_disable_pasid(master);
>         arm_smmu_remove_master(master);
> --
> 2.42.0
>
Jason Gunthorpe Nov. 16, 2023, 4:28 p.m. UTC | #2
On Wed, Nov 15, 2023 at 11:15:23PM +0800, Michael Shavit wrote:
> On Tue, Nov 14, 2023 at 1:53 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > This was needed because the STE code required the STE to be in
> > ABORT/BYPASS in order to program a CD table or S2 STE. Now that the STE
> > code can automatically handle all transitions we can remove this step
> > from the attach_dev flow.
> >
> > A few small bugs exist because of this:
> >
> > 1) If the core code does BLOCKED -> UNMANAGED with disable_bypass=false
> >    then there will be a moment where the STE points at BYPASS. Since
> >    this can be done by VFIO/IOMMUFD it is a small security race.
> >
> > 2) If the core code does IDENTITY -> DMA then any IOMMU_RESV_DIRECT
> >    regions will temporarily become BLOCKED. We'd like drivers to
> >    work in a way that allows IOMMU_RESV_DIRECT to be continuously
> >    functional during these transitions.
> >
> > Make arm_smmu_release_device() put the STE back to the correct
> > ABORT/BYPASS setting. Fix a bug where an IOMMU_RESV_DIRECT region was
> > ignored on this path.
> >
> > Notice this subtly depends on the prior arm_smmu_asid_lock change, as the
> > STE must be put to non-paging before removing the device from the linked
> > list to avoid races with arm_smmu_share_asid().
> 
> I'm a little confused by this comment. Is this suggesting that
> arm_smmu_detach_dev had a race condition before the arm_smmu_asid_lock
> changes? It deletes the list entry before deactivating the STE that
> uses the domain, and without grabbing the asid_lock, which allows a
> gap where the ASID might be re-acquired by an SVA domain while an STE
> with that ASID is still live on this device. Wouldn't that belong in
> the asid_lock patch instead, if so?

I wasn't intending to say there is an existing bug; this was more to
point out why it was organized like this, and why it is OK to remove
the detach manipulation of the STE considering races with share_asid.

However, I agree that the code in rc1 is troubled, and it is fixed by
the prior patch:

	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
	list_del(&master->domain_head);
	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);

^^^^ Prevents arm_smmu_update_ctx_desc_devices() from storing to this
     master's CD. However the STE still points at a CD carrying the ASID

	master->domain = NULL;
	master->ats_enabled = false;
	arm_smmu_install_ste_for_dev(master);

^^^^ Now the STE is gone, so the CD becomes unreferenced

	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 && master->cd_table.cdtab)
		arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL);

^^^^ Now the CD is non-valid

I was primarily concerned with corrupting the CD, i.e. that share_asid
would race and un-clear the write_ctx_desc(). That is prevented by the
ordering above.

However, I agree the above is still problematic because there is a
short time window where the ASID can be installed in two CDs with two
different translations. I suppose there is a security issue where this
could corrupt the IOTLB.

This is all fixed in this series too by having more robust locking. So
this does deserve a note in the commit message for the earlier patch
about this issue.
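
Roughly, the shape of that fix, as a sketch only (my names; the real
locking in the series may differ in detail): hold arm_smmu_asid_lock
across the whole teardown, so share_asid can never observe the master
off the devices list while its STE/CD still carry the old ASID:

	/* Sketch, not the literal series code */
	static void detach_dev_sketch(struct arm_smmu_master *master)
	{
		struct arm_smmu_domain *smmu_domain = master->domain;
		unsigned long flags;

		/* Block arm_smmu_share_asid() for the whole teardown */
		mutex_lock(&arm_smmu_asid_lock);

		spin_lock_irqsave(&smmu_domain->devices_lock, flags);
		list_del(&master->domain_head);
		spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);

		master->domain = NULL;

		/*
		 * STE/CD teardown goes here, still under asid_lock, so
		 * the ASID can never be live in two CDs with different
		 * translations.
		 */

		mutex_unlock(&arm_smmu_asid_lock);
	}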

> > @@ -2852,9 +2846,18 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
> >  static void arm_smmu_release_device(struct device *dev)
> >  {
> >         struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> > +       struct arm_smmu_ste target;
> >
> >         if (WARN_ON(arm_smmu_master_sva_enabled(master)))
> >                 iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
> > +
> > +       /* Put the STE back to what arm_smmu_init_strtab() sets */
> 
> Hmmmm, it seems like checking iommu->require_direct may put STEs in
> bypass in scenarios where arm_smmu_init_strtab() wouldn't have.
> arm_smmu_init_strtab() calls iort_get_rmr_sids() to pick streams to
> put into bypass, but IIUC iommu->require_direct also applies to
> DT-based reserved-memory regions, not just IORT.

Indeed, that actually looks like a little bug, as the DT should
technically have the same behavior as the IORT... I'm going to ignore it
:)

> I'm not very familiar with the history behind disable_bypass; why is
> putting an entire stream into bypass the correct behavior if a
> reserved-memory region (which may cover only a small finite range) exists?

This specific reserved memory region is requesting a 1:1 translation
for a chunk of IOVA. This translation is being used by some agent
outside Linux's knowledge and the desire is for the translation to
always be in effect.

So, if we put the STE to ABORT then the translation will stop working
with unknown side effects.

This is also why we install the translation in the DMA domain and
block use of VFIO if these are set - to ensure the 1:1 translation is
always there.
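
For illustration, this is roughly how a driver reports such a region
to the core (a sketch only - the addresses, size, and function name
below are made up): the core identity-maps IOMMU_RESV_DIRECT regions
into the DMA domain and sets dev->iommu->require_direct, which the
hunk above checks.

	static void example_get_resv_regions(struct device *dev,
					     struct list_head *head)
	{
		struct iommu_resv_region *region;

		/* Hypothetical 1:1 window used by an agent outside Linux */
		region = iommu_alloc_resv_region(0x80000000, SZ_1M,
						 IOMMU_READ | IOMMU_WRITE,
						 IOMMU_RESV_DIRECT,
						 GFP_KERNEL);
		if (region)
			list_add_tail(&region->list, head);
	}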

Thanks,
Jason
Nicolin Chen Dec. 5, 2023, 2:46 a.m. UTC | #3
On Mon, Nov 13, 2023 at 01:53:18PM -0400, Jason Gunthorpe wrote:
> This was needed because the STE code required the STE to be in
> ABORT/BYPASS in order to program a CD table or S2 STE. Now that the STE
> code can automatically handle all transitions we can remove this step
> from the attach_dev flow.
> 
> A few small bugs exist because of this:
> 
> 1) If the core code does BLOCKED -> UNMANAGED with disable_bypass=false
>    then there will be a moment where the STE points at BYPASS. Since
>    this can be done by VFIO/IOMMUFD it is a small security race.
> 
> 2) If the core code does IDENTITY -> DMA then any IOMMU_RESV_DIRECT
>    regions will temporarily become BLOCKED. We'd like drivers to
>    work in a way that allows IOMMU_RESV_DIRECT to be continuously
>    functional during these transitions.
> 
> Make arm_smmu_release_device() put the STE back to the correct
> ABORT/BYPASS setting. Fix a bug where an IOMMU_RESV_DIRECT region was
> ignored on this path.
> 
> Notice this subtly depends on the prior arm_smmu_asid_lock change, as the
> STE must be put to non-paging before removing the device from the linked
> list to avoid races with arm_smmu_share_asid().
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

Patch

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 4b157c2ddf9a80..f70862806211de 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2482,7 +2482,6 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
 static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 {
 	unsigned long flags;
-	struct arm_smmu_ste target;
 	struct arm_smmu_domain *smmu_domain = master->domain;
 
 	if (!smmu_domain)
@@ -2496,11 +2495,6 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 
 	master->domain = NULL;
 	master->ats_enabled = false;
-	if (disable_bypass)
-		arm_smmu_make_abort_ste(&target);
-	else
-		arm_smmu_make_bypass_ste(&target);
-	arm_smmu_install_ste_for_dev(master, &target);
 	/*
 	 * Clearing the CD entry isn't strictly required to detach the domain
 	 * since the table is uninstalled anyway, but it helps avoid confusion
@@ -2852,9 +2846,18 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 static void arm_smmu_release_device(struct device *dev)
 {
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+	struct arm_smmu_ste target;
 
 	if (WARN_ON(arm_smmu_master_sva_enabled(master)))
 		iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
+
+	/* Put the STE back to what arm_smmu_init_strtab() sets */
+	if (disable_bypass && !dev->iommu->require_direct)
+		arm_smmu_make_abort_ste(&target);
+	else
+		arm_smmu_make_bypass_ste(&target);
+	arm_smmu_install_ste_for_dev(master, &target);
+
 	arm_smmu_detach_dev(master);
 	arm_smmu_disable_pasid(master);
 	arm_smmu_remove_master(master);