diff mbox series

iommu: Allow ATS to work on VFs when the PF uses IDENTITY

Message ID 0-v1-0fb4d2ab6770+7e706-ats_vf_jgg@nvidia.com (mailing list archive)
State Handled Elsewhere
Headers show
Series iommu: Allow ATS to work on VFs when the PF uses IDENTITY | expand

Commit Message

Jason Gunthorpe Aug. 7, 2024, 6:19 p.m. UTC
PCI ATS has a global Smallest Translation Unit field that is located in
the PF but shared by all of the VFs.

The expectation is that the STU will be set to the root port's global STU
capability which is driven by the IO page table configuration of the iommu
HW. Today it becomes set when the iommu driver first enables ATS.

Thus, to enable ATS on the VF, the PF must have already had the correct
STU programmed, even if ATS is off on the PF.

Unfortunately the PF only programs the STU when the PF enables ATS. The
iommu drivers tend to leave ATS disabled when IDENTITY translation is
being used.

Thus we can get into a state where the PF is setup to use IDENTITY with
the DMA API while the VF would like to use VFIO with a PAGING domain and
have ATS turned on. This fails because the PF never loaded a PAGING domain
and so it never setup the STU, and the VF can't do it.

The simplest solution is to have the iommu driver set the ATS STU when it
probes the device. This way the ATS STU is loaded immediately at boot time
to all PFs and there is no issue when a VF comes to use it.

Add a new call pci_prepare_ats() which should be called by iommu drivers
in their probe_device() op for every PCI device if the iommu driver
supports ATS. This will setup the STU based on whatever page size
capability the iommu HW has.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/amd/iommu.c                   |  3 ++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  6 ++++
 drivers/iommu/intel/iommu.c                 |  1 +
 drivers/pci/ats.c                           | 33 +++++++++++++++++++++
 include/linux/pci-ats.h                     |  1 +
 5 files changed, 44 insertions(+)


base-commit: e7153d9c8cee2f17fdcd011509860717bfa91423

Comments

Tian, Kevin Aug. 9, 2024, 2:36 a.m. UTC | #1
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, August 8, 2024 2:19 AM
> 
> PCI ATS has a global Smallest Translation Unit field that is located in
> the PF but shared by all of the VFs.
> 
> The expectation is that the STU will be set to the root port's global STU
> capability which is driven by the IO page table configuration of the iommu
> HW. Today it becomes set when the iommu driver first enables ATS.
> 
> Thus, to enable ATS on the VF, the PF must have already had the correct
> STU programmed, even if ATS is off on the PF.
> 
> Unfortunately the PF only programs the STU when the PF enables ATS. The
> iommu drivers tend to leave ATS disabled when IDENTITY translation is
> being used.

Is there more context on this?

Looking at intel-iommu driver ATS is disabled for IDENETITY when
the iommu is in legacy mode:

dmar_domain_attach_device()
{
	...
	if (sm_supported(info->iommu) || !domain_type_is_si(info->domain))
		iommu_enable_pci_caps(info);
	...
}

But this follows what VT-d spec says (section 9.3):

TT: Translate Type
10b: Untranslated requests are processed as pass-through. The SSPTPTR
field is ignored by hardware. Translated and Translation Requests are
blocked.

 
> +/**
> + * pci_prepare_ats - Setup the PS for ATS
> + * @dev: the PCI device
> + * @ps: the IOMMU page shift
> + *
> + * This must be done by the IOMMU driver on the PF before any VFs are
> created to
> + * ensure that the VF can have ATS enabled.
> + *
> + * Returns 0 on success, or negative on failure.
> + */
> +int pci_prepare_ats(struct pci_dev *dev, int ps)
> +{
> +	u16 ctrl;
> +
> +	if (!pci_ats_supported(dev))
> +		return -EINVAL;
> +
> +	if (WARN_ON(dev->ats_enabled))
> +		return -EBUSY;
> +
> +	if (ps < PCI_ATS_MIN_STU)
> +		return -EINVAL;
> +
> +	if (dev->is_virtfn)
> +		return 0;

missed a check that 'ps' matches pf's ats_stu.

> +
> +	dev->ats_stu = ps;
> +	ctrl = PCI_ATS_CTRL_STU(dev->ats_stu - PCI_ATS_MIN_STU);
> +	pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_prepare_ats);
> +

Then there is no need to keep the 'ps' parameter in pci_enable_ats().
Jason Gunthorpe Aug. 9, 2024, 1:28 p.m. UTC | #2
On Fri, Aug 09, 2024 at 02:36:14AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, August 8, 2024 2:19 AM
> > 
> > PCI ATS has a global Smallest Translation Unit field that is located in
> > the PF but shared by all of the VFs.
> > 
> > The expectation is that the STU will be set to the root port's global STU
> > capability which is driven by the IO page table configuration of the iommu
> > HW. Today it becomes set when the iommu driver first enables ATS.
> > 
> > Thus, to enable ATS on the VF, the PF must have already had the correct
> > STU programmed, even if ATS is off on the PF.
> > 
> > Unfortunately the PF only programs the STU when the PF enables ATS. The
> > iommu drivers tend to leave ATS disabled when IDENTITY translation is
> > being used.
> 
> Is there more context on this?

How do you mean?

> Looking at intel-iommu driver ATS is disabled for IDENETITY when
> the iommu is in legacy mode:
> 
> dmar_domain_attach_device()
> {
> 	...
> 	if (sm_supported(info->iommu) || !domain_type_is_si(info->domain))
> 		iommu_enable_pci_caps(info);
> 	...
> }
> 
> But this follows what VT-d spec says (section 9.3):
> 
> TT: Translate Type
> 10b: Untranslated requests are processed as pass-through. The SSPTPTR
> field is ignored by hardware. Translated and Translation Requests are
> blocked.

Yes, HW like this is exactly the problem, it ends up not enabling ATS
on the PF and then we don't have the STU programmed so the VF is
effectively disabled too.

Ideally iommus would continue to work with translated requests when
ATS is enabled. Not supporting this configuration creates a nasty
problem for devices that are using PASID. 

The PASID may require ATS to be enabled (ie SVA), but the RID may be
IDENTITY for performance. The poor device has no idea it is not
allowed to use ATS on the RID side :(

> > +/**
> > + * pci_prepare_ats - Setup the PS for ATS
> > + * @dev: the PCI device
> > + * @ps: the IOMMU page shift
> > + *
> > + * This must be done by the IOMMU driver on the PF before any VFs are
> > created to
> > + * ensure that the VF can have ATS enabled.
> > + *
> > + * Returns 0 on success, or negative on failure.
> > + */
> > +int pci_prepare_ats(struct pci_dev *dev, int ps)
> > +{
> > +	u16 ctrl;
> > +
> > +	if (!pci_ats_supported(dev))
> > +		return -EINVAL;
> > +
> > +	if (WARN_ON(dev->ats_enabled))
> > +		return -EBUSY;
> > +
> > +	if (ps < PCI_ATS_MIN_STU)
> > +		return -EINVAL;
> > +
> > +	if (dev->is_virtfn)
> > +		return 0;
> 
> missed a check that 'ps' matches pf's ats_stu.

Deliberate, that check is done when enabling ats:

> > +
> > +	dev->ats_stu = ps;
> > +	ctrl = PCI_ATS_CTRL_STU(dev->ats_stu - PCI_ATS_MIN_STU);
> > +	pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(pci_prepare_ats);
> > +
> 
> Then there is no need to keep the 'ps' parameter in pci_enable_ats().

Which is why I left it here.

Jason
Bjorn Helgaas Aug. 9, 2024, 7:10 p.m. UTC | #3
On Wed, Aug 07, 2024 at 03:19:20PM -0300, Jason Gunthorpe wrote:
> PCI ATS has a global Smallest Translation Unit field that is located in
> the PF but shared by all of the VFs.
> 
> The expectation is that the STU will be set to the root port's global STU
> capability which is driven by the IO page table configuration of the iommu
> HW. Today it becomes set when the iommu driver first enables ATS.
> 
> Thus, to enable ATS on the VF, the PF must have already had the correct
> STU programmed, even if ATS is off on the PF.
> 
> Unfortunately the PF only programs the STU when the PF enables ATS. The
> iommu drivers tend to leave ATS disabled when IDENTITY translation is
> being used.
> 
> Thus we can get into a state where the PF is setup to use IDENTITY with
> the DMA API while the VF would like to use VFIO with a PAGING domain and
> have ATS turned on. This fails because the PF never loaded a PAGING domain
> and so it never setup the STU, and the VF can't do it.
> 
> The simplest solution is to have the iommu driver set the ATS STU when it
> probes the device. This way the ATS STU is loaded immediately at boot time
> to all PFs and there is no issue when a VF comes to use it.
> 
> Add a new call pci_prepare_ats() which should be called by iommu drivers
> in their probe_device() op for every PCI device if the iommu driver
> supports ATS. This will setup the STU based on whatever page size
> capability the iommu HW has.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/amd/iommu.c                   |  3 ++
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  6 ++++
>  drivers/iommu/intel/iommu.c                 |  1 +
>  drivers/pci/ats.c                           | 33 +++++++++++++++++++++
>  include/linux/pci-ats.h                     |  1 +
>  5 files changed, 44 insertions(+)

For the pci/ats.c change:

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Happy to merge via PCI if the IOMMU folks ack it, but the behavior
change is more important on the IOMMU side, so maybe it makes more
sense to merge there.

> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index b19e8c0f48fa25..98054497d343bc 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2203,6 +2203,9 @@ static struct iommu_device *amd_iommu_probe_device(struct device *dev)
>  
>  	iommu_completion_wait(iommu);
>  
> +	if (dev_is_pci(dev))
> +		pci_prepare_ats(to_pci_dev(dev), PAGE_SHIFT);
> +
>  	return iommu_dev;
>  }
>  
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index a31460f9f3d421..9bc50bded5af72 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3295,6 +3295,12 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>  	    smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
>  		master->stall_enabled = true;
>  
> +	if (dev_is_pci(dev)) {
> +		unsigned int stu = __ffs(smmu->pgsize_bitmap);
> +
> +		pci_prepare_ats(to_pci_dev(dev), stu);
> +	}
> +
>  	return &smmu->iommu;
>  
>  err_free_master:
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 9ff8b83c19a3e2..ad81db026ab236 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4091,6 +4091,7 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
>  
>  	dev_iommu_priv_set(dev, info);
>  	if (pdev && pci_ats_supported(pdev)) {
> +		pci_prepare_ats(pdev, VTD_PAGE_SHIFT);
>  		ret = device_rbtree_insert(iommu, info);
>  		if (ret)
>  			goto free;
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index c570892b209095..87fa03540b8a21 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -47,6 +47,39 @@ bool pci_ats_supported(struct pci_dev *dev)
>  }
>  EXPORT_SYMBOL_GPL(pci_ats_supported);
>  
> +/**
> + * pci_prepare_ats - Setup the PS for ATS
> + * @dev: the PCI device
> + * @ps: the IOMMU page shift
> + *
> + * This must be done by the IOMMU driver on the PF before any VFs are created to
> + * ensure that the VF can have ATS enabled.
> + *
> + * Returns 0 on success, or negative on failure.
> + */
> +int pci_prepare_ats(struct pci_dev *dev, int ps)
> +{
> +	u16 ctrl;
> +
> +	if (!pci_ats_supported(dev))
> +		return -EINVAL;
> +
> +	if (WARN_ON(dev->ats_enabled))
> +		return -EBUSY;
> +
> +	if (ps < PCI_ATS_MIN_STU)
> +		return -EINVAL;
> +
> +	if (dev->is_virtfn)
> +		return 0;
> +
> +	dev->ats_stu = ps;
> +	ctrl = PCI_ATS_CTRL_STU(dev->ats_stu - PCI_ATS_MIN_STU);
> +	pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_prepare_ats);
> +
>  /**
>   * pci_enable_ats - enable the ATS capability
>   * @dev: the PCI device
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index df54cd5b15db09..d98929c86991be 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -8,6 +8,7 @@
>  /* Address Translation Service */
>  bool pci_ats_supported(struct pci_dev *dev);
>  int pci_enable_ats(struct pci_dev *dev, int ps);
> +int pci_prepare_ats(struct pci_dev *dev, int ps);
>  void pci_disable_ats(struct pci_dev *dev);
>  int pci_ats_queue_depth(struct pci_dev *dev);
>  int pci_ats_page_aligned(struct pci_dev *dev);
> 
> base-commit: e7153d9c8cee2f17fdcd011509860717bfa91423
> -- 
> 2.46.0
>
Tian, Kevin Aug. 12, 2024, 12:09 a.m. UTC | #4
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, August 9, 2024 9:29 PM
> 
> On Fri, Aug 09, 2024 at 02:36:14AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Thursday, August 8, 2024 2:19 AM
> > >
> > > PCI ATS has a global Smallest Translation Unit field that is located in
> > > the PF but shared by all of the VFs.
> > >
> > > The expectation is that the STU will be set to the root port's global STU
> > > capability which is driven by the IO page table configuration of the iommu
> > > HW. Today it becomes set when the iommu driver first enables ATS.
> > >
> > > Thus, to enable ATS on the VF, the PF must have already had the correct
> > > STU programmed, even if ATS is off on the PF.
> > >
> > > Unfortunately the PF only programs the STU when the PF enables ATS.
> The
> > > iommu drivers tend to leave ATS disabled when IDENTITY translation is
> > > being used.
> >
> > Is there more context on this?
> 
> How do you mean?
> 
> > Looking at intel-iommu driver ATS is disabled for IDENETITY when
> > the iommu is in legacy mode:
> >
> > dmar_domain_attach_device()
> > {
> > 	...
> > 	if (sm_supported(info->iommu) || !domain_type_is_si(info-
> >domain))
> > 		iommu_enable_pci_caps(info);
> > 	...
> > }
> >
> > But this follows what VT-d spec says (section 9.3):
> >
> > TT: Translate Type
> > 10b: Untranslated requests are processed as pass-through. The SSPTPTR
> > field is ignored by hardware. Translated and Translation Requests are
> > blocked.
> 
> Yes, HW like this is exactly the problem, it ends up not enabling ATS
> on the PF and then we don't have the STU programmed so the VF is
> effectively disabled too.
> 
> Ideally iommus would continue to work with translated requests when
> ATS is enabled. Not supporting this configuration creates a nasty
> problem for devices that are using PASID.
> 
> The PASID may require ATS to be enabled (ie SVA), but the RID may be
> IDENTITY for performance. The poor device has no idea it is not
> allowed to use ATS on the RID side :(

Okay, I see the problem now. 
Baolu Lu Aug. 12, 2024, 2:20 a.m. UTC | #5
On 8/8/24 2:19 AM, Jason Gunthorpe wrote:
> PCI ATS has a global Smallest Translation Unit field that is located in
> the PF but shared by all of the VFs.
> 
> The expectation is that the STU will be set to the root port's global STU
> capability which is driven by the IO page table configuration of the iommu
> HW. Today it becomes set when the iommu driver first enables ATS.
> 
> Thus, to enable ATS on the VF, the PF must have already had the correct
> STU programmed, even if ATS is off on the PF.
> 
> Unfortunately the PF only programs the STU when the PF enables ATS. The
> iommu drivers tend to leave ATS disabled when IDENTITY translation is
> being used.
> 
> Thus we can get into a state where the PF is setup to use IDENTITY with
> the DMA API while the VF would like to use VFIO with a PAGING domain and
> have ATS turned on. This fails because the PF never loaded a PAGING domain
> and so it never setup the STU, and the VF can't do it.
> 
> The simplest solution is to have the iommu driver set the ATS STU when it
> probes the device. This way the ATS STU is loaded immediately at boot time
> to all PFs and there is no issue when a VF comes to use it.
> 
> Add a new call pci_prepare_ats() which should be called by iommu drivers
> in their probe_device() op for every PCI device if the iommu driver
> supports ATS. This will setup the STU based on whatever page size
> capability the iommu HW has.
> 
> Signed-off-by: Jason Gunthorpe<jgg@nvidia.com>
> ---
>   drivers/iommu/amd/iommu.c                   |  3 ++
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  6 ++++
>   drivers/iommu/intel/iommu.c                 |  1 +
>   drivers/pci/ats.c                           | 33 +++++++++++++++++++++
>   include/linux/pci-ats.h                     |  1 +
>   5 files changed, 44 insertions(+)

For Intel VT-d,

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

Thanks,
baolu
Yi Liu Aug. 12, 2024, 9:03 a.m. UTC | #6
On 2024/8/9 21:28, Jason Gunthorpe wrote:
> On Fri, Aug 09, 2024 at 02:36:14AM +0000, Tian, Kevin wrote:
>>> From: Jason Gunthorpe <jgg@nvidia.com>
>>> Sent: Thursday, August 8, 2024 2:19 AM
>>>
>>> PCI ATS has a global Smallest Translation Unit field that is located in
>>> the PF but shared by all of the VFs.
>>>
>>> The expectation is that the STU will be set to the root port's global STU
>>> capability which is driven by the IO page table configuration of the iommu
>>> HW. Today it becomes set when the iommu driver first enables ATS.
>>>
>>> Thus, to enable ATS on the VF, the PF must have already had the correct
>>> STU programmed, even if ATS is off on the PF.
>>>
>>> Unfortunately the PF only programs the STU when the PF enables ATS. The
>>> iommu drivers tend to leave ATS disabled when IDENTITY translation is
>>> being used.

I think this is the common practice as it is not necessary to enable ATS
since iommu is passthrough mode. :)

>>
>> Is there more context on this?
> 
> How do you mean?
> 
>> Looking at intel-iommu driver ATS is disabled for IDENETITY when
>> the iommu is in legacy mode:
>>
>> dmar_domain_attach_device()
>> {
>> 	...
>> 	if (sm_supported(info->iommu) || !domain_type_is_si(info->domain))
>> 		iommu_enable_pci_caps(info);
>> 	...
>> }
>>
>> But this follows what VT-d spec says (section 9.3):
>>
>> TT: Translate Type
>> 10b: Untranslated requests are processed as pass-through. The SSPTPTR
>> field is ignored by hardware. Translated and Translation Requests are
>> blocked.
> 
> Yes, HW like this is exactly the problem, it ends up not enabling ATS
> on the PF and then we don't have the STU programmed so the VF is
> effectively disabled too.
> 
> Ideally iommus would continue to work with translated requests when
> ATS is enabled. 

As Kevin's pasting, the Translated requests will be blocked. So it does
not work. :(

> Not supporting this configuration creates a nasty
> problem for devices that are using PASID.
>
> The PASID may require ATS to be enabled (ie SVA), but the RID may be
> IDENTITY for performance. The poor device has no idea it is not
> allowed to use ATS on the RID side :(

If this is the only problematic case, the intel iommu driver in this
patch could check the scalable mode before enabling ATS in the
probe_device() op. In this way, the legacy mode iommu would keep the old
ATS enable policy.

Seems an alternative is to use paging domain for IDENTITY. This means
the TT would not be 10b, hence it would work with ATS enabled.
Jason Gunthorpe Aug. 12, 2024, 11:35 a.m. UTC | #7
On Mon, Aug 12, 2024 at 05:03:37PM +0800, Yi Liu wrote:

> > The PASID may require ATS to be enabled (ie SVA), but the RID may be
> > IDENTITY for performance. The poor device has no idea it is not
> > allowed to use ATS on the RID side :(
> 
> If this is the only problematic case, the intel iommu driver in this
> patch could check the scalable mode before enabling ATS in the
> probe_device() op. In this way, the legacy mode iommu would keep the old
> ATS enable policy.

At some point we will need to address this.. At least when we add
PASID support to mlx5 it will need something since mlx5 is using ATS
on the RID right now.

Supporting ATS with the IDENTITY optimization is a good idea in the
IOMMU HW. The HW just has to answer the ATS with a 1:1 TA.

Next, telling the driver that ATS is enabled but doesn't work on the
RID would allow the driver to deal with it

Finally, doing things like using IDENTITY through a paging domain
would be a last resort for troubled devices.

Jason
Yi Liu Aug. 13, 2024, 3 a.m. UTC | #8
On 2024/8/12 19:35, Jason Gunthorpe wrote:
> On Mon, Aug 12, 2024 at 05:03:37PM +0800, Yi Liu wrote:
> 
>>> The PASID may require ATS to be enabled (ie SVA), but the RID may be
>>> IDENTITY for performance. The poor device has no idea it is not
>>> allowed to use ATS on the RID side :(
>>
>> If this is the only problematic case, the intel iommu driver in this
>> patch could check the scalable mode before enabling ATS in the
>> probe_device() op. In this way, the legacy mode iommu would keep the old
>> ATS enable policy.
> 
> At some point we will need to address this.. At least when we add
> PASID support to mlx5 it will need something since mlx5 is using ATS
> on the RID right now.
> 
> Supporting ATS with the IDENTITY optimization is a good idea in the
> IOMMU HW. The HW just has to answer the ATS with a 1:1 TA.

looks like Intel scalable mode would allow ATS work even device uses
IDENTITY. Below is what spec says when PGTT is PT.

Chapter 9.6:
Untranslated/Translation Requests (with or without PASID) referencing this
scalable-mode PASID Table Entry bypass address translation and are
processed as pass-through.

> Next, telling the driver that ATS is enabled but doesn't work on the
> RID would allow the driver to deal with it
> 
> Finally, doing things like using IDENTITY through a paging domain
> would be a last resort for troubled devices.

yep. It's another issue then.
Yi Liu Aug. 13, 2024, 3:11 a.m. UTC | #9
On 2024/8/8 02:19, Jason Gunthorpe wrote:
> PCI ATS has a global Smallest Translation Unit field that is located in
> the PF but shared by all of the VFs.
> 
> The expectation is that the STU will be set to the root port's global STU
> capability which is driven by the IO page table configuration of the iommu
> HW. Today it becomes set when the iommu driver first enables ATS.
> 
> Thus, to enable ATS on the VF, the PF must have already had the correct
> STU programmed, even if ATS is off on the PF.
> 
> Unfortunately the PF only programs the STU when the PF enables ATS. The
> iommu drivers tend to leave ATS disabled when IDENTITY translation is
> being used.
> 
> Thus we can get into a state where the PF is setup to use IDENTITY with
> the DMA API while the VF would like to use VFIO with a PAGING domain and
> have ATS turned on. This fails because the PF never loaded a PAGING domain
> and so it never setup the STU, and the VF can't do it.
> 
> The simplest solution is to have the iommu driver set the ATS STU when it
> probes the device. This way the ATS STU is loaded immediately at boot time
> to all PFs and there is no issue when a VF comes to use it.

This only sets STU without setting the ATS_CTRL.E bit. Is it possible that
VF considers the PF's STU field as valid only if PF's ATS_CTRL.E bit is
set?

> 
> Add a new call pci_prepare_ats() which should be called by iommu drivers
> in their probe_device() op for every PCI device if the iommu driver
> supports ATS. This will setup the STU based on whatever page size
> capability the iommu HW has.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   drivers/iommu/amd/iommu.c                   |  3 ++
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  6 ++++
>   drivers/iommu/intel/iommu.c                 |  1 +
>   drivers/pci/ats.c                           | 33 +++++++++++++++++++++
>   include/linux/pci-ats.h                     |  1 +
>   5 files changed, 44 insertions(+)
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index b19e8c0f48fa25..98054497d343bc 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2203,6 +2203,9 @@ static struct iommu_device *amd_iommu_probe_device(struct device *dev)
>   
>   	iommu_completion_wait(iommu);
>   
> +	if (dev_is_pci(dev))
> +		pci_prepare_ats(to_pci_dev(dev), PAGE_SHIFT);
> +
>   	return iommu_dev;
>   }
>   
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index a31460f9f3d421..9bc50bded5af72 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3295,6 +3295,12 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>   	    smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
>   		master->stall_enabled = true;
>   
> +	if (dev_is_pci(dev)) {
> +		unsigned int stu = __ffs(smmu->pgsize_bitmap);
> +
> +		pci_prepare_ats(to_pci_dev(dev), stu);
> +	}
> +
>   	return &smmu->iommu;
>   
>   err_free_master:
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 9ff8b83c19a3e2..ad81db026ab236 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4091,6 +4091,7 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
>   
>   	dev_iommu_priv_set(dev, info);
>   	if (pdev && pci_ats_supported(pdev)) {
> +		pci_prepare_ats(pdev, VTD_PAGE_SHIFT);

perhaps just do it for PFs? :)

>   		ret = device_rbtree_insert(iommu, info);
>   		if (ret)
>   			goto free;
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index c570892b209095..87fa03540b8a21 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -47,6 +47,39 @@ bool pci_ats_supported(struct pci_dev *dev)
>   }
>   EXPORT_SYMBOL_GPL(pci_ats_supported);
>   
> +/**
> + * pci_prepare_ats - Setup the PS for ATS
> + * @dev: the PCI device
> + * @ps: the IOMMU page shift
> + *
> + * This must be done by the IOMMU driver on the PF before any VFs are created to
> + * ensure that the VF can have ATS enabled.
> + *
> + * Returns 0 on success, or negative on failure.
> + */
> +int pci_prepare_ats(struct pci_dev *dev, int ps)
> +{
> +	u16 ctrl;
> +
> +	if (!pci_ats_supported(dev))
> +		return -EINVAL;
> +
> +	if (WARN_ON(dev->ats_enabled))
> +		return -EBUSY;
> +
> +	if (ps < PCI_ATS_MIN_STU)
> +		return -EINVAL;
> +
> +	if (dev->is_virtfn)
> +		return 0;
> +
> +	dev->ats_stu = ps;
> +	ctrl = PCI_ATS_CTRL_STU(dev->ats_stu - PCI_ATS_MIN_STU);
> +	pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);

Is it valuable to have a flag to mark if STU is set or not? Such way can
avoid setting STU multiple times.

> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_prepare_ats);
> +
>   /**
>    * pci_enable_ats - enable the ATS capability
>    * @dev: the PCI device
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index df54cd5b15db09..d98929c86991be 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -8,6 +8,7 @@
>   /* Address Translation Service */
>   bool pci_ats_supported(struct pci_dev *dev);
>   int pci_enable_ats(struct pci_dev *dev, int ps);
> +int pci_prepare_ats(struct pci_dev *dev, int ps);
>   void pci_disable_ats(struct pci_dev *dev);
>   int pci_ats_queue_depth(struct pci_dev *dev);
>   int pci_ats_page_aligned(struct pci_dev *dev);
> 
> base-commit: e7153d9c8cee2f17fdcd011509860717bfa91423
Joerg Roedel Aug. 13, 2024, 8:52 a.m. UTC | #10
On Wed, Aug 07, 2024 at 03:19:20PM -0300, Jason Gunthorpe wrote:
> PCI ATS has a global Smallest Translation Unit field that is located in
> the PF but shared by all of the VFs.
> 
> The expectation is that the STU will be set to the root port's global STU
> capability which is driven by the IO page table configuration of the iommu
> HW. Today it becomes set when the iommu driver first enables ATS.
> 
> Thus, to enable ATS on the VF, the PF must have already had the correct
> STU programmed, even if ATS is off on the PF.
> 
> Unfortunately the PF only programs the STU when the PF enables ATS. The
> iommu drivers tend to leave ATS disabled when IDENTITY translation is
> being used.
> 
> Thus we can get into a state where the PF is setup to use IDENTITY with
> the DMA API while the VF would like to use VFIO with a PAGING domain and
> have ATS turned on. This fails because the PF never loaded a PAGING domain
> and so it never setup the STU, and the VF can't do it.
> 
> The simplest solution is to have the iommu driver set the ATS STU when it
> probes the device. This way the ATS STU is loaded immediately at boot time
> to all PFs and there is no issue when a VF comes to use it.
> 
> Add a new call pci_prepare_ats() which should be called by iommu drivers
> in their probe_device() op for every PCI device if the iommu driver
> supports ATS. This will setup the STU based on whatever page size
> capability the iommu HW has.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/amd/iommu.c                   |  3 ++
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  6 ++++
>  drivers/iommu/intel/iommu.c                 |  1 +
>  drivers/pci/ats.c                           | 33 +++++++++++++++++++++
>  include/linux/pci-ats.h                     |  1 +
>  5 files changed, 44 insertions(+)

Applied, thanks.
Jason Gunthorpe Aug. 13, 2024, 12:30 p.m. UTC | #11
On Tue, Aug 13, 2024 at 11:11:01AM +0800, Yi Liu wrote:

> > The simplest solution is to have the iommu driver set the ATS STU when it
> > probes the device. This way the ATS STU is loaded immediately at boot time
> > to all PFs and there is no issue when a VF comes to use it.
> 
> This only sets STU without setting the ATS_CTRL.E bit. Is it possible that
> VF considers the PF's STU field as valid only if PF's ATS_CTRL.E bit is
> set?

That doesn't seem to be the case. Do you see something in the spec
that says so?

> > @@ -4091,6 +4091,7 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
> >   	dev_iommu_priv_set(dev, info);
> >   	if (pdev && pci_ats_supported(pdev)) {
> > +		pci_prepare_ats(pdev, VTD_PAGE_SHIFT);
> 
> perhaps just do it for PFs? :)

That check is inside pci_perpare_ats(), no reason to duplicate it in
all the callers.

> > +int pci_prepare_ats(struct pci_dev *dev, int ps)
> > +{
> > +	u16 ctrl;
> > +
> > +	if (!pci_ats_supported(dev))
> > +		return -EINVAL;
> > +
> > +	if (WARN_ON(dev->ats_enabled))
> > +		return -EBUSY;
> > +
> > +	if (ps < PCI_ATS_MIN_STU)
> > +		return -EINVAL;
> > +
> > +	if (dev->is_virtfn)
> > +		return 0;
> > +
> > +	dev->ats_stu = ps;
> > +	ctrl = PCI_ATS_CTRL_STU(dev->ats_stu - PCI_ATS_MIN_STU);
> > +	pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
> 
> Is it valuable to have a flag to mark if STU is set or not? Such way can
> avoid setting STU multiple times.

We don't because we only do it for the PF due to the is_virtfn check

Jason
Yi Liu Aug. 13, 2024, 11:21 p.m. UTC | #12
On 2024/8/13 20:30, Jason Gunthorpe wrote:
> On Tue, Aug 13, 2024 at 11:11:01AM +0800, Yi Liu wrote:
> 
>>> The simplest solution is to have the iommu driver set the ATS STU when it
>>> probes the device. This way the ATS STU is loaded immediately at boot time
>>> to all PFs and there is no issue when a VF comes to use it.
>>
>> This only sets STU without setting the ATS_CTRL.E bit. Is it possible that
>> VF considers the PF's STU field as valid only if PF's ATS_CTRL.E bit is
>> set?
> 
> That doesn't seem to be the case. Do you see something in the spec
> that says so?

no, I didn't find any word that describe it. So it is highly possible this
is not defined and up to vendors. This means there is possibility for what
I typed in the above :( I'm not a hw guy, but it seems reasonable to treat
something valid only when the cap is enabled?

>>> @@ -4091,6 +4091,7 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
>>>    	dev_iommu_priv_set(dev, info);
>>>    	if (pdev && pci_ats_supported(pdev)) {
>>> +		pci_prepare_ats(pdev, VTD_PAGE_SHIFT);
>>
>> perhaps just do it for PFs? :)
> 
> That check is inside pci_perpare_ats(), no reason to duplicate it in
> all the callers.

got it.

>>> +int pci_prepare_ats(struct pci_dev *dev, int ps)
>>> +{
>>> +	u16 ctrl;
>>> +
>>> +	if (!pci_ats_supported(dev))
>>> +		return -EINVAL;
>>> +
>>> +	if (WARN_ON(dev->ats_enabled))
>>> +		return -EBUSY;
>>> +
>>> +	if (ps < PCI_ATS_MIN_STU)
>>> +		return -EINVAL;
>>> +
>>> +	if (dev->is_virtfn)
>>> +		return 0;
>>> +
>>> +	dev->ats_stu = ps;
>>> +	ctrl = PCI_ATS_CTRL_STU(dev->ats_stu - PCI_ATS_MIN_STU);
>>> +	pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
>>
>> Is it valuable to have a flag to mark if STU is set or not? Such way can
>> avoid setting STU multiple times.
> 
> We don't because we only do it for the PF due to the is_virtfn check

ok. Doing it only for PF is enough.
diff mbox series

Patch

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b19e8c0f48fa25..98054497d343bc 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2203,6 +2203,9 @@  static struct iommu_device *amd_iommu_probe_device(struct device *dev)
 
 	iommu_completion_wait(iommu);
 
+	if (dev_is_pci(dev))
+		pci_prepare_ats(to_pci_dev(dev), PAGE_SHIFT);
+
 	return iommu_dev;
 }
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index a31460f9f3d421..9bc50bded5af72 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3295,6 +3295,12 @@  static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 	    smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
 		master->stall_enabled = true;
 
+	if (dev_is_pci(dev)) {
+		unsigned int stu = __ffs(smmu->pgsize_bitmap);
+
+		pci_prepare_ats(to_pci_dev(dev), stu);
+	}
+
 	return &smmu->iommu;
 
 err_free_master:
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9ff8b83c19a3e2..ad81db026ab236 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4091,6 +4091,7 @@  static struct iommu_device *intel_iommu_probe_device(struct device *dev)
 
 	dev_iommu_priv_set(dev, info);
 	if (pdev && pci_ats_supported(pdev)) {
+		pci_prepare_ats(pdev, VTD_PAGE_SHIFT);
 		ret = device_rbtree_insert(iommu, info);
 		if (ret)
 			goto free;
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index c570892b209095..87fa03540b8a21 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -47,6 +47,39 @@  bool pci_ats_supported(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_GPL(pci_ats_supported);
 
+/**
+ * pci_prepare_ats - Setup the PS for ATS
+ * @dev: the PCI device
+ * @ps: the IOMMU page shift
+ *
+ * This must be done by the IOMMU driver on the PF before any VFs are created to
+ * ensure that the VF can have ATS enabled.
+ *
+ * Returns 0 on success, or negative on failure.
+ */
+int pci_prepare_ats(struct pci_dev *dev, int ps)
+{
+	u16 ctrl;
+
+	if (!pci_ats_supported(dev))
+		return -EINVAL;
+
+	if (WARN_ON(dev->ats_enabled))
+		return -EBUSY;
+
+	if (ps < PCI_ATS_MIN_STU)
+		return -EINVAL;
+
+	if (dev->is_virtfn)
+		return 0;
+
+	dev->ats_stu = ps;
+	ctrl = PCI_ATS_CTRL_STU(dev->ats_stu - PCI_ATS_MIN_STU);
+	pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pci_prepare_ats);
+
 /**
  * pci_enable_ats - enable the ATS capability
  * @dev: the PCI device
diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index df54cd5b15db09..d98929c86991be 100644
--- a/include/linux/pci-ats.h
+++ b/include/linux/pci-ats.h
@@ -8,6 +8,7 @@ 
 /* Address Translation Service */
 bool pci_ats_supported(struct pci_dev *dev);
 int pci_enable_ats(struct pci_dev *dev, int ps);
+int pci_prepare_ats(struct pci_dev *dev, int ps);
 void pci_disable_ats(struct pci_dev *dev);
 int pci_ats_queue_depth(struct pci_dev *dev);
 int pci_ats_page_aligned(struct pci_dev *dev);