diff mbox series

[v2,5/8] pci/arm: Use iommu_add_dt_pci_device()

Message ID 20230511191654.400720-6-stewart.hildebrand@amd.com (mailing list archive)
State Superseded
Headers show
Series SMMU handling for PCIe Passthrough on ARM | expand

Commit Message

Stewart Hildebrand May 11, 2023, 7:16 p.m. UTC
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

On Arm we need to parse DT PCI-IOMMU specifier and provide it to
the driver (for describing the relationship between PCI devices
and IOMMUs) before adding a device to it.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
v1->v2:
* new patch title (was: "pci/arm: Use iommu_add_dt_pci_device() instead of arch hook")
* move iommu_add_dt_pci_device() call (and associated #ifdef) to
  pci_add_device()
* use existing call to iommu_add_device()

downstream->v1:
* rebase
* add __maybe_unused attribute to const struct domain_iommu *hd;
* Rename: s/iommu_add_pci_device/iommu_add_dt_pci_device/
* guard iommu_add_dt_pci_device call with CONFIG_HAS_DEVICE_TREE instead of
  CONFIG_ARM

(cherry picked from commit 2b9d26badab8b24b5a80d028c4499a5022817213 from
 the downstream branch poc/pci-passthrough from
 https://gitlab.com/xen-project/people/bmarquis/xen-arm-poc.git)
---
 xen/drivers/passthrough/pci.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Comments

Jan Beulich May 12, 2023, 7:25 a.m. UTC | #1
On 11.05.2023 21:16, Stewart Hildebrand wrote:
> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>              pdev->domain = NULL;
>              goto out;
>          }
> +#ifdef CONFIG_HAS_DEVICE_TREE
> +        ret = iommu_add_dt_pci_device(pdev);
> +        if ( ret < 0 )
> +        {
> +            printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret);
> +            goto out;
> +        }
> +#endif
>          ret = iommu_add_device(pdev);

Hmm, am I misremembering that in the earlier patch you had #else to
invoke the alternative behavior? Now you end up calling both functions;
if that's indeed intended, this may still want doing differently.
Looking at the earlier patch introducing the function, I can't infer
though whether that's intended: iommu_add_dt_pci_device() checks that
the add_device hook is present, but then I didn't find any use of this
hook. The revlog there suggests the check might be stale.

If indeed the function does only preparatory work, I don't see why it
would need naming "iommu_..."; I'd rather consider pci_add_dt_device()
then. Plus in such a case #ifdef-ary here probably wants avoiding by
introducing a suitable no-op stub for the !HAS_DEVICE_TREE case. Then
...

>          if ( ret )
>          {
> +#ifdef CONFIG_HAS_DEVICE_TREE
> +            iommu_fwspec_free(pci_to_dev(pdev));
> +#endif

... this (which I understand is doing the corresponding cleanup) then
also wants wrapping in a suitably named tiny helper function.

But yet further I'm then no longer convinced this is the right place
for the addition. pci_add_device() is backing physdev hypercalls. It
would seem to me that the function may want invoking yet one layer
further up, or it may even want invoking from a brand new DT-specific
physdev-op. This would then leave at least the x86-only paths (invoking
pci_add_device() from outside of pci_physdev_op()) entirely alone.

Jan
Stewart Hildebrand May 12, 2023, 9:03 p.m. UTC | #2
On 5/12/23 03:25, Jan Beulich wrote:
> On 11.05.2023 21:16, Stewart Hildebrand wrote:
>> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>              pdev->domain = NULL;
>>              goto out;
>>          }
>> +#ifdef CONFIG_HAS_DEVICE_TREE
>> +        ret = iommu_add_dt_pci_device(pdev);
>> +        if ( ret < 0 )
>> +        {
>> +            printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret);
>> +            goto out;
>> +        }
>> +#endif
>>          ret = iommu_add_device(pdev);
> 
> Hmm, am I misremembering that in the earlier patch you had #else to
> invoke the alternative behavior?

You are remembering correctly. v1 had an #else, v2 does not.

> Now you end up calling both functions;
> if that's indeed intended,

Yes, this is intentional.

> this may still want doing differently.
> Looking at the earlier patch introducing the function, I can't infer
> though whether that's intended: iommu_add_dt_pci_device() checks that
> the add_device hook is present, but then I didn't find any use of this
> hook. The revlog there suggests the check might be stale.

Ah, right, the ops->add_device check is stale in the other patch. Good catch, I'll remove it there.

> If indeed the function does only preparatory work, I don't see why it
> would need naming "iommu_..."; I'd rather consider pci_add_dt_device()
> then.

The function has now been reduced to reading SMMU configuration data from DT and mapping RID/BDF -> AXI stream ID. However, it is still SMMU related, and it is still invoking another iommu_ops hook function, dt_xlate (which is yet another AXI stream ID translation, separate from what is being discussed here). Does this justify keeping "iommu_..." in the name? I'm not convinced pci_add_dt_device() is a good name for it either (more on this below).

> Plus in such a case #ifdef-ary here probably wants avoiding by
> introducing a suitable no-op stub for the !HAS_DEVICE_TREE case. Then
> ...
> 
>>          if ( ret )
>>          {
>> +#ifdef CONFIG_HAS_DEVICE_TREE
>> +            iommu_fwspec_free(pci_to_dev(pdev));
>> +#endif
> 
> ... this (which I understand is doing the corresponding cleanup) then
> also wants wrapping in a suitably named tiny helper function.

Sure, I'm on board with eliminating/reducing the #ifdef-ary where possible. Will do.

> But yet further I'm then no longer convinced this is the right place
> for the addition. pci_add_device() is backing physdev hypercalls. It
> would seem to me that the function may want invoking yet one layer
> further up, or it may even want invoking from a brand new DT-specific
> physdev-op. This would then leave at least the x86-only paths (invoking
> pci_add_device() from outside of pci_physdev_op()) entirely alone.

Let's establish that pci_add_device()/iommu_add_device() are already inherently performing tasks related to setting up a PCI device to work with an IOMMU.

The preparatory work in question needs to happen after:

  pci_add_device()
    -> alloc_pdev()

since we need to know all the possible RIDs (including those for phantom functions), but before the add_device iommu hook:

  pci_add_device()
    -> iommu_add_device()
      -> iommu_call(hd->platform_ops, add_device, ...)


The preparatory work (i.e. mapping RID/BDF -> AXI stream ID) is inherently associated with setting up a PCI device to work with an ARM SMMU (but not any particular variant of the SMMU). The SMMU distinguishes what PCI device/function DMA traffic is associated with based on the derived AXI stream ID (sideband data), not the RID/BDF directly. See [1].

Moving the preparatory work one layer up would mean duplicating what alloc_pdev() is already doing to set up pdev->phantom_stride (which we need to figure out all RIDs for that particular device). Moving it down into the individual SMMU drivers (smmu_ops/platform_ops) would mean duplicating special phantom function handling in each SMMU driver, and further deviating from the Linux SMMU driver(s) they are based on.

It still feels to me like pci_add_device() (or iommu_add_device()) is the right place to perform the RID/BDF -> AXI stream ID mapping.

Since there's nothing inherently DT specific (or ACPI specific) about deriving sideband data from RID/BDF, let me propose a new name for the function (instead of iommu_add_dt_pci_device):

  iommu_derive_pci_device_sideband_IDs()


Now, as far as DT and ACPI co-existing goes, I admit I haven't tested with CONFIG_ACPI=y yet (there seems to be some issues when both CONFIG_ARM_SMMU_V3=y and CONFIG_ACPI=y are enabled, even in staging). But I do recognize that we need a way support both CONFIG_HAS_DEVICE_TREE=y and CONFIG_ACPI=y simultaneously. Let me think on that for a bit...

[1] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-iommu.txt
Jan Beulich May 15, 2023, 7:30 a.m. UTC | #3
On 12.05.2023 23:03, Stewart Hildebrand wrote:
> On 5/12/23 03:25, Jan Beulich wrote:
>> On 11.05.2023 21:16, Stewart Hildebrand wrote:
>>> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>>              pdev->domain = NULL;
>>>              goto out;
>>>          }
>>> +#ifdef CONFIG_HAS_DEVICE_TREE
>>> +        ret = iommu_add_dt_pci_device(pdev);
>>> +        if ( ret < 0 )
>>> +        {
>>> +            printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret);
>>> +            goto out;
>>> +        }
>>> +#endif
>>>          ret = iommu_add_device(pdev);
>>
>> Hmm, am I misremembering that in the earlier patch you had #else to
>> invoke the alternative behavior?
> 
> You are remembering correctly. v1 had an #else, v2 does not.
> 
>> Now you end up calling both functions;
>> if that's indeed intended,
> 
> Yes, this is intentional.
> 
>> this may still want doing differently.
>> Looking at the earlier patch introducing the function, I can't infer
>> though whether that's intended: iommu_add_dt_pci_device() checks that
>> the add_device hook is present, but then I didn't find any use of this
>> hook. The revlog there suggests the check might be stale.
> 
> Ah, right, the ops->add_device check is stale in the other patch. Good catch, I'll remove it there.
> 
>> If indeed the function does only preparatory work, I don't see why it
>> would need naming "iommu_..."; I'd rather consider pci_add_dt_device()
>> then.
> 
> The function has now been reduced to reading SMMU configuration data from DT and mapping RID/BDF -> AXI stream ID. However, it is still SMMU related, and it is still invoking another iommu_ops hook function, dt_xlate (which is yet another AXI stream ID translation, separate from what is being discussed here). Does this justify keeping "iommu_..." in the name? I'm not convinced pci_add_dt_device() is a good name for it either (more on this below).

The function being SMMU-related pretty strongly suggests it wants to be
invoked via a hook. If the add_device() one isn't suitable, perhaps we
need a new (optional) prepare_device() one? With pci_add_device() then
calling iommu_prepare_device(), wrapping the hook invocation?

But just to be clear: A new hook would need enough justification as to
the existing one being unsuitable.

Jan
Jan Beulich May 15, 2023, 7:30 a.m. UTC | #4
On 12.05.2023 23:03, Stewart Hildebrand wrote:
> On 5/12/23 03:25, Jan Beulich wrote:
>> On 11.05.2023 21:16, Stewart Hildebrand wrote:
>>> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>>              pdev->domain = NULL;
>>>              goto out;
>>>          }
>>> +#ifdef CONFIG_HAS_DEVICE_TREE
>>> +        ret = iommu_add_dt_pci_device(pdev);
>>> +        if ( ret < 0 )
>>> +        {
>>> +            printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret);
>>> +            goto out;
>>> +        }
>>> +#endif
>>>          ret = iommu_add_device(pdev);
>>
>> Hmm, am I misremembering that in the earlier patch you had #else to
>> invoke the alternative behavior?
> 
> You are remembering correctly. v1 had an #else, v2 does not.
> 
>> Now you end up calling both functions;
>> if that's indeed intended,
> 
> Yes, this is intentional.
> 
>> this may still want doing differently.
>> Looking at the earlier patch introducing the function, I can't infer
>> though whether that's intended: iommu_add_dt_pci_device() checks that
>> the add_device hook is present, but then I didn't find any use of this
>> hook. The revlog there suggests the check might be stale.
> 
> Ah, right, the ops->add_device check is stale in the other patch. Good catch, I'll remove it there.
> 
>> If indeed the function does only preparatory work, I don't see why it
>> would need naming "iommu_..."; I'd rather consider pci_add_dt_device()
>> then.
> 
> The function has now been reduced to reading SMMU configuration data from DT and mapping RID/BDF -> AXI stream ID. However, it is still SMMU related, and it is still invoking another iommu_ops hook function, dt_xlate (which is yet another AXI stream ID translation, separate from what is being discussed here). Does this justify keeping "iommu_..." in the name? I'm not convinced pci_add_dt_device() is a good name for it either (more on this below).

The function being SMMU-related pretty strongly suggests it wants to be
invoked via a hook. If the add_device() one isn't suitable, perhaps we
need a new (optional) prepare_device() one? With pci_add_device() then
calling iommu_prepare_device(), wrapping the hook invocation?

But just to be clear: A new hook would need enough justification as to
the existing one being unsuitable.

Jan
Stewart Hildebrand May 18, 2023, 9:05 p.m. UTC | #5
On 5/15/23 03:30, Jan Beulich wrote:
> On 12.05.2023 23:03, Stewart Hildebrand wrote:
>> On 5/12/23 03:25, Jan Beulich wrote:
>>> On 11.05.2023 21:16, Stewart Hildebrand wrote:
>>>> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>>>              pdev->domain = NULL;
>>>>              goto out;
>>>>          }
>>>> +#ifdef CONFIG_HAS_DEVICE_TREE
>>>> +        ret = iommu_add_dt_pci_device(pdev);
>>>> +        if ( ret < 0 )
>>>> +        {
>>>> +            printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret);
>>>> +            goto out;
>>>> +        }
>>>> +#endif
>>>>          ret = iommu_add_device(pdev);
>>>
>>> Hmm, am I misremembering that in the earlier patch you had #else to
>>> invoke the alternative behavior?
>>
>> You are remembering correctly. v1 had an #else, v2 does not.
>>
>>> Now you end up calling both functions;
>>> if that's indeed intended,
>>
>> Yes, this is intentional.
>>
>>> this may still want doing differently.
>>> Looking at the earlier patch introducing the function, I can't infer
>>> though whether that's intended: iommu_add_dt_pci_device() checks that
>>> the add_device hook is present, but then I didn't find any use of this
>>> hook. The revlog there suggests the check might be stale.
>>
>> Ah, right, the ops->add_device check is stale in the other patch. Good catch, I'll remove it there.
>>
>>> If indeed the function does only preparatory work, I don't see why it
>>> would need naming "iommu_..."; I'd rather consider pci_add_dt_device()
>>> then.
>>
>> The function has now been reduced to reading SMMU configuration data from DT and mapping RID/BDF -> AXI stream ID. However, it is still SMMU related, and it is still invoking another iommu_ops hook function, dt_xlate (which is yet another AXI stream ID translation, separate from what is being discussed here). Does this justify keeping "iommu_..." in the name? I'm not convinced pci_add_dt_device() is a good name for it either (more on this below).
> 
> The function being SMMU-related pretty strongly suggests it wants to be
> invoked via a hook. If the add_device() one isn't suitable, perhaps we
> need a new (optional) prepare_device() one? With pci_add_device() then
> calling iommu_prepare_device(), wrapping the hook invocation?
> 
> But just to be clear: A new hook would need enough justification as to
> the existing one being unsuitable.

I'll move it to the add_device hook in v3
diff mbox series

Patch

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index b42acb8d7c09..6dbaae682773 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -34,6 +34,11 @@ 
 #include <xen/vpci.h>
 #include <xen/msi.h>
 #include <xsm/xsm.h>
+
+#ifdef CONFIG_HAS_DEVICE_TREE
+#include <asm/iommu_fwspec.h>
+#endif
+
 #include "ats.h"
 
 struct pci_seg {
@@ -762,9 +767,20 @@  int pci_add_device(u16 seg, u8 bus, u8 devfn,
             pdev->domain = NULL;
             goto out;
         }
+#ifdef CONFIG_HAS_DEVICE_TREE
+        ret = iommu_add_dt_pci_device(pdev);
+        if ( ret < 0 )
+        {
+            printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret);
+            goto out;
+        }
+#endif
         ret = iommu_add_device(pdev);
         if ( ret )
         {
+#ifdef CONFIG_HAS_DEVICE_TREE
+            iommu_fwspec_free(pci_to_dev(pdev));
+#endif
             vpci_remove_device(pdev);
             list_del(&pdev->domain_list);
             pdev->domain = NULL;