
[v3,03/15] hw/pci: Add a pci_device_iommu_memory_region() helper

Message ID 20230530175937.24202-4-joao.m.martins@oracle.com (mailing list archive)
State New, archived
Series vfio: VFIO migration support with vIOMMU

Commit Message

Joao Martins May 30, 2023, 5:59 p.m. UTC
Today, VFIO determines whether an IOMMU sits in front of a PCIDevice by
checking whether the address space backing the device is the memory
address space or not, which is done via pci_device_iommu_address_space(pdev).

On the other hand, the IOMMU MR is used, for example, to fetch attributes,
or when doing a vIOMMU map to figure out whether the mapping is under the
vIOMMU. However, such an object is only available by the time the IOMMU map
notifier is called, which depends on the guest actually doing a DMA map.
Thus there is no way to access the IOMMU memory region early, i.e. at VFIO
device initialization, where we attest migration support and decide whether
to impose live-migration (LM) blockers.

Much like pci_device_iommu_address_space() fetches the IOMMU AS, add a
pci_device_iommu_memory_region() helper which returns the IOMMU MR
associated with the device. The IOMMU MR is returned correctly for vIOMMUs
that use pci_setup_iommu_info(). Note that today most vIOMMUs create the
address space and the IOMMU MR at the same time; it's just that there is no
API to make the latter available.

This is in preparation for VFIO to track both the AS and the IOMMU MR in
VFIOSpace without being tied to the guest performing a DMA map. The IOMMU MR
will then give upper layers access to various IOMMU attributes.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 include/hw/pci/pci.h | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)
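
For illustration, a minimal sketch of how a VFIO-style consumer might use the
new helper at device-initialization time; the vfio_pci_viommu_check() name and
the blocker decision below are assumptions for this sketch, not part of the
patch itself:

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "hw/pci/pci.h"

/*
 * Hypothetical init-time check: with pci_device_iommu_memory_region()
 * the IOMMU MR is reachable before the guest issues any DMA map, so a
 * device can decide early whether to register a migration blocker.
 */
static bool vfio_pci_viommu_check(PCIDevice *pdev, Error **errp)
{
    IOMMUMemoryRegion *iommu_mr = pci_device_iommu_memory_region(pdev);

    if (!iommu_mr) {
        /* Device sits on the plain memory address space: nothing to do. */
        return true;
    }

    /*
     * The IOMMU MR is available at initialization time; attributes could
     * be queried here to attest migration support.  This sketch simply
     * reports that a vIOMMU is present.
     */
    error_setg(errp, "vIOMMU detected for %s; migration support must be attested",
               pdev->name);
    return false;
}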

Comments

Peter Xu June 5, 2023, 4:57 p.m. UTC | #1
On Tue, May 30, 2023 at 06:59:25PM +0100, Joao Martins wrote:
> Much like pci_device_iommu_address_space() fetches the IOMMU AS, add a
> pci_device_iommu_memory_region() which lets it return an the IOMMU MR
> associated with it. The IOMMU MR is returned correctly for vIOMMUs using
> pci_setup_iommu_info(). Note that today most vIOMMUs create the address
> space and IOMMU MR at the same time, it's just mainly that there's API
> to make the latter available.

Have you looked into other archs outside x86?  IIRC on some other arch one
address space can have >1 IOMMU memory regions.. at least with such AS and
MR layering it seems always possible?  Thanks,
Joao Martins June 6, 2023, 11:22 a.m. UTC | #2
On 05/06/2023 17:57, Peter Xu wrote:
> On Tue, May 30, 2023 at 06:59:25PM +0100, Joao Martins wrote:
>> Much like pci_device_iommu_address_space() fetches the IOMMU AS, add a
>> pci_device_iommu_memory_region() which lets it return an the IOMMU MR
>> associated with it. The IOMMU MR is returned correctly for vIOMMUs using
>> pci_setup_iommu_info(). Note that today most vIOMMUs create the address
>> space and IOMMU MR at the same time, it's just mainly that there's API
>> to make the latter available.
> 
> Have you looked into other archs outside x86?  IIRC on some other arch one
> address space can have >1 IOMMU memory regions.. at least with such AS and
> MR layering it seems always possible?  Thanks,
> 

I looked at all callers of pci_setup_iommu(), restricting to those that actually
track an IOMMUMemoryRegion when they create an address space... as this is where
pci_device_iommu_memory_region() is applicable. From looking at those[*], I always
see a 1:1 association between the AS and the IOMMU-MR in their initialization
when iommu_fn is called. Unless I missed something... Is there an arch you were
thinking of specifically?

[I am not sure we can track a 1:N AS->IOMMU association today in QEMU]

[*] alpha, arm smmu, ppc, s390, virtio, and some pci bridges (pnv_phb3 and pnv_phb4)

	Joao
Peter Xu June 6, 2023, 3:03 p.m. UTC | #3
On Tue, Jun 06, 2023 at 12:22:16PM +0100, Joao Martins wrote:
> On 05/06/2023 17:57, Peter Xu wrote:
> > On Tue, May 30, 2023 at 06:59:25PM +0100, Joao Martins wrote:
> >> Much like pci_device_iommu_address_space() fetches the IOMMU AS, add a
> >> pci_device_iommu_memory_region() which lets it return an the IOMMU MR
> >> associated with it. The IOMMU MR is returned correctly for vIOMMUs using
> >> pci_setup_iommu_info(). Note that today most vIOMMUs create the address
> >> space and IOMMU MR at the same time, it's just mainly that there's API
> >> to make the latter available.
> > 
> > Have you looked into other archs outside x86?  IIRC on some other arch one
> > address space can have >1 IOMMU memory regions.. at least with such AS and
> > MR layering it seems always possible?  Thanks,
> > 
> 
> I looked at all callers of pci_setup_iommu() restricting to those that actually
> track an IOMMUMemoryRegion when they create a address space... as this is where
> pci_device_iommu_memory_region() is applicable. From looking at those[*], I see
> always a 1:1 association between the AS and the IOMMU-MR in their initialization
> when iommu_fn is called. Unless I missed something... Is there an arch you were
> thinking specifically?

If only observing the ones that "track an IOMMUMemoryRegion when they
create an address space", probably we're fine.  I was thinking ppc but I
don't really know the details, and I assume that's not in the scope.
Copying David Gibson just in case he has some comments here.

> 
> [I am not sure we can track today an 1:N AS->IOMMU association today in Qemu]

IIUC we can?  The address space only has a root MR, and after translate()
upon the root MR (per address_space_translate_iommu(), it can even be a few
rounds of nested translations) it can go into whatever MR is under it, IIUC.
Different ranges can logically map to different IOMMU MRs.

> 
> [*] alpha, arm smmu, ppc, s390, virtio, and some pci bridges (pnv_phb3 and pnv_phb4)

I'm just worried that what we need here is not an MR object but a higher-level
object like the vIOMMU object.  We used to have a requirement with Scalable
IOV (SVA) on Intel.  I tried to dig a bit in my inbox, not sure whether
it's the latest status, just to show what I meant:

https://lore.kernel.org/r/20210302203827.437645-6-yi.l.liu@intel.com

Copying Yi for that too.  From that aspect it makes more sense to me to
fetch things from either an IOMMUops or "an iommu object", rather than
relying on a specific MR (it'll also become harder when we can have
>1 vIOMMUs, so different MRs can point to different IOMMUs in the future).

I assume the two goals have similar requirements, iiuc.  If that's the case,
we'd better make sure we'll have one way to work for both.
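
To make the layering point above concrete, here is a minimal sketch of a DMA
address space whose single root MR contains two distinct IOMMU MRs, i.e. a 1:N
AS-to-IOMMU-MR relation; names and sizes are illustrative only, and
TYPE_EXAMPLE_IOMMU_MEMORY_REGION is a hypothetical registered subtype of
TYPE_IOMMU_MEMORY_REGION that adds no instance state:

#include "qemu/osdep.h"
#include "qemu/units.h"
#include "exec/memory.h"

#define TYPE_EXAMPLE_IOMMU_MEMORY_REGION "example-iommu-memory-region"

/* Two IOMMU windows living under one root MR of one AddressSpace. */
static void example_build_dma_as(Object *owner, AddressSpace *dma_as,
                                 MemoryRegion *root,
                                 IOMMUMemoryRegion *win0,
                                 IOMMUMemoryRegion *win1)
{
    memory_region_init(root, owner, "dma-root", UINT64_MAX);

    memory_region_init_iommu(win0, sizeof(*win0),
                             TYPE_EXAMPLE_IOMMU_MEMORY_REGION, owner,
                             "iommu-win0", 1 * GiB);
    memory_region_init_iommu(win1, sizeof(*win1),
                             TYPE_EXAMPLE_IOMMU_MEMORY_REGION, owner,
                             "iommu-win1", 1 * GiB);

    /* Different GPA ranges of the same AS route to different IOMMU MRs. */
    memory_region_add_subregion(root, 0, MEMORY_REGION(win0));
    memory_region_add_subregion(root, 4 * GiB, MEMORY_REGION(win1));

    address_space_init(dma_as, root, "dma-as");
}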
Peter Xu June 6, 2023, 3:05 p.m. UTC | #4
[ I forgot to really copy anyone, as usual.. trying again ]

On Tue, Jun 06, 2023 at 11:03:11AM -0400, Peter Xu wrote:
> On Tue, Jun 06, 2023 at 12:22:16PM +0100, Joao Martins wrote:
> > On 05/06/2023 17:57, Peter Xu wrote:
> > > On Tue, May 30, 2023 at 06:59:25PM +0100, Joao Martins wrote:
> > >> Much like pci_device_iommu_address_space() fetches the IOMMU AS, add a
> > >> pci_device_iommu_memory_region() which lets it return an the IOMMU MR
> > >> associated with it. The IOMMU MR is returned correctly for vIOMMUs using
> > >> pci_setup_iommu_info(). Note that today most vIOMMUs create the address
> > >> space and IOMMU MR at the same time, it's just mainly that there's API
> > >> to make the latter available.
> > > 
> > > Have you looked into other archs outside x86?  IIRC on some other arch one
> > > address space can have >1 IOMMU memory regions.. at least with such AS and
> > > MR layering it seems always possible?  Thanks,
> > > 
> > 
> > I looked at all callers of pci_setup_iommu() restricting to those that actually
> > track an IOMMUMemoryRegion when they create a address space... as this is where
> > pci_device_iommu_memory_region() is applicable. From looking at those[*], I see
> > always a 1:1 association between the AS and the IOMMU-MR in their initialization
> > when iommu_fn is called. Unless I missed something... Is there an arch you were
> > thinking specifically?
> 
> If only observing the ones that "track an IOMMUMemoryRegion when they
> create a address space", probably we're fine.  I was thinking ppc but I
> don't really know the details, and I assume that's not in the scope.
> Copying David Gibson just in case he got some comments here.
> 
> > 
> > [I am not sure we can track today an 1:N AS->IOMMU association today in Qemu]
> 
> IIUC we can?  The address space only have a root MR, and with that after
> translate() upon the root mr (per address_space_translate_iommu(), it can
> even be a few rounds of nested translations) it can go into whatever MR
> under it IIUC.  Different ranges can map to a different IOMMU MR logically.
> 
> > 
> > [*] alpha, arm smmu, ppc, s390, virtio, and some pci bridges (pnv_phb3 and pnv_phb4)
> 
> I just worried what we need here is not a MR object but a higher level
> object like the vIOMMU object.  We used to have a requirement with Scalable
> IOV (SVA) on Intel.  I tried to dig a bit in my inbox, not sure whether
> it's the latest status, just to show what I meant:
> 
> https://lore.kernel.org/r/20210302203827.437645-6-yi.l.liu@intel.com
> 
> Copy Yi too for that too.  From that aspect it makes more sense to me to
> fetching things from either an IOMMUops or "an iommu object", rather than
> relying on a specific MR (it'll also make it even harder when we can have
> >1 vIOMMUs so different MR can point to different IOMMUs in the future).
> 
> I assume the two goals have similar requirement, iiuc.  If that's the case,
> we'd better make sure we'll have one way to work for both.
> 
> -- 
> Peter Xu
Joao Martins June 6, 2023, 5:44 p.m. UTC | #5
On 06/06/2023 16:05, Peter Xu wrote:
> On Tue, Jun 06, 2023 at 11:03:11AM -0400, Peter Xu wrote:
>> On Tue, Jun 06, 2023 at 12:22:16PM +0100, Joao Martins wrote:
>>> On 05/06/2023 17:57, Peter Xu wrote:
>>>> On Tue, May 30, 2023 at 06:59:25PM +0100, Joao Martins wrote:
>>>>> Much like pci_device_iommu_address_space() fetches the IOMMU AS, add a
>>>>> pci_device_iommu_memory_region() which lets it return an the IOMMU MR
>>>>> associated with it. The IOMMU MR is returned correctly for vIOMMUs using
>>>>> pci_setup_iommu_info(). Note that today most vIOMMUs create the address
>>>>> space and IOMMU MR at the same time, it's just mainly that there's API
>>>>> to make the latter available.
>>>>
>>>> Have you looked into other archs outside x86?  IIRC on some other arch one
>>>> address space can have >1 IOMMU memory regions.. at least with such AS and
>>>> MR layering it seems always possible?  Thanks,
>>>>
>>>
>>> I looked at all callers of pci_setup_iommu() restricting to those that actually
>>> track an IOMMUMemoryRegion when they create a address space... as this is where
>>> pci_device_iommu_memory_region() is applicable. From looking at those[*], I see
>>> always a 1:1 association between the AS and the IOMMU-MR in their initialization
>>> when iommu_fn is called. Unless I missed something... Is there an arch you were
>>> thinking specifically?
>>
>> If only observing the ones that "track an IOMMUMemoryRegion when they
>> create a address space", probably we're fine.  I was thinking ppc but I
>> don't really know the details, and I assume that's not in the scope.
>> Copying David Gibson just in case he got some comments here.
>>
/me nods

>>>
>>> [I am not sure we can track today an 1:N AS->IOMMU association today in Qemu]
>>
>> IIUC we can?  The address space only have a root MR, and with that after
>> translate() upon the root mr (per address_space_translate_iommu(), it can
>> even be a few rounds of nested translations) it can go into whatever MR
>> under it IIUC.  Different ranges can map to a different IOMMU MR logically.
>>

I'll look some more into address_space_translate_iommu(). From a data structure
PoV it wasn't obvious how a single AS can be routed via two different IOMMU
MRs (or vice versa). Thanks for clarifying.

>>>
>>> [*] alpha, arm smmu, ppc, s390, virtio, and some pci bridges (pnv_phb3 and pnv_phb4)
>>
>> I just worried what we need here is not a MR object but a higher level
>> object like the vIOMMU object.  We used to have a requirement with Scalable
>> IOV (SVA) on Intel.  I tried to dig a bit in my inbox, not sure whether
>> it's the latest status, just to show what I meant:
>>
>> https://lore.kernel.org/r/20210302203827.437645-6-yi.l.liu@intel.com
>>
Oh nice; I wasn't aware of this series.

>> Copy Yi too for that too.  From that aspect it makes more sense to me to
>> fetching things from either an IOMMUops or "an iommu object", rather than
>> relying on a specific MR (it'll also make it even harder when we can have
>> >1 vIOMMUs so different MR can point to different IOMMUs in the future).
>>
>> I assume the two goals have similar requirement, iiuc.  If that's the case,
>> we'd better make sure we'll have one way to work for both.

Yeap, makes sense, definitely more future-proof. We were essentially trying to
do the exact same thing in the PCI layer, just for different purposes. All I
meant in this series is a way to fetch some vIOMMU attributes that tell me
whether DMA translation is enabled and what the max IOVA is for the IOMMU under
a particular PCI device.

Perhaps I would instead do something like that series and have a
pci_setup_iommu_ops(), converting existing users to it as a separate step or
series, to avoid regressing those who don't care.

I am happy to pick those up if Yi is OK with it -- should help for the nesting
work down the road too.

	Joao
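
As a rough illustration of the direction discussed above (an ops-based
interface instead of a specific MR), here is a hedged sketch; PCIIOMMUOps
loosely follows the pci_setup_iommu_ops() idea from the linked series, and the
get_iommu_attr() callback and both attribute names are assumptions made up for
this example:

/*
 * Sketch only: PCIIOMMUAttr, get_iommu_attr(), IOMMU_ATTR_DMA_TRANSLATION
 * and IOMMU_ATTR_MAX_IOVA are hypothetical names.
 */
typedef enum PCIIOMMUAttr {
    IOMMU_ATTR_DMA_TRANSLATION, /* bool: is DMA translation enabled? */
    IOMMU_ATTR_MAX_IOVA,        /* uint64_t: highest usable IOVA */
} PCIIOMMUAttr;

typedef struct PCIIOMMUOps {
    /* What PCIIOMMUFunc provides today. */
    AddressSpace *(*get_address_space)(PCIBus *bus, void *opaque, int devfn);
    /* Attribute lookup usable at device init, before any DMA map. */
    int (*get_iommu_attr)(PCIBus *bus, void *opaque, int devfn,
                          PCIIOMMUAttr attr, void *data);
} PCIIOMMUOps;

A VFIO-style consumer would then query something like IOMMU_ATTR_MAX_IOVA at
realize time instead of dereferencing an IOMMU MR directly.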

Patch

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index d2c87d87a24e..0177f50e96a3 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -365,13 +365,14 @@  void pci_device_deassert_intx(PCIDevice *dev);
 
 typedef struct PCIAddressSpace {
     AddressSpace *as;
+    IOMMUMemoryRegion *iommu_mr;
 } PCIAddressSpace;
 
 typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
 typedef PCIAddressSpace (*PCIIOMMUASFunc)(PCIBus *, void *, int);
 static inline PCIAddressSpace as_to_pci_as(AddressSpace *as)
 {
-    PCIAddressSpace ret = { .as = as };
+    PCIAddressSpace ret = { .as = as, .iommu_mr = NULL };
 
     return ret;
 }
@@ -379,6 +380,13 @@  static inline AddressSpace *pci_as_to_as(PCIAddressSpace pci_as)
 {
     return pci_as.as;
 }
+static inline PCIAddressSpace to_pci_as(AddressSpace *as,
+                                        IOMMUMemoryRegion *iommu_mr)
+{
+    PCIAddressSpace ret = { .as = as, .iommu_mr = iommu_mr };
+
+    return ret;
+}
 
 PCIAddressSpace pci_device_iommu_info(PCIDevice *dev);
 static inline AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
@@ -386,6 +394,13 @@  static inline AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
     return pci_as_to_as(pci_device_iommu_info(dev));
 }
 
+static inline IOMMUMemoryRegion *pci_device_iommu_memory_region(PCIDevice *dev)
+{
+    PCIAddressSpace ret = pci_device_iommu_info(dev);
+
+    return ret.iommu_mr;
+}
+
 void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
 void pci_setup_iommu_info(PCIBus *bus, PCIIOMMUASFunc fn, void *opaque);
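
For completeness, a minimal sketch of the producer side of this patch: a vIOMMU
that already builds a per-device AddressSpace and IOMMUMemoryRegion can register
a PCIIOMMUASFunc through pci_setup_iommu_info() and hand both back via
to_pci_as(). The ExampleIOMMUDevice type and the lookup helper are hypothetical:

/*
 * Hypothetical per-device state and lookup helper that a vIOMMU would
 * already maintain; both names are made up for this sketch.
 */
typedef struct ExampleIOMMUDevice {
    AddressSpace as;
    IOMMUMemoryRegion iommu_mr;
} ExampleIOMMUDevice;

ExampleIOMMUDevice *example_iommu_find_device(void *opaque, PCIBus *bus,
                                              int devfn);

static PCIAddressSpace example_iommu_info_fn(PCIBus *bus, void *opaque,
                                             int devfn)
{
    ExampleIOMMUDevice *edev = example_iommu_find_device(opaque, bus, devfn);

    /* Hand back the AS together with the IOMMU MR that backs it. */
    return to_pci_as(&edev->as, &edev->iommu_mr);
}

static void example_iommu_setup_bus(void *opaque, PCIBus *bus)
{
    pci_setup_iommu_info(bus, example_iommu_info_fn, opaque);
}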