Message ID | 1493201525-14418-8-git-send-email-yi.l.liu@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Yi, On 26/04/17 11:12, Liu, Yi L wrote: > From: "Liu, Yi L" <yi.l.liu@linux.intel.com> > > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB > invalidate request from guest to host. > > In the case of SVM virtualization on VT-d, host IOMMU driver has > no knowledge of caching structure updates unless the guest > invalidation activities are passed down to the host. So a new > IOCTL is needed to propagate the guest cache invalidation through > VFIO. > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com> > --- > include/uapi/linux/vfio.h | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > index 6b97987..50c51f8 100644 > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -564,6 +564,15 @@ struct vfio_device_svm { > > #define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22) > > +/* For IOMMU TLB Invalidation Propagation */ > +struct vfio_iommu_tlb_invalidate { > + __u32 argsz; > + __u32 length; > + __u8 data[]; > +}; We initially discussed something a little more generic than this, with most info explicitly described and only pIOMMU-specific quirks and hints in an opaque structure. Out of curiosity, why the change? I'm not against a fully opaque structure, but there seem to be a large overlap between TLB invalidations across architectures. For what it's worth, when prototyping the paravirtualized IOMMU I came up with the following. (From the paravirtualized POV, the SMMU also has to swizzle endianess after unpacking an opaque structure, since userspace doesn't know what's in it and guest might use a different endianess. So we need to force all opaque data to be e.g. little-endian.) struct vfio_iommu_tlb_invalidate { __u32 argsz; __u32 scope; __u32 flags; __u32 pasid; __u64 vaddr; __u64 size; __u8 data[]; }; Scope is a bitfields restricting the invalidation scope. By default invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr and @size are unused. Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation scope to the pasid described by @pasid. Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation scope to the address range described by (@vaddr, @size). So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA range for *all* pasids (as well as no_pasid). Setting scope = (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate the VA range only for @pasid. Flags depend on the selected scope: VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either without scope or with INVALIDATE_VADDR) targets non-pasid mappings exclusively (some architectures, e.g. SMMU, allow this) VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need to invalidate all intermediate tables cached as part of the PTW for vaddr, only the last-level entry (pte). This is a hint. I guess what's missing for Intel IOMMU and would go in @data is the "global" hint (which we don't have in SMMU invalidations). Do you see anything else, that the pIOMMU cannot deduce from this structure? Thanks, Jean > +#define VFIO_IOMMU_TLB_INVALIDATE _IO(VFIO_TYPE, VFIO_BASE + 23) > + > /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */ > > /* >
On Wed, 26 Apr 2017 18:12:04 +0800 "Liu, Yi L" <yi.l.liu@intel.com> wrote: > From: "Liu, Yi L" <yi.l.liu@linux.intel.com> > > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB > invalidate request from guest to host. > > In the case of SVM virtualization on VT-d, host IOMMU driver has > no knowledge of caching structure updates unless the guest > invalidation activities are passed down to the host. So a new > IOCTL is needed to propagate the guest cache invalidation through > VFIO. > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com> > --- > include/uapi/linux/vfio.h | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > index 6b97987..50c51f8 100644 > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -564,6 +564,15 @@ struct vfio_device_svm { > > #define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22) > > +/* For IOMMU TLB Invalidation Propagation */ > +struct vfio_iommu_tlb_invalidate { > + __u32 argsz; > + __u32 length; > + __u8 data[]; > +}; > + > +#define VFIO_IOMMU_TLB_INVALIDATE _IO(VFIO_TYPE, VFIO_BASE + 23) I'm kind of wondering why this isn't just a new flag bit on vfio_device_svm, the data structure is so similar. Of course data needs to be fully specified in uapi. > + > /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */ > > /*
On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote: > Hi Yi, > > On 26/04/17 11:12, Liu, Yi L wrote: > > From: "Liu, Yi L" <yi.l.liu@linux.intel.com> > > > > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB > > invalidate request from guest to host. > > > > In the case of SVM virtualization on VT-d, host IOMMU driver has > > no knowledge of caching structure updates unless the guest > > invalidation activities are passed down to the host. So a new > > IOCTL is needed to propagate the guest cache invalidation through > > VFIO. > > > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com> > > --- > > include/uapi/linux/vfio.h | 9 +++++++++ > > 1 file changed, 9 insertions(+) > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > > index 6b97987..50c51f8 100644 > > --- a/include/uapi/linux/vfio.h > > +++ b/include/uapi/linux/vfio.h > > @@ -564,6 +564,15 @@ struct vfio_device_svm { > > > > #define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22) > > > > +/* For IOMMU TLB Invalidation Propagation */ > > +struct vfio_iommu_tlb_invalidate { > > + __u32 argsz; > > + __u32 length; > > + __u8 data[]; > > +}; > > We initially discussed something a little more generic than this, with > most info explicitly described and only pIOMMU-specific quirks and hints > in an opaque structure. Out of curiosity, why the change? I'm not against > a fully opaque structure, but there seem to be a large overlap between TLB > invalidations across architectures. Hi Jean, As my cover letter mentioned, it is an open on the iommu tlb invalidate propagation. Paste it here since it's in the cover letter for Qemu part changes. Pls refer to the [Open] session in the following link. http://www.spinics.net/lists/kvm/msg148798.html I want to see if community wants to have opaque structure or not on iommu tlb invalidate propagation. Personally, I incline to use opaque structure. But it's better to gather the comments before deciding it. To assist the discussion, I put the full opaque structure here. Once community gets consensus on using opaque structure for iommu tlb invalidate propagation, I'm glad to work with you on a structure with partial opaque since there seems to be overlap across arch. > > For what it's worth, when prototyping the paravirtualized IOMMU I came up > with the following. > > (From the paravirtualized POV, the SMMU also has to swizzle endianess > after unpacking an opaque structure, since userspace doesn't know what's > in it and guest might use a different endianess. So we need to force all > opaque data to be e.g. little-endian.) > > struct vfio_iommu_tlb_invalidate { > __u32 argsz; > __u32 scope; > __u32 flags; > __u32 pasid; > __u64 vaddr; > __u64 size; > __u8 data[]; > }; > > Scope is a bitfields restricting the invalidation scope. By default > invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr > and @size are unused. > > Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation > scope to the pasid described by @pasid. > Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation > scope to the address range described by (@vaddr, @size). > > So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA > range for *all* pasids (as well as no_pasid). Setting scope = > (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate > the VA range only for @pasid. > Besides VA range flusing, there is PASID Cache flushing on VT-d. How about SMMU? So I think besides the two scope you defined, may need one more to indicate if it's PASID Cache flushing. > Flags depend on the selected scope: > > VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either > without scope or with INVALIDATE_VADDR) targets non-pasid mappings > exclusively (some architectures, e.g. SMMU, allow this) > > VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need > to invalidate all intermediate tables cached as part of the PTW for vaddr, > only the last-level entry (pte). This is a hint. > > I guess what's missing for Intel IOMMU and would go in @data is the > "global" hint (which we don't have in SMMU invalidations). Do you see > anything else, that the pIOMMU cannot deduce from this structure? > For Intel platform, Drain read/write would be needed in the opaque. Thanks, Yi L
On Fri, May 12, 2017 at 03:58:43PM -0600, Alex Williamson wrote: > On Wed, 26 Apr 2017 18:12:04 +0800 > "Liu, Yi L" <yi.l.liu@intel.com> wrote: > > > From: "Liu, Yi L" <yi.l.liu@linux.intel.com> > > > > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB > > invalidate request from guest to host. > > > > In the case of SVM virtualization on VT-d, host IOMMU driver has > > no knowledge of caching structure updates unless the guest > > invalidation activities are passed down to the host. So a new > > IOCTL is needed to propagate the guest cache invalidation through > > VFIO. > > > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com> > > --- > > include/uapi/linux/vfio.h | 9 +++++++++ > > 1 file changed, 9 insertions(+) > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > > index 6b97987..50c51f8 100644 > > --- a/include/uapi/linux/vfio.h > > +++ b/include/uapi/linux/vfio.h > > @@ -564,6 +564,15 @@ struct vfio_device_svm { > > > > #define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22) > > > > +/* For IOMMU TLB Invalidation Propagation */ > > +struct vfio_iommu_tlb_invalidate { > > + __u32 argsz; > > + __u32 length; > > + __u8 data[]; > > +}; > > + > > +#define VFIO_IOMMU_TLB_INVALIDATE _IO(VFIO_TYPE, VFIO_BASE + 23) > > I'm kind of wondering why this isn't just a new flag bit on > vfio_device_svm, the data structure is so similar. Of course data > needs to be fully specified in uapi. Hi Alex, For this part, it depends on using opaque structure or not. The following link mentioned it in [Open] session. http://www.spinics.net/lists/kvm/msg148798.html If we pick the full opaque solution for iommu tlb invalidate propagation. Then I may add a flag bit on vfio_device_svm and also add definition in uapi as you suggested. Thanks, Yi L > > + > > /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */ > > > > /* >
On 14/05/17 11:12, Liu, Yi L wrote: > On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote: >> Hi Yi, >> >> On 26/04/17 11:12, Liu, Yi L wrote: >>> From: "Liu, Yi L" <yi.l.liu@linux.intel.com> >>> >>> This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB >>> invalidate request from guest to host. >>> >>> In the case of SVM virtualization on VT-d, host IOMMU driver has >>> no knowledge of caching structure updates unless the guest >>> invalidation activities are passed down to the host. So a new >>> IOCTL is needed to propagate the guest cache invalidation through >>> VFIO. >>> >>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com> >>> --- >>> include/uapi/linux/vfio.h | 9 +++++++++ >>> 1 file changed, 9 insertions(+) >>> >>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h >>> index 6b97987..50c51f8 100644 >>> --- a/include/uapi/linux/vfio.h >>> +++ b/include/uapi/linux/vfio.h >>> @@ -564,6 +564,15 @@ struct vfio_device_svm { >>> >>> #define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22) >>> >>> +/* For IOMMU TLB Invalidation Propagation */ >>> +struct vfio_iommu_tlb_invalidate { >>> + __u32 argsz; >>> + __u32 length; >>> + __u8 data[]; >>> +}; >> >> We initially discussed something a little more generic than this, with >> most info explicitly described and only pIOMMU-specific quirks and hints >> in an opaque structure. Out of curiosity, why the change? I'm not against >> a fully opaque structure, but there seem to be a large overlap between TLB >> invalidations across architectures. > > Hi Jean, > > As my cover letter mentioned, it is an open on the iommu tlb invalidate > propagation. Paste it here since it's in the cover letter for Qemu part > changes. Pls refer to the [Open] session in the following link. > > http://www.spinics.net/lists/kvm/msg148798.html > > I want to see if community wants to have opaque structure or not > on iommu tlb invalidate propagation. Personally, I incline to use > opaque structure. But it's better to gather the comments before > deciding it. To assist the discussion, I put the full opaque structure > here. Once community gets consensus on using opaque structure for > iommu tlb invalidate propagation, I'm glad to work with you on a > structure with partial opaque since there seems to be overlap across > arch. I see, thanks for the pointer. I'm not fan of using the pIOMMU format in invalidations, but I understand the appeal in your case, where you can shave off the overhead of parsing/re-assembling the pIOMMU format. It's not suitable for generic io-pgtable, where different pIOMMUs may offer the same page table format but different invalidation methods. I guess I could make it work by negotiating invalidation method at bind time, falling back to the generic format for virtio-iommu. We already have to do some related negotiation for SVM on SMMU, where in embedded implementations the guest doesn't need to send invalidation requests via IOMMU queue, but can do it using CPU instructions instead. >> For what it's worth, when prototyping the paravirtualized IOMMU I came up >> with the following. >> >> (From the paravirtualized POV, the SMMU also has to swizzle endianess >> after unpacking an opaque structure, since userspace doesn't know what's >> in it and guest might use a different endianess. So we need to force all >> opaque data to be e.g. little-endian.) >> >> struct vfio_iommu_tlb_invalidate { >> __u32 argsz; >> __u32 scope; >> __u32 flags; >> __u32 pasid; >> __u64 vaddr; >> __u64 size; >> __u8 data[]; >> }; >> >> Scope is a bitfields restricting the invalidation scope. By default >> invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr >> and @size are unused. >> >> Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation >> scope to the pasid described by @pasid. >> Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation >> scope to the address range described by (@vaddr, @size). >> >> So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA >> range for *all* pasids (as well as no_pasid). Setting scope = >> (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate >> the VA range only for @pasid. >> > > Besides VA range flusing, there is PASID Cache flushing on VT-d. How about > SMMU? So I think besides the two scope you defined, may need one more to > indicate if it's PASID Cache flushing. Yes, invalidating all TLB entries associated to a PASID would be done by setting scope = VFIO_IOMMU_INVALIDATE_PASID. In which case field @pasid is valid, and fields (@vaddr, @size) aren't used. Thanks, Jean >> Flags depend on the selected scope: >> >> VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either >> without scope or with INVALIDATE_VADDR) targets non-pasid mappings >> exclusively (some architectures, e.g. SMMU, allow this) >> >> VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need >> to invalidate all intermediate tables cached as part of the PTW for vaddr, >> only the last-level entry (pte). This is a hint. >> >> I guess what's missing for Intel IOMMU and would go in @data is the >> "global" hint (which we don't have in SMMU invalidations). Do you see >> anything else, that the pIOMMU cannot deduce from this structure? >> > > For Intel platform, Drain read/write would be needed in the opaque. > > Thanks, > Yi L >
On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote: Hi Jean, As we've got a few discussions on it. I'd like to have a conclusion and make it as a reference for future discussion. Currently, we are inclined to have a hybrid format for the iommu tlb invalidation from userspace(vIOMMU or userspace driver). Based on the previous discussion, may the below work? 1. Add a IOCTL for iommu tlb invalidation. VFIO_IOMMU_TLB_INVALIDATE struct vfio_iommu_tlb_invalidate { __u32 argsz; __u32 length; __u8 data[]; }; comments from Alex William: is it more suitable to add a new flag bit on vfio_device_svm(a structure defined in patch 5 of this patchset), the data structure is so similar. Personally, I'm ok with it. Pls let me know your thoughts. However, the precondition is we accept the whole definition in this email. If not, the vfio_iommu_tlb_invalidate would be defined differently. 2. Define a structure in include/uapi/linux/iommu.h(newly added header file) struct iommu_tlb_invalidate { __u32 scope; /* pasid-selective invalidation described by @pasid */ #define IOMMU_INVALIDATE_PASID (1 << 0) /* address-selevtive invalidation described by (@vaddr, @size) */ #define IOMMU_INVALIDATE_VADDR (1 << 1) __u32 flags; /* targets non-pasid mappings, @pasid is not valid */ #define IOMMU_INVALIDATE_NO_PASID (1 << 0) /* indicating that the pIOMMU doesn't need to invalidate all intermediate tables cached as part of the PTE for vaddr, only the last-level entry (pte). This is a hint. */ #define IOMMU_INVALIDATE_VADDR_LEAF (1 << 1) __u32 pasid; __u64 vaddr; __u64 size; __u8 data[]; }; For this part, the scope and flags are basically aligned with your previous email. I renamed the prefix to be "IOMMU_". In my opinion, the scope and flags would be filled by vIOMMU emulator and be parsed by underlying iommu driver, it is much more suitable to be defined in a uapi header file. Besides the reason above, I don't want VFIO engae too much on the data parsing. If we move the scope,flags,pasid,vaddr,size fields to vfio_iommu_tlb_invalidate, then both kernel space vfio and user space vfio needs to do much parsing. So I may prefer the way above. If you've got any other idea, pls feel free to post it. It's welcomed. Thanks, Yi L > Hi Yi, > > On 26/04/17 11:12, Liu, Yi L wrote: > > From: "Liu, Yi L" <yi.l.liu@linux.intel.com> > > > > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB > > invalidate request from guest to host. > > > > In the case of SVM virtualization on VT-d, host IOMMU driver has > > no knowledge of caching structure updates unless the guest > > invalidation activities are passed down to the host. So a new > > IOCTL is needed to propagate the guest cache invalidation through > > VFIO. > > > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com> > > --- > > include/uapi/linux/vfio.h | 9 +++++++++ > > 1 file changed, 9 insertions(+) > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > > index 6b97987..50c51f8 100644 > > --- a/include/uapi/linux/vfio.h > > +++ b/include/uapi/linux/vfio.h > > @@ -564,6 +564,15 @@ struct vfio_device_svm { > > > > #define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22) > > > > +/* For IOMMU TLB Invalidation Propagation */ > > +struct vfio_iommu_tlb_invalidate { > > + __u32 argsz; > > + __u32 length; > > + __u8 data[]; > > +}; > > We initially discussed something a little more generic than this, with > most info explicitly described and only pIOMMU-specific quirks and hints > in an opaque structure. Out of curiosity, why the change? I'm not against > a fully opaque structure, but there seem to be a large overlap between TLB > invalidations across architectures. > > > For what it's worth, when prototyping the paravirtualized IOMMU I came up > with the following. > > (From the paravirtualized POV, the SMMU also has to swizzle endianess > after unpacking an opaque structure, since userspace doesn't know what's > in it and guest might use a different endianess. So we need to force all > opaque data to be e.g. little-endian.) > > struct vfio_iommu_tlb_invalidate { > __u32 argsz; > __u32 scope; > __u32 flags; > __u32 pasid; > __u64 vaddr; > __u64 size; > __u8 data[]; > }; > > Scope is a bitfields restricting the invalidation scope. By default > invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr > and @size are unused. > > Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation > scope to the pasid described by @pasid. > Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation > scope to the address range described by (@vaddr, @size). > > So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA > range for *all* pasids (as well as no_pasid). Setting scope = > (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate > the VA range only for @pasid. > > Flags depend on the selected scope: > > VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either > without scope or with INVALIDATE_VADDR) targets non-pasid mappings > exclusively (some architectures, e.g. SMMU, allow this) > > VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need > to invalidate all intermediate tables cached as part of the PTW for vaddr, > only the last-level entry (pte). This is a hint. > > I guess what's missing for Intel IOMMU and would go in @data is the > "global" hint (which we don't have in SMMU invalidations). Do you see > anything else, that the pIOMMU cannot deduce from this structure? > > Thanks, > Jean > > > > +#define VFIO_IOMMU_TLB_INVALIDATE _IO(VFIO_TYPE, VFIO_BASE + 23) > > + > > /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */ > > > > /* > > > >
Hi Jean, On Mon, Jul 03, 2017 at 12:52:52PM +0100, Jean-Philippe Brucker wrote: > Hi Yi, > > On 02/07/17 11:06, Liu, Yi L wrote: > > On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote: > > > > Hi Jean, > > > > As we've got a few discussions on it. I'd like to have a conclusion and > > make it as a reference for future discussion. > > > > Currently, we are inclined to have a hybrid format for the iommu tlb > > invalidation from userspace(vIOMMU or userspace driver). > > > > Based on the previous discussion, may the below work? > > > > 1. Add a IOCTL for iommu tlb invalidation. > > > > VFIO_IOMMU_TLB_INVALIDATE > > > > struct vfio_iommu_tlb_invalidate { > > __u32 argsz; > > __u32 length; > > Wouldn't argsz be exactly length + 8? Might be redundant in this case. yes, it is. we may not use it in future version. but yes, if we still use it. I think we can make it easier. > > __u8 data[]; > > }; > > > > comments from Alex William: is it more suitable to add a new flag bit on > > vfio_device_svm(a structure defined in patch 5 of this patchset), the data > > structure is so similar. > > > > Personally, I'm ok with it. Pls let me know your thoughts. However, the > > precondition is we accept the whole definition in this email. If not, the > > vfio_iommu_tlb_invalidate would be defined differently. > > With this proposal sharing the structure makes sense. As I understand it > we're keeping the VFIO_IOMMU_TLB_INVALIDATE ioctl? In which case adding a > flag bit would be redundant. yes, it seems to be strange if we share vfio_device_svm structure but use a separate IOCTL cmd. Maybe it's more reasonable to share IOCTL cmd and just add a new flag. Then all the svm related operations share the IOCTL. However, need to check if there would be any non-svm related iommu tlb invalidation. Then vfio_device_svm should be renamed to be non-svm specific. > > > 2. Define a structure in include/uapi/linux/iommu.h(newly added header file) > > > > struct iommu_tlb_invalidate { > > __u32 scope; > > /* pasid-selective invalidation described by @pasid */ > > #define IOMMU_INVALIDATE_PASID (1 << 0) > > /* address-selevtive invalidation described by (@vaddr, @size) */ > > #define IOMMU_INVALIDATE_VADDR (1 << 1) > > __u32 flags; > > /* targets non-pasid mappings, @pasid is not valid */ > > #define IOMMU_INVALIDATE_NO_PASID (1 << 0) > > Although it was my proposal, I don't like this flag. In ARM SMMU, we're > using a special mode where PASID 0 is reserved and any traffic without > PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag > to invalidate that special context explicitly. But this means that > invalidation packet targeted at that context will have "scope = PASID" and > "flags = NO_PASID", which is utterly confusing. > > I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID flag > and just use PASID 0 to invalidate this context on ARM. I don't think > other architectures would use the NO_PASID flag anyway, but might be mistaken. I may suggest to keep it so far. On VT-d, we may pass some data in opaque, so we may work without it. But if other vendor want to issue non-PASID tagged cache, then may encounter problem. > > /* indicating that the pIOMMU doesn't need to invalidate > > all intermediate tables cached as part of the PTE for > > vaddr, only the last-level entry (pte). This is a hint. */ > > #define IOMMU_INVALIDATE_VADDR_LEAF (1 << 1) > > __u32 pasid; > > __u64 vaddr; > > __u64 size; > > __u8 data[]; > > }; > > > > For this part, the scope and flags are basically aligned with your previous > > email. I renamed the prefix to be "IOMMU_". In my opinion, the scope and flags > > would be filled by vIOMMU emulator and be parsed by underlying iommu driver, > > it is much more suitable to be defined in a uapi header file. > > I tend to agree, defining a single structure in a new IOMMU UAPI file is > better than having identical structures both in uapi/linux/vfio.h and > linux/iommu.h. This way we avoid VFIO having to copy the same structure > field by field. Arch-specific structures that go in > iommu_tlb_invalidate.data also ought to be defined in uapi/linux/iommu.h yes, it is. > > Besides the reason above, I don't want VFIO engae too much on the data parsing. > > If we move the scope,flags,pasid,vaddr,size fields to vfio_iommu_tlb_invalidate, > > then both kernel space vfio and user space vfio needs to do much parsing. So I > > may prefer the way above. > > Would the entire validation of struct iommu_tlb_invalidate be offloaded to > the IOMMU subsystem? Checking the structure sizes, copying from user, and > validating the flags? no, the copying from user and flag validation is still in VFIO. Basic idea is still passing the iommu_tlb_invalidate.data to iommu sub-system. > I guess it's mostly an implementation detail, but it might be better to > keep this code in VFIO as well, even for the validation of > iommu_tlb_invalidate.data (which would require VFIO to keep track of the > model used during bind). This way VFIO sanitizes what comes from > userspace, whilst the IOMMU subsystem only deals with valid kernel > structures, and another subsystem could easily reuse the > iommu_tlb_invalidate API. it's a good idae. may think about it. VFIO should also be able to parse the generic part of the iommu_tlb_invalidate.data. Thanks, Yi L > The IOMMU subsystem would still validate the meaning of the fields, for > instance whether a given combination of flag is allowed or if the PASID > exists. > > Thanks, > Jean > > > If you've got any other idea, pls feel free to post it. It's welcomed. > > > > Thanks, > > Yi L > > > >> Hi Yi, > >> > >> On 26/04/17 11:12, Liu, Yi L wrote: > >>> From: "Liu, Yi L" <yi.l.liu@linux.intel.com> > >>> > >>> This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB > >>> invalidate request from guest to host. > >>> > >>> In the case of SVM virtualization on VT-d, host IOMMU driver has > >>> no knowledge of caching structure updates unless the guest > >>> invalidation activities are passed down to the host. So a new > >>> IOCTL is needed to propagate the guest cache invalidation through > >>> VFIO. > >>> > >>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com> > >>> --- > >>> include/uapi/linux/vfio.h | 9 +++++++++ > >>> 1 file changed, 9 insertions(+) > >>> > >>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > >>> index 6b97987..50c51f8 100644 > >>> --- a/include/uapi/linux/vfio.h > >>> +++ b/include/uapi/linux/vfio.h > >>> @@ -564,6 +564,15 @@ struct vfio_device_svm { > >>> > >>> #define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22) > >>> > >>> +/* For IOMMU TLB Invalidation Propagation */ > >>> +struct vfio_iommu_tlb_invalidate { > >>> + __u32 argsz; > >>> + __u32 length; > >>> + __u8 data[]; > >>> +}; > >> > >> We initially discussed something a little more generic than this, with > >> most info explicitly described and only pIOMMU-specific quirks and hints > >> in an opaque structure. Out of curiosity, why the change? I'm not against > >> a fully opaque structure, but there seem to be a large overlap between TLB > >> invalidations across architectures. > >> > >> > >> For what it's worth, when prototyping the paravirtualized IOMMU I came up > >> with the following. > >> > >> (From the paravirtualized POV, the SMMU also has to swizzle endianess > >> after unpacking an opaque structure, since userspace doesn't know what's > >> in it and guest might use a different endianess. So we need to force all > >> opaque data to be e.g. little-endian.) > >> > >> struct vfio_iommu_tlb_invalidate { > >> __u32 argsz; > >> __u32 scope; > >> __u32 flags; > >> __u32 pasid; > >> __u64 vaddr; > >> __u64 size; > >> __u8 data[]; > >> }; > >> > >> Scope is a bitfields restricting the invalidation scope. By default > >> invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr > >> and @size are unused. > >> > >> Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation > >> scope to the pasid described by @pasid. > >> Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation > >> scope to the address range described by (@vaddr, @size). > >> > >> So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA > >> range for *all* pasids (as well as no_pasid). Setting scope = > >> (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate > >> the VA range only for @pasid. > >> > >> Flags depend on the selected scope: > >> > >> VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either > >> without scope or with INVALIDATE_VADDR) targets non-pasid mappings > >> exclusively (some architectures, e.g. SMMU, allow this)>> > >> VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need > >> to invalidate all intermediate tables cached as part of the PTW for vaddr, > >> only the last-level entry (pte). This is a hint. > >> > >> I guess what's missing for Intel IOMMU and would go in @data is the > >> "global" hint (which we don't have in SMMU invalidations). Do you see > >> anything else, that the pIOMMU cannot deduce from this structure? > >> > >> Thanks, > >> Jean > >> > >> > >>> +#define VFIO_IOMMU_TLB_INVALIDATE _IO(VFIO_TYPE, VFIO_BASE + 23) > >>> + > >>> /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */ > >>> > >>> /* > >>> > >> > >> > >
Hi Yi, On 02/07/17 11:06, Liu, Yi L wrote: > On Fri, May 12, 2017 at 01:11:02PM +0100, Jean-Philippe Brucker wrote: > > Hi Jean, > > As we've got a few discussions on it. I'd like to have a conclusion and > make it as a reference for future discussion. > > Currently, we are inclined to have a hybrid format for the iommu tlb > invalidation from userspace(vIOMMU or userspace driver). > > Based on the previous discussion, may the below work? > > 1. Add a IOCTL for iommu tlb invalidation. > > VFIO_IOMMU_TLB_INVALIDATE > > struct vfio_iommu_tlb_invalidate { > __u32 argsz; > __u32 length; Wouldn't argsz be exactly length + 8? Might be redundant in this case. > __u8 data[]; > }; > > comments from Alex William: is it more suitable to add a new flag bit on > vfio_device_svm(a structure defined in patch 5 of this patchset), the data > structure is so similar. > > Personally, I'm ok with it. Pls let me know your thoughts. However, the > precondition is we accept the whole definition in this email. If not, the > vfio_iommu_tlb_invalidate would be defined differently. With this proposal sharing the structure makes sense. As I understand it we're keeping the VFIO_IOMMU_TLB_INVALIDATE ioctl? In which case adding a flag bit would be redundant. > 2. Define a structure in include/uapi/linux/iommu.h(newly added header file) > > struct iommu_tlb_invalidate { > __u32 scope; > /* pasid-selective invalidation described by @pasid */ > #define IOMMU_INVALIDATE_PASID (1 << 0) > /* address-selevtive invalidation described by (@vaddr, @size) */ > #define IOMMU_INVALIDATE_VADDR (1 << 1) > __u32 flags; > /* targets non-pasid mappings, @pasid is not valid */ > #define IOMMU_INVALIDATE_NO_PASID (1 << 0) Although it was my proposal, I don't like this flag. In ARM SMMU, we're using a special mode where PASID 0 is reserved and any traffic without PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag to invalidate that special context explicitly. But this means that invalidation packet targeted at that context will have "scope = PASID" and "flags = NO_PASID", which is utterly confusing. I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID flag and just use PASID 0 to invalidate this context on ARM. I don't think other architectures would use the NO_PASID flag anyway, but might be mistaken. > /* indicating that the pIOMMU doesn't need to invalidate > all intermediate tables cached as part of the PTE for > vaddr, only the last-level entry (pte). This is a hint. */ > #define IOMMU_INVALIDATE_VADDR_LEAF (1 << 1) > __u32 pasid; > __u64 vaddr; > __u64 size; > __u8 data[]; > }; > > For this part, the scope and flags are basically aligned with your previous > email. I renamed the prefix to be "IOMMU_". In my opinion, the scope and flags > would be filled by vIOMMU emulator and be parsed by underlying iommu driver, > it is much more suitable to be defined in a uapi header file. I tend to agree, defining a single structure in a new IOMMU UAPI file is better than having identical structures both in uapi/linux/vfio.h and linux/iommu.h. This way we avoid VFIO having to copy the same structure field by field. Arch-specific structures that go in iommu_tlb_invalidate.data also ought to be defined in uapi/linux/iommu.h > Besides the reason above, I don't want VFIO engae too much on the data parsing. > If we move the scope,flags,pasid,vaddr,size fields to vfio_iommu_tlb_invalidate, > then both kernel space vfio and user space vfio needs to do much parsing. So I > may prefer the way above. Would the entire validation of struct iommu_tlb_invalidate be offloaded to the IOMMU subsystem? Checking the structure sizes, copying from user, and validating the flags? I guess it's mostly an implementation detail, but it might be better to keep this code in VFIO as well, even for the validation of iommu_tlb_invalidate.data (which would require VFIO to keep track of the model used during bind). This way VFIO sanitizes what comes from userspace, whilst the IOMMU subsystem only deals with valid kernel structures, and another subsystem could easily reuse the iommu_tlb_invalidate API. The IOMMU subsystem would still validate the meaning of the fields, for instance whether a given combination of flag is allowed or if the PASID exists. Thanks, Jean > If you've got any other idea, pls feel free to post it. It's welcomed. > > Thanks, > Yi L > >> Hi Yi, >> >> On 26/04/17 11:12, Liu, Yi L wrote: >>> From: "Liu, Yi L" <yi.l.liu@linux.intel.com> >>> >>> This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU TLB >>> invalidate request from guest to host. >>> >>> In the case of SVM virtualization on VT-d, host IOMMU driver has >>> no knowledge of caching structure updates unless the guest >>> invalidation activities are passed down to the host. So a new >>> IOCTL is needed to propagate the guest cache invalidation through >>> VFIO. >>> >>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com> >>> --- >>> include/uapi/linux/vfio.h | 9 +++++++++ >>> 1 file changed, 9 insertions(+) >>> >>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h >>> index 6b97987..50c51f8 100644 >>> --- a/include/uapi/linux/vfio.h >>> +++ b/include/uapi/linux/vfio.h >>> @@ -564,6 +564,15 @@ struct vfio_device_svm { >>> >>> #define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22) >>> >>> +/* For IOMMU TLB Invalidation Propagation */ >>> +struct vfio_iommu_tlb_invalidate { >>> + __u32 argsz; >>> + __u32 length; >>> + __u8 data[]; >>> +}; >> >> We initially discussed something a little more generic than this, with >> most info explicitly described and only pIOMMU-specific quirks and hints >> in an opaque structure. Out of curiosity, why the change? I'm not against >> a fully opaque structure, but there seem to be a large overlap between TLB >> invalidations across architectures. >> >> >> For what it's worth, when prototyping the paravirtualized IOMMU I came up >> with the following. >> >> (From the paravirtualized POV, the SMMU also has to swizzle endianess >> after unpacking an opaque structure, since userspace doesn't know what's >> in it and guest might use a different endianess. So we need to force all >> opaque data to be e.g. little-endian.) >> >> struct vfio_iommu_tlb_invalidate { >> __u32 argsz; >> __u32 scope; >> __u32 flags; >> __u32 pasid; >> __u64 vaddr; >> __u64 size; >> __u8 data[]; >> }; >> >> Scope is a bitfields restricting the invalidation scope. By default >> invalidate the whole container (all PASIDs and all VAs). @pasid, @vaddr >> and @size are unused. >> >> Adding VFIO_IOMMU_INVALIDATE_PASID (1 << 0) restricts the invalidation >> scope to the pasid described by @pasid. >> Adding VFIO_IOMMU_INVALIDATE_VADDR (1 << 1) restricts the invalidation >> scope to the address range described by (@vaddr, @size). >> >> So setting scope = VFIO_IOMMU_INVALIDATE_VADDR would invalidate the VA >> range for *all* pasids (as well as no_pasid). Setting scope = >> (VFIO_IOMMU_INVALIDATE_VADDR|VFIO_IOMMU_INVALIDATE_PASID) would invalidate >> the VA range only for @pasid. >> >> Flags depend on the selected scope: >> >> VFIO_IOMMU_INVALIDATE_NO_PASID, indicating that invalidation (either >> without scope or with INVALIDATE_VADDR) targets non-pasid mappings >> exclusively (some architectures, e.g. SMMU, allow this)>> >> VFIO_IOMMU_INVALIDATE_VADDR_LEAF, indicating that the pIOMMU doesn't need >> to invalidate all intermediate tables cached as part of the PTW for vaddr, >> only the last-level entry (pte). This is a hint. >> >> I guess what's missing for Intel IOMMU and would go in @data is the >> "global" hint (which we don't have in SMMU invalidations). Do you see >> anything else, that the pIOMMU cannot deduce from this structure? >> >> Thanks, >> Jean >> >> >>> +#define VFIO_IOMMU_TLB_INVALIDATE _IO(VFIO_TYPE, VFIO_BASE + 23) >>> + >>> /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */ >>> >>> /* >>> >> >>
> From: Liu, Yi L [mailto:yi.l.liu@linux.intel.com] > Sent: Sunday, May 14, 2017 6:55 PM > > On Fri, May 12, 2017 at 03:58:43PM -0600, Alex Williamson wrote: > > On Wed, 26 Apr 2017 18:12:04 +0800 > > "Liu, Yi L" <yi.l.liu@intel.com> wrote: > > > > > From: "Liu, Yi L" <yi.l.liu@linux.intel.com> > > > > > > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU > TLB > > > invalidate request from guest to host. > > > > > > In the case of SVM virtualization on VT-d, host IOMMU driver has > > > no knowledge of caching structure updates unless the guest > > > invalidation activities are passed down to the host. So a new > > > IOCTL is needed to propagate the guest cache invalidation through > > > VFIO. > > > > > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com> > > > --- > > > include/uapi/linux/vfio.h | 9 +++++++++ > > > 1 file changed, 9 insertions(+) > > > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > > > index 6b97987..50c51f8 100644 > > > --- a/include/uapi/linux/vfio.h > > > +++ b/include/uapi/linux/vfio.h > > > @@ -564,6 +564,15 @@ struct vfio_device_svm { > > > > > > #define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + > 22) > > > > > > +/* For IOMMU TLB Invalidation Propagation */ > > > +struct vfio_iommu_tlb_invalidate { > > > + __u32 argsz; > > > + __u32 length; > > > + __u8 data[]; > > > +}; > > > + > > > +#define VFIO_IOMMU_TLB_INVALIDATE _IO(VFIO_TYPE, VFIO_BASE + > 23) > > > > I'm kind of wondering why this isn't just a new flag bit on > > vfio_device_svm, the data structure is so similar. Of course data > > needs to be fully specified in uapi. > > Hi Alex, > > For this part, it depends on using opaque structure or not. The following > link mentioned it in [Open] session. > > http://www.spinics.net/lists/kvm/msg148798.html > > If we pick the full opaque solution for iommu tlb invalidate propagation. > Then I may add a flag bit on vfio_device_svm and also add definition in > uapi as you suggested. > there is another benefit to keep it as a separate command. For now we only need to invalidate 1st level translation (GVA->GPA) for SVM, since 1st level page table is provided by guest while directly walked by IOMMU. It's possible some vendor may also choose to implement a nested 2nd level translation (e.g. GIOVA->GPA->HPA) then hardware can directly walk guest GIOVA page table thus explicit invalidation is also required. We'd better not to limit invalidation interface with svm structure. Thanks Kevin
> From: Liu, Yi L > Sent: Monday, July 3, 2017 6:31 PM > > Hi Jean, > > > > > > > 2. Define a structure in include/uapi/linux/iommu.h(newly added header > file) > > > > > > struct iommu_tlb_invalidate { > > > __u32 scope; > > > /* pasid-selective invalidation described by @pasid */ > > > #define IOMMU_INVALIDATE_PASID (1 << 0) > > > /* address-selevtive invalidation described by (@vaddr, @size) */ > > > #define IOMMU_INVALIDATE_VADDR (1 << 1) For VT-d above two flags are related. There is no method of flushing (@vaddr, @size) for all pasids, which doesn't make sense. address- selective invalidation is valid only for a given pasid. So it's not appropriate to put them in same level of scope definition at least for VT-d. > > > __u32 flags; > > > /* targets non-pasid mappings, @pasid is not valid */ > > > #define IOMMU_INVALIDATE_NO_PASID (1 << 0) > > > > Although it was my proposal, I don't like this flag. In ARM SMMU, we're > > using a special mode where PASID 0 is reserved and any traffic without > > PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag > > to invalidate that special context explicitly. But this means that > > invalidation packet targeted at that context will have "scope = PASID" and > > "flags = NO_PASID", which is utterly confusing. > > > > I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID > flag > > and just use PASID 0 to invalidate this context on ARM. I don't think > > other architectures would use the NO_PASID flag anyway, but might be > mistaken. > > I may suggest to keep it so far. On VT-d, we may pass some data in opaque, > so > we may work without it. But if other vendor want to issue non-PASID tagged > cache, then may encounter problem. I'm worried about what's the criteria which attribute should be abstracted in common structure and which can be left to opaque. It doesn't make much sense to do such abstraction purely because different vendor formats have some common fields. Usually we do such abstraction because vendor-agnostic code need to do some common handling before going to vendor specific code. However in this case VFIO is not expected to do anything with those IOMMU specific attributes. Then the structure is directly forwarded to IOMMU driver, which simply translates the structure into vendor specific opaque data again. Then why bothering to do double translations in Qemu and IOMMU driver side? Take VT-d for example. Below is a summary of all possible selections around invalidation of 1st level structure for svm: Scope: All PASIDs, single PASID for each PASID: all mappings, or page-selective mappings (addr, size) invalidation target: IOTLB entries (leaf) paging structure cache (non-leaf) PASID cache (pasid->cr3) invalidation hint: whether global pages are included drain reads/writes Above are pretty architectural attributes if just looking at functional purpose. Then if we really consider defining a common structure, it might be more natural to define a superset of all vendors' capabilities and remove the opaque field at all. But as said earlier the purpose of doing such abstraction is not clear if there is no vendor-agnostic user actually digesting those fields. Then should we reconsider the full opaque approach? Welcome comments since I may overlook something here. :-) Thanks Kevin
On 05/07/17 07:45, Tian, Kevin wrote: >> From: Liu, Yi L >> Sent: Monday, July 3, 2017 6:31 PM >> >> Hi Jean, >> >> >>> >>>> 2. Define a structure in include/uapi/linux/iommu.h(newly added header >> file) >>>> >>>> struct iommu_tlb_invalidate { >>>> __u32 scope; >>>> /* pasid-selective invalidation described by @pasid */ >>>> #define IOMMU_INVALIDATE_PASID (1 << 0) >>>> /* address-selevtive invalidation described by (@vaddr, @size) */ >>>> #define IOMMU_INVALIDATE_VADDR (1 << 1) > > For VT-d above two flags are related. There is no method of flushing > (@vaddr, @size) for all pasids, which doesn't make sense. address- > selective invalidation is valid only for a given pasid. So it's not appropriate > to put them in same level of scope definition at least for VT-d. For ARM SMMU the "flush all by VA" operation is valid. Although it's unclear at this point if we will ever allow that, it should probably stay in the common format, if there is one. >>>> __u32 flags; >>>> /* targets non-pasid mappings, @pasid is not valid */ >>>> #define IOMMU_INVALIDATE_NO_PASID (1 << 0) >>> >>> Although it was my proposal, I don't like this flag. In ARM SMMU, we're >>> using a special mode where PASID 0 is reserved and any traffic without >>> PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag >>> to invalidate that special context explicitly. But this means that >>> invalidation packet targeted at that context will have "scope = PASID" and >>> "flags = NO_PASID", which is utterly confusing. >>> >>> I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID >> flag >>> and just use PASID 0 to invalidate this context on ARM. I don't think >>> other architectures would use the NO_PASID flag anyway, but might be >> mistaken. >> >> I may suggest to keep it so far. On VT-d, we may pass some data in opaque, >> so >> we may work without it. But if other vendor want to issue non-PASID tagged >> cache, then may encounter problem. > > I'm worried about what's the criteria which attribute should be abstracted > in common structure and which can be left to opaque. It doesn't make > much sense to do such abstraction purely because different vendor formats > have some common fields. Usually we do such abstraction because > vendor-agnostic code need to do some common handling before going to > vendor specific code. However in this case VFIO is not expected to do anything > with those IOMMU specific attributes. Then the structure is directly forwarded > to IOMMU driver, which simply translates the structure into vendor specific > opaque data again. Then why bothering to do double translations in Qemu > and IOMMU driver side?> > Take VT-d for example. Below is a summary of all possible selections around > invalidation of 1st level structure for svm: > > Scope: All PASIDs, single PASID > for each PASID: > all mappings, or page-selective mappings (addr, size) > invalidation target: > IOTLB entries (leaf) > paging structure cache (non-leaf) I'm curious, can you invalidate all intermediate paging structures for a given PASID without invalidating the leaves? > PASID cache (pasid->cr3) I guess any implementations that gives the whole PASID table to userspace will need the PASID cache invalidation. This was missing from my proposal since it was from virtio-iommu. > invalidation hint: > whether global pages are included > drain reads/writes> > Above are pretty architectural attributes if just looking at functional > purpose. Then if we really consider defining a common structure, it > might be more natural to define a superset of all vendors' capabilities > and remove the opaque field at all. But as said earlier the purpose of > doing such abstraction is not clear if there is no vendor-agnostic > user actually digesting those fields. Then should we reconsider the > full opaque approach? > > Welcome comments since I may overlook something here. :-) I guess on x86 the invalidation packet formats are stable, but for ARM I'm reluctant to deal with vendor-specific formats at the API level, because they tend to be volatile. If a virtual IOMMU version is different from the physical one, then the page table format will be the same but invalidation format will not. So it would be good to define common fields that have the same effects regardless on the underlying pIOMMU. And the fields that differ between ARM and x86 seem to only be hints. In addition on ARM SMMU, the guest cannot build an invalidation command that the host could simply copy into the hardware command queue. The pIOMMU driver needs to craft an invalidation command with a Virtual Machine ID, that the guest is never aware of, and a separate ATS invalidation command. It might also need to retrieve an Address Space ID associated with the given PASID if it chose to hide it from the guest. So for us the invalidation structure would always be different from the hardware one. That's why I do not have any reason the prefer an opaque structure in the first place, and defining generic fields looks much neater :) Then again, I don't have any strong technical objection to it. Thanks, Jean
On Wed, 5 Jul 2017 13:42:03 +0100 Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote: > On 05/07/17 07:45, Tian, Kevin wrote: > >> From: Liu, Yi L > >> Sent: Monday, July 3, 2017 6:31 PM > >> > >> Hi Jean, > >> > >> > >>> > >>>> 2. Define a structure in include/uapi/linux/iommu.h(newly added header > >> file) > >>>> > >>>> struct iommu_tlb_invalidate { > >>>> __u32 scope; > >>>> /* pasid-selective invalidation described by @pasid */ > >>>> #define IOMMU_INVALIDATE_PASID (1 << 0) > >>>> /* address-selevtive invalidation described by (@vaddr, @size) */ > >>>> #define IOMMU_INVALIDATE_VADDR (1 << 1) > > > > For VT-d above two flags are related. There is no method of flushing > > (@vaddr, @size) for all pasids, which doesn't make sense. address- > > selective invalidation is valid only for a given pasid. So it's not appropriate > > to put them in same level of scope definition at least for VT-d. > > For ARM SMMU the "flush all by VA" operation is valid. Although it's > unclear at this point if we will ever allow that, it should probably stay > in the common format, if there is one. > > >>>> __u32 flags; > >>>> /* targets non-pasid mappings, @pasid is not valid */ > >>>> #define IOMMU_INVALIDATE_NO_PASID (1 << 0) > >>> > >>> Although it was my proposal, I don't like this flag. In ARM SMMU, we're > >>> using a special mode where PASID 0 is reserved and any traffic without > >>> PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag > >>> to invalidate that special context explicitly. But this means that > >>> invalidation packet targeted at that context will have "scope = PASID" and > >>> "flags = NO_PASID", which is utterly confusing. > >>> > >>> I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID > >> flag > >>> and just use PASID 0 to invalidate this context on ARM. I don't think > >>> other architectures would use the NO_PASID flag anyway, but might be > >> mistaken. > >> > >> I may suggest to keep it so far. On VT-d, we may pass some data in opaque, > >> so > >> we may work without it. But if other vendor want to issue non-PASID tagged > >> cache, then may encounter problem. > > > > I'm worried about what's the criteria which attribute should be abstracted > > in common structure and which can be left to opaque. It doesn't make > > much sense to do such abstraction purely because different vendor formats > > have some common fields. Usually we do such abstraction because > > vendor-agnostic code need to do some common handling before going to > > vendor specific code. However in this case VFIO is not expected to do anything > > with those IOMMU specific attributes. Then the structure is directly forwarded > > to IOMMU driver, which simply translates the structure into vendor specific > > opaque data again. Then why bothering to do double translations in Qemu > > and IOMMU driver side?> > > Take VT-d for example. Below is a summary of all possible selections around > > invalidation of 1st level structure for svm: > > > > Scope: All PASIDs, single PASID > > for each PASID: > > all mappings, or page-selective mappings (addr, size) > > invalidation target: > > IOTLB entries (leaf) > > paging structure cache (non-leaf) > > I'm curious, can you invalidate all intermediate paging structures for a > given PASID without invalidating the leaves? > > > PASID cache (pasid->cr3) > I guess any implementations that gives the whole PASID table to userspace > will need the PASID cache invalidation. This was missing from my proposal > since it was from virtio-iommu. > > > invalidation hint: > > whether global pages are included > > drain reads/writes> > > Above are pretty architectural attributes if just looking at functional > > purpose. Then if we really consider defining a common structure, it > > might be more natural to define a superset of all vendors' capabilities > > and remove the opaque field at all. But as said earlier the purpose of > > doing such abstraction is not clear if there is no vendor-agnostic > > user actually digesting those fields. Then should we reconsider the > > full opaque approach? > > > > Welcome comments since I may overlook something here. :-) > > I guess on x86 the invalidation packet formats are stable, but for ARM I'm > reluctant to deal with vendor-specific formats at the API level, because > they tend to be volatile. If a virtual IOMMU version is different from the > physical one, then the page table format will be the same but invalidation > format will not. > > So it would be good to define common fields that have the same effects > regardless on the underlying pIOMMU. And the fields that differ between > ARM and x86 seem to only be hints. > > In addition on ARM SMMU, the guest cannot build an invalidation command > that the host could simply copy into the hardware command queue. The > pIOMMU driver needs to craft an invalidation command with a Virtual > Machine ID, that the guest is never aware of, and a separate ATS > invalidation command. It might also need to retrieve an Address Space ID > associated with the given PASID if it chose to hide it from the guest. > > So for us the invalidation structure would always be different from the > hardware one. That's why I do not have any reason the prefer an opaque > structure in the first place, and defining generic fields looks much > neater :) Then again, I don't have any strong technical objection to it. I have an objection to opaque data, it's not documented for users, can't be considered a stable ABI, introduces compatibility issues, and makes debugging difficult. vfio should have the right to and the ability to validate anything coming from the user, whether it's vendor specific or generic. Your concern about hardware changing is just as valid on VT-d. Even if we're emulating VT-d in userspace on a VT-d host, how do we know that the two are strictly compatible? It may be today, but we cannot predict the future. A fully specified ABI means that we can properly version it, and if necessary provide compatibility handlers if the hardware changes. Thanks, Alex
> From: Alex Williamson [mailto:alex.williamson@redhat.com] > Sent: Thursday, July 6, 2017 1:28 AM > > On Wed, 5 Jul 2017 13:42:03 +0100 > Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote: > > > On 05/07/17 07:45, Tian, Kevin wrote: > > >> From: Liu, Yi L > > >> Sent: Monday, July 3, 2017 6:31 PM > > >> > > >> Hi Jean, > > >> > > >> > > >>> > > >>>> 2. Define a structure in include/uapi/linux/iommu.h(newly added > header > > >> file) > > >>>> > > >>>> struct iommu_tlb_invalidate { > > >>>> __u32 scope; > > >>>> /* pasid-selective invalidation described by @pasid */ > > >>>> #define IOMMU_INVALIDATE_PASID (1 << 0) > > >>>> /* address-selevtive invalidation described by (@vaddr, @size) */ > > >>>> #define IOMMU_INVALIDATE_VADDR (1 << 1) > > > > > > For VT-d above two flags are related. There is no method of flushing > > > (@vaddr, @size) for all pasids, which doesn't make sense. address- > > > selective invalidation is valid only for a given pasid. So it's not appropriate > > > to put them in same level of scope definition at least for VT-d. > > > > For ARM SMMU the "flush all by VA" operation is valid. Although it's > > unclear at this point if we will ever allow that, it should probably stay > > in the common format, if there is one. > > > > >>>> __u32 flags; > > >>>> /* targets non-pasid mappings, @pasid is not valid */ > > >>>> #define IOMMU_INVALIDATE_NO_PASID (1 << 0) > > >>> > > >>> Although it was my proposal, I don't like this flag. In ARM SMMU, we're > > >>> using a special mode where PASID 0 is reserved and any traffic without > > >>> PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" > flag > > >>> to invalidate that special context explicitly. But this means that > > >>> invalidation packet targeted at that context will have "scope = PASID" > and > > >>> "flags = NO_PASID", which is utterly confusing. > > >>> > > >>> I now think that we should get rid of the > IOMMU_INVALIDATE_NO_PASID > > >> flag > > >>> and just use PASID 0 to invalidate this context on ARM. I don't think > > >>> other architectures would use the NO_PASID flag anyway, but might be > > >> mistaken. > > >> > > >> I may suggest to keep it so far. On VT-d, we may pass some data in > opaque, > > >> so > > >> we may work without it. But if other vendor want to issue non-PASID > tagged > > >> cache, then may encounter problem. > > > > > > I'm worried about what's the criteria which attribute should be > abstracted > > > in common structure and which can be left to opaque. It doesn't make > > > much sense to do such abstraction purely because different vendor > formats > > > have some common fields. Usually we do such abstraction because > > > vendor-agnostic code need to do some common handling before going to > > > vendor specific code. However in this case VFIO is not expected to do > anything > > > with those IOMMU specific attributes. Then the structure is directly > forwarded > > > to IOMMU driver, which simply translates the structure into vendor > specific > > > opaque data again. Then why bothering to do double translations in > Qemu > > > and IOMMU driver side?> > > > Take VT-d for example. Below is a summary of all possible selections > around > > > invalidation of 1st level structure for svm: > > > > > > Scope: All PASIDs, single PASID > > > for each PASID: > > > all mappings, or page-selective mappings (addr, size) > > > invalidation target: > > > IOTLB entries (leaf) > > > paging structure cache (non-leaf) > > > > I'm curious, can you invalidate all intermediate paging structures for a > > given PASID without invalidating the leaves? > > > > > PASID cache (pasid->cr3) > > I guess any implementations that gives the whole PASID table to userspace > > will need the PASID cache invalidation. This was missing from my proposal > > since it was from virtio-iommu. > > > > > invalidation hint: > > > whether global pages are included > > > drain reads/writes> > > > Above are pretty architectural attributes if just looking at functional > > > purpose. Then if we really consider defining a common structure, it > > > might be more natural to define a superset of all vendors' capabilities > > > and remove the opaque field at all. But as said earlier the purpose of > > > doing such abstraction is not clear if there is no vendor-agnostic > > > user actually digesting those fields. Then should we reconsider the > > > full opaque approach? > > > > > > Welcome comments since I may overlook something here. :-) > > > > I guess on x86 the invalidation packet formats are stable, but for ARM I'm > > reluctant to deal with vendor-specific formats at the API level, because > > they tend to be volatile. If a virtual IOMMU version is different from the > > physical one, then the page table format will be the same but invalidation > > format will not. > > > > So it would be good to define common fields that have the same effects > > regardless on the underlying pIOMMU. And the fields that differ between > > ARM and x86 seem to only be hints. > > > > In addition on ARM SMMU, the guest cannot build an invalidation > command > > that the host could simply copy into the hardware command queue. The > > pIOMMU driver needs to craft an invalidation command with a Virtual > > Machine ID, that the guest is never aware of, and a separate ATS > > invalidation command. It might also need to retrieve an Address Space ID > > associated with the given PASID if it chose to hide it from the guest. > > > > So for us the invalidation structure would always be different from the > > hardware one. That's why I do not have any reason the prefer an opaque > > structure in the first place, and defining generic fields looks much > > neater :) Then again, I don't have any strong technical objection to it. > > I have an objection to opaque data, it's not documented for users, > can't be considered a stable ABI, introduces compatibility issues, and So far there are three options discussed: 1) full opaque 2) partial specified + opaque 3) fully specified I take your objection as going 3), since as long as there is still opaque the same ABI compatibility issue still exist, right? Then even vendor specific capabilities will be explicitly defined in this structure, and then we may also need a query interface to know which capabilities are supported underlying. > makes debugging difficult. vfio should have the right to and the > ability to validate anything coming from the user, whether it's vendor this part I didn't quite get. Which part in such structure will be validated by VFIO? From current description, they are all IOMMU specific knowledge. > specific or generic. Your concern about hardware changing is just as > valid on VT-d. Even if we're emulating VT-d in userspace on a VT-d > host, how do we know that the two are strictly compatible? It may be It makes sense > today, but we cannot predict the future. A fully specified ABI means > that we can properly version it, and if necessary provide compatibility > handlers if the hardware changes. Thanks, > Thanks Kevin
> From: Jean-Philippe Brucker > Sent: Wednesday, July 5, 2017 8:42 PM > > On 05/07/17 07:45, Tian, Kevin wrote: > >> From: Liu, Yi L > >> Sent: Monday, July 3, 2017 6:31 PM > >> > >> Hi Jean, > >> > >> > >>> > >>>> 2. Define a structure in include/uapi/linux/iommu.h(newly added > header > >> file) > >>>> > >>>> struct iommu_tlb_invalidate { > >>>> __u32 scope; > >>>> /* pasid-selective invalidation described by @pasid */ > >>>> #define IOMMU_INVALIDATE_PASID (1 << 0) > >>>> /* address-selevtive invalidation described by (@vaddr, @size) */ > >>>> #define IOMMU_INVALIDATE_VADDR (1 << 1) > > > > For VT-d above two flags are related. There is no method of flushing > > (@vaddr, @size) for all pasids, which doesn't make sense. address- > > selective invalidation is valid only for a given pasid. So it's not appropriate > > to put them in same level of scope definition at least for VT-d. > > For ARM SMMU the "flush all by VA" operation is valid. Although it's > unclear at this point if we will ever allow that, it should probably stay > in the common format, if there is one. fine in common format. earlier I was thinking whether it should be in scope. possibly fine after another thinking. :-) > > >>>> __u32 flags; > >>>> /* targets non-pasid mappings, @pasid is not valid */ > >>>> #define IOMMU_INVALIDATE_NO_PASID (1 << 0) > >>> > >>> Although it was my proposal, I don't like this flag. In ARM SMMU, we're > >>> using a special mode where PASID 0 is reserved and any traffic without > >>> PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag > >>> to invalidate that special context explicitly. But this means that > >>> invalidation packet targeted at that context will have "scope = PASID" > and > >>> "flags = NO_PASID", which is utterly confusing. > >>> > >>> I now think that we should get rid of the > IOMMU_INVALIDATE_NO_PASID > >> flag > >>> and just use PASID 0 to invalidate this context on ARM. I don't think > >>> other architectures would use the NO_PASID flag anyway, but might be > >> mistaken. > >> > >> I may suggest to keep it so far. On VT-d, we may pass some data in > opaque, > >> so > >> we may work without it. But if other vendor want to issue non-PASID > tagged > >> cache, then may encounter problem. > > > > I'm worried about what's the criteria which attribute should be abstracted > > in common structure and which can be left to opaque. It doesn't make > > much sense to do such abstraction purely because different vendor > formats > > have some common fields. Usually we do such abstraction because > > vendor-agnostic code need to do some common handling before going to > > vendor specific code. However in this case VFIO is not expected to do > anything > > with those IOMMU specific attributes. Then the structure is directly > forwarded > > to IOMMU driver, which simply translates the structure into vendor specific > > opaque data again. Then why bothering to do double translations in Qemu > > and IOMMU driver side?> > > Take VT-d for example. Below is a summary of all possible selections > around > > invalidation of 1st level structure for svm: > > > > Scope: All PASIDs, single PASID > > for each PASID: > > all mappings, or page-selective mappings (addr, size) > > invalidation target: > > IOTLB entries (leaf) > > paging structure cache (non-leaf) > > I'm curious, can you invalidate all intermediate paging structures for a > given PASID without invalidating the leaves? I don't think so. usually IOTLB flush is the base. one can further specify whether flush should apply to non-leaves. Thanks Kevin
Hi Alex, Against to the opaque open, I'd like to propose the following definition based on the existing comments. Pls note that I've merged the pasid table binding and iommu tlb invalidation into a single IOCTL and make different flags to indicate the iommu operations. Per Kevin's comments, there may be iommu invalidation for guest IOVA tlb, so I renamed the IOCTL and data structure to be non-svm specific. Pls kindly have a review, so that we can make the opaque open closed and move forward. Surely, comments and ideas are welcomed. And for the scope and flags definition in struct iommu_tlb_invalidate, it's also welcomed to give your ideas on it. 1. Add a VFIO IOCTL for iommu operations from user-space #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24) Corresponding data structure: struct vfio_iommu_operation_info { __u32 argsz; #define VFIO_IOMMU_BIND_PASIDTBL (1 << 0) /* Bind PASID Table */ #define VFIO_IOMMU_BIND_PASID (1 << 1) /* Bind PASID from userspace driver*/ #define VFIO_IOMMU_BIND_PGTABLE (1 << 2) /* Bind guest mmu page table */ #define VFIO_IOMMU_INVAL_IOTLB (1 << 3) /* Invalidate iommu tlb */ __u32 flag; __u32 length; // length of the data[] part in byte __u8 data[]; // stores the data for iommu op indicated by flag field }; For iommu tlb invalidation from userspace, the "__u8 data[]" stores data which would be parsed by the "struct iommu_tlb_invalidate" defined below. 2. Definitions in include/uapi/linux/iommu.h(newly added header file) /* IOMMU model definition for iommu operations from userspace */ enum iommu_model { INTLE_IOMMU, ARM_SMMU, AMD_IOMMU, SPAPR_IOMMU, S390_IOMMU, }; struct iommu_tlb_invalidate { __u32 scope; /* pasid-selective invalidation described by @pasid */ #define IOMMU_INVALIDATE_PASID (1 << 0) /* address-selevtive invalidation described by (@vaddr, @size) */ #define IOMMU_INVALIDATE_VADDR (1 << 1) __u32 flags; /* targets non-pasid mappings, @pasid is not valid */ #define IOMMU_INVALIDATE_NO_PASID (1 << 0) /* indicating that the pIOMMU doesn't need to invalidate all intermediate tables cached as part of the PTE for vaddr, only the last-level entry (pte). This is a hint. */ #define IOMMU_INVALIDATE_VADDR_LEAF (1 << 1) __u32 pasid; __u64 vaddr; __u64 size; enum iommu_model model; /* Vendor may have different HW version and thus the data part of this structure differs, use sub_version to indicate such difference. */ __u322 sub_version; __u64 length; // length of the data[] part in byte __u8 data[]; }; For Intel, the data structue is: struct intel_iommu_invalidate_data { __u64 low; __u64 high; } Thanks, Yi L > -----Original Message----- > From: Alex Williamson [mailto:alex.williamson@redhat.com] > Sent: Thursday, July 6, 2017 1:28 AM > To: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> > Cc: Tian, Kevin <kevin.tian@intel.com>; Liu, Yi L <yi.l.liu@linux.intel.com>; Lan, > Tianyu <tianyu.lan@intel.com>; Liu, Yi L <yi.l.liu@intel.com>; Raj, Ashok > <ashok.raj@intel.com>; kvm@vger.kernel.org; jasowang@redhat.com; Will Deacon > <Will.Deacon@arm.com>; peterx@redhat.com; qemu-devel@nongnu.org; > iommu@lists.linux-foundation.org; Pan, Jacob jun <jacob.jun.pan@intel.com> > Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB > invalidate propagation > > On Wed, 5 Jul 2017 13:42:03 +0100 > Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote: > > > On 05/07/17 07:45, Tian, Kevin wrote: > > >> From: Liu, Yi L > > >> Sent: Monday, July 3, 2017 6:31 PM > > >> > > >> Hi Jean, > > >> > > >> > > >>> > > >>>> 2. Define a structure in include/uapi/linux/iommu.h(newly added > > >>>> header > > >> file) > > >>>> > > >>>> struct iommu_tlb_invalidate { > > >>>> __u32 scope; > > >>>> /* pasid-selective invalidation described by @pasid */ > > >>>> #define IOMMU_INVALIDATE_PASID (1 << 0) > > >>>> /* address-selevtive invalidation described by (@vaddr, @size) */ > > >>>> #define IOMMU_INVALIDATE_VADDR (1 << 1) > > > > > > For VT-d above two flags are related. There is no method of flushing > > > (@vaddr, @size) for all pasids, which doesn't make sense. address- > > > selective invalidation is valid only for a given pasid. So it's not > > > appropriate to put them in same level of scope definition at least for VT-d. > > > > For ARM SMMU the "flush all by VA" operation is valid. Although it's > > unclear at this point if we will ever allow that, it should probably > > stay in the common format, if there is one. > > > > >>>> __u32 flags; > > >>>> /* targets non-pasid mappings, @pasid is not valid */ > > >>>> #define IOMMU_INVALIDATE_NO_PASID (1 << 0) > > >>> > > >>> Although it was my proposal, I don't like this flag. In ARM SMMU, > > >>> we're using a special mode where PASID 0 is reserved and any > > >>> traffic without PASID uses entry 0 of the PASID table. So I > > >>> proposed the "NO_PASID" flag to invalidate that special context > > >>> explicitly. But this means that invalidation packet targeted at > > >>> that context will have "scope = PASID" and "flags = NO_PASID", which is utterly > confusing. > > >>> > > >>> I now think that we should get rid of the > > >>> IOMMU_INVALIDATE_NO_PASID > > >> flag > > >>> and just use PASID 0 to invalidate this context on ARM. I don't > > >>> think other architectures would use the NO_PASID flag anyway, but > > >>> might be > > >> mistaken. > > >> > > >> I may suggest to keep it so far. On VT-d, we may pass some data in > > >> opaque, so we may work without it. But if other vendor want to > > >> issue non-PASID tagged cache, then may encounter problem. > > > > > > I'm worried about what's the criteria which attribute should be > > > abstracted in common structure and which can be left to opaque. It > > > doesn't make much sense to do such abstraction purely because > > > different vendor formats have some common fields. Usually we do such > > > abstraction because vendor-agnostic code need to do some common > > > handling before going to vendor specific code. However in this case > > > VFIO is not expected to do anything with those IOMMU specific > > > attributes. Then the structure is directly forwarded to IOMMU > > > driver, which simply translates the structure into vendor specific > > > opaque data again. Then why bothering to do double translations in > > > Qemu and IOMMU driver side?> Take VT-d for example. Below is a > > > summary of all possible selections around invalidation of 1st level structure for > svm: > > > > > > Scope: All PASIDs, single PASID > > > for each PASID: > > > all mappings, or page-selective mappings (addr, size) invalidation > > > target: > > > IOTLB entries (leaf) > > > paging structure cache (non-leaf) > > > > I'm curious, can you invalidate all intermediate paging structures for > > a given PASID without invalidating the leaves? > > > > > PASID cache (pasid->cr3) > > I guess any implementations that gives the whole PASID table to > > userspace will need the PASID cache invalidation. This was missing > > from my proposal since it was from virtio-iommu. > > > > > invalidation hint: > > > whether global pages are included > > > drain reads/writes> > > > Above are pretty architectural attributes if just looking at > > > functional purpose. Then if we really consider defining a common > > > structure, it might be more natural to define a superset of all > > > vendors' capabilities and remove the opaque field at all. But as > > > said earlier the purpose of doing such abstraction is not clear if > > > there is no vendor-agnostic user actually digesting those fields. > > > Then should we reconsider the full opaque approach? > > > > > > Welcome comments since I may overlook something here. :-) > > > > I guess on x86 the invalidation packet formats are stable, but for ARM > > I'm reluctant to deal with vendor-specific formats at the API level, > > because they tend to be volatile. If a virtual IOMMU version is > > different from the physical one, then the page table format will be > > the same but invalidation format will not. > > > > So it would be good to define common fields that have the same effects > > regardless on the underlying pIOMMU. And the fields that differ > > between ARM and x86 seem to only be hints. > > > > In addition on ARM SMMU, the guest cannot build an invalidation > > command that the host could simply copy into the hardware command > > queue. The pIOMMU driver needs to craft an invalidation command with a > > Virtual Machine ID, that the guest is never aware of, and a separate > > ATS invalidation command. It might also need to retrieve an Address > > Space ID associated with the given PASID if it chose to hide it from the guest. > > > > So for us the invalidation structure would always be different from > > the hardware one. That's why I do not have any reason the prefer an > > opaque structure in the first place, and defining generic fields looks > > much neater :) Then again, I don't have any strong technical objection to it. > > I have an objection to opaque data, it's not documented for users, can't be > considered a stable ABI, introduces compatibility issues, and makes debugging > difficult. vfio should have the right to and the ability to validate anything coming > from the user, whether it's vendor specific or generic. Your concern about hardware > changing is just as valid on VT-d. Even if we're emulating VT-d in userspace on a VT- > d host, how do we know that the two are strictly compatible? It may be today, but > we cannot predict the future. A fully specified ABI means that we can properly > version it, and if necessary provide compatibility handlers if the hardware changes. > Thanks, > > Alex
On Fri, 14 Jul 2017 08:58:02 +0000 "Liu, Yi L" <yi.l.liu@intel.com> wrote: > Hi Alex, > > Against to the opaque open, I'd like to propose the following definition > based on the existing comments. Pls note that I've merged the pasid > table binding and iommu tlb invalidation into a single IOCTL and make > different flags to indicate the iommu operations. Per Kevin's comments, > there may be iommu invalidation for guest IOVA tlb, so I renamed the > IOCTL and data structure to be non-svm specific. Pls kindly have a review, > so that we can make the opaque open closed and move forward. Surely, > comments and ideas are welcomed. And for the scope and flags definition > in struct iommu_tlb_invalidate, it's also welcomed to give your ideas on it. > > 1. Add a VFIO IOCTL for iommu operations from user-space > > #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24) > > Corresponding data structure: > struct vfio_iommu_operation_info { > __u32 argsz; > #define VFIO_IOMMU_BIND_PASIDTBL (1 << 0) /* Bind PASID Table */ > #define VFIO_IOMMU_BIND_PASID (1 << 1) /* Bind PASID from userspace driver*/ > #define VFIO_IOMMU_BIND_PGTABLE (1 << 2) /* Bind guest mmu page table */ > #define VFIO_IOMMU_INVAL_IOTLB (1 << 3) /* Invalidate iommu tlb */ > __u32 flag; > __u32 length; // length of the data[] part in byte > __u8 data[]; // stores the data for iommu op indicated by flag field > }; If we're doing a generic "Ops" ioctl, then we should have an "op" field which is defined by an enum. It doesn't make sense to use flags for this, for example can we set multiple flag bits? If not then it's not a good use for a bit field. I'm also not sure I understand the value of the "length" field, can't it always be calculated from argsz? > For iommu tlb invalidation from userspace, the "__u8 data[]" stores > data which would be parsed by the "struct iommu_tlb_invalidate" defined > below. > > 2. Definitions in include/uapi/linux/iommu.h(newly added header file) > > /* IOMMU model definition for iommu operations from userspace */ > enum iommu_model { > INTLE_IOMMU, > ARM_SMMU, > AMD_IOMMU, > SPAPR_IOMMU, > S390_IOMMU, > }; > > struct iommu_tlb_invalidate { > __u32 scope; > /* pasid-selective invalidation described by @pasid */ > #define IOMMU_INVALIDATE_PASID (1 << 0) > /* address-selevtive invalidation described by (@vaddr, @size) */ > #define IOMMU_INVALIDATE_VADDR (1 << 1) Again, is a bit field appropriate here, can a user set both bits? > __u32 flags; > /* targets non-pasid mappings, @pasid is not valid */ > #define IOMMU_INVALIDATE_NO_PASID (1 << 0) > /* indicating that the pIOMMU doesn't need to invalidate > all intermediate tables cached as part of the PTE for > vaddr, only the last-level entry (pte). This is a hint. */ > #define IOMMU_INVALIDATE_VADDR_LEAF (1 << 1) Are we venturing into vendor specific attributes here? > __u32 pasid; > __u64 vaddr; > __u64 size; > enum iommu_model model; How does a user learn which model(s) are supported by the interface? How do they learn which ops are supported? Perhaps a good use for one of those flag bits in the outer data structure is "probe". > /* > Vendor may have different HW version and thus the > data part of this structure differs, use sub_version > to indicate such difference. > */ > __u322 sub_version; Not sure I see the value of this vs creating an INTEL_IOMMUv2 entry in the model enum. > __u64 length; // length of the data[] part in byte Questionably useful vs calculating from argsz again , but it certainly doesn't need to be a qword :-o > __u8 data[]; > }; > > For Intel, the data structue is: > struct intel_iommu_invalidate_data { > __u64 low; > __u64 high; > } high/low what? This is a pretty weak uapi definition. Thanks, Alex
Hi Alex, Pls refer to the response inline. > -----Original Message----- > From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf > Of Alex Williamson > Sent: Saturday, July 15, 2017 2:16 AM > To: Liu, Yi L <yi.l.liu@intel.com> > Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>; Tian, Kevin > <kevin.tian@intel.com>; Liu, Yi L <yi.l.liu@linux.intel.com>; Lan, Tianyu > <tianyu.lan@intel.com>; Raj, Ashok <ashok.raj@intel.com>; kvm@vger.kernel.org; > jasowang@redhat.com; Will Deacon <Will.Deacon@arm.com>; peterx@redhat.com; > qemu-devel@nongnu.org; iommu@lists.linux-foundation.org; Pan, Jacob jun > <jacob.jun.pan@intel.com>; Joerg Roedel <joro@8bytes.org> > Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB > invalidate propagation > > On Fri, 14 Jul 2017 08:58:02 +0000 > "Liu, Yi L" <yi.l.liu@intel.com> wrote: > > > Hi Alex, > > > > Against to the opaque open, I'd like to propose the following > > definition based on the existing comments. Pls note that I've merged > > the pasid table binding and iommu tlb invalidation into a single IOCTL > > and make different flags to indicate the iommu operations. Per Kevin's > > comments, there may be iommu invalidation for guest IOVA tlb, so I > > renamed the IOCTL and data structure to be non-svm specific. Pls > > kindly have a review, so that we can make the opaque open closed and > > move forward. Surely, comments and ideas are welcomed. And for the > > scope and flags definition in struct iommu_tlb_invalidate, it's also welcomed to > give your ideas on it. > > > > 1. Add a VFIO IOCTL for iommu operations from user-space > > > > #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24) > > > > Corresponding data structure: > > struct vfio_iommu_operation_info { > > __u32 argsz; > > #define VFIO_IOMMU_BIND_PASIDTBL (1 << 0) /* Bind PASID Table */ > > #define VFIO_IOMMU_BIND_PASID (1 << 1) /* Bind PASID from userspace > driver*/ > > #define VFIO_IOMMU_BIND_PGTABLE (1 << 2) /* Bind guest mmu page table */ > > #define VFIO_IOMMU_INVAL_IOTLB (1 << 3) /* Invalidate iommu tlb */ > > __u32 flag; > > __u32 length; // length of the data[] part in byte > > __u8 data[]; // stores the data for iommu op indicated by flag field > > }; > > If we're doing a generic "Ops" ioctl, then we should have an "op" field which is > defined by an enum. It doesn't make sense to use flags for this, for example can we > set multiple flag bits? If not then it's not a good use for a bit field. I'm also not sure I > understand the value of the "length" field, can't it always be calculated from argsz? Agreed, enum would be better. "length" field could be calculated from argsz. I used it just to avoid offset calculations. May remove it. > > For iommu tlb invalidation from userspace, the "__u8 data[]" stores > > data which would be parsed by the "struct iommu_tlb_invalidate" > > defined below. > > > > 2. Definitions in include/uapi/linux/iommu.h(newly added header file) > > > > /* IOMMU model definition for iommu operations from userspace */ enum > > iommu_model { > > INTLE_IOMMU, > > ARM_SMMU, > > AMD_IOMMU, > > SPAPR_IOMMU, > > S390_IOMMU, > > }; > > > > struct iommu_tlb_invalidate { > > __u32 scope; > > /* pasid-selective invalidation described by @pasid */ > > #define IOMMU_INVALIDATE_PASID (1 << 0) > > /* address-selevtive invalidation described by (@vaddr, @size) */ > > #define IOMMU_INVALIDATE_VADDR (1 << 1) > > Again, is a bit field appropriate here, can a user set both bits? yes, user may set both bits. It would be invalidate address range which is tagged with a PASID value. > > > __u32 flags; > > /* targets non-pasid mappings, @pasid is not valid */ > > #define IOMMU_INVALIDATE_NO_PASID (1 << 0) > > /* indicating that the pIOMMU doesn't need to invalidate > > all intermediate tables cached as part of the PTE for > > vaddr, only the last-level entry (pte). This is a hint. */ > > #define IOMMU_INVALIDATE_VADDR_LEAF (1 << 1) > > Are we venturing into vendor specific attributes here? These two attributes are still in discussion. Jean and me synced several rounds. But lack of comments from other vendors. Personally, I think both should be generic. IOMMU_INVALIDATE_NO_PASID is to indicate no PASID used for the invalidation. IOMMU_INVALIDATE_VADDR_LEAF indicates only invalidate leaf mappings. I would see if other vendor is object on it. If yes, I'm fine to move it to vendor specific part. > > > __u32 pasid; > > __u64 vaddr; > > __u64 size; > > enum iommu_model model; > > How does a user learn which model(s) are supported by the interface? > How do they learn which ops are supported? Perhaps a good use for one of those > flag bits in the outer data structure is "probe". My initial plan to user fills it, if the underlying HW doesn't support the model, it refuses to service it. User should get a failure and stop to use it. But your suggestion to have a probe or kinds of query makes sense. How about we add one more operation for such purpose? Besides the supported model query, I'd like to add more. E.g the HW IOMMU capabilities. > > > /* > > Vendor may have different HW version and thus the > > data part of this structure differs, use sub_version > > to indicate such difference. > > */ > > __u322 sub_version; > > Not sure I see the value of this vs creating an INTEL_IOMMUv2 entry in the model > enum. Both are fine to me. Just see the opinions from other guys. > > __u64 length; // length of the data[] part in byte > > Questionably useful vs calculating from argsz again , but it certainly doesn't need to > be a qword :-o Thx for the remind. 32bits would be enough. It is surely to get it from argsz. However, I would like to leave it here. Reason is: argsz is in vfio layer, the "length" here is actually used in vendor-specific iommu driver layer. So would require vfio to pass argsz or the size of " struct iommu_tlb_invalidate" to vendor-specific iommu driver layer by means of parameter or so. Personally, I prefer to pass it in the structure. If it's better to pass it as a parameter, I would do it. > > > __u8 data[]; > > }; > > > > For Intel, the data structue is: > > struct intel_iommu_invalidate_data { > > __u64 low; > > __u64 high; > > } > > high/low what? This is a pretty weak uapi definition. Thanks, For this part, for Intel platform, we plan to pass a 128 bit data for the invalidation. The structure varies from invalidation type to type. Here is my thought on it. Define an 128 bits union. List the invalidation data details for each invalidation type. What's your opinion on it? So far, we have 7 types for invalidation. The prq response is not included. union intel_iommu_invalidate_data { struct { __u64 low; __u64 high; } invalidate_data; struct { __u64 type: 4; __u64 gran: 2; __u64 rsv1: 10; __u64 did: 16; __u64 sid: 16; __u64 func_mask: 2; __u64 rsv2: 14; __64 rsv3: 64; } context_cache_inv; .... }; Thanks, Yi L
On Mon, 17 Jul 2017 10:58:41 +0000 "Liu, Yi L" <yi.l.liu@intel.com> wrote: > Hi Alex, > > Pls refer to the response inline. > > > -----Original Message----- > > From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf > > Of Alex Williamson > > Sent: Saturday, July 15, 2017 2:16 AM > > To: Liu, Yi L <yi.l.liu@intel.com> > > Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>; Tian, Kevin > > <kevin.tian@intel.com>; Liu, Yi L <yi.l.liu@linux.intel.com>; Lan, Tianyu > > <tianyu.lan@intel.com>; Raj, Ashok <ashok.raj@intel.com>; kvm@vger.kernel.org; > > jasowang@redhat.com; Will Deacon <Will.Deacon@arm.com>; peterx@redhat.com; > > qemu-devel@nongnu.org; iommu@lists.linux-foundation.org; Pan, Jacob jun > > <jacob.jun.pan@intel.com>; Joerg Roedel <joro@8bytes.org> > > Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB > > invalidate propagation > > > > On Fri, 14 Jul 2017 08:58:02 +0000 > > "Liu, Yi L" <yi.l.liu@intel.com> wrote: > > > > > Hi Alex, > > > > > > Against to the opaque open, I'd like to propose the following > > > definition based on the existing comments. Pls note that I've merged > > > the pasid table binding and iommu tlb invalidation into a single IOCTL > > > and make different flags to indicate the iommu operations. Per Kevin's > > > comments, there may be iommu invalidation for guest IOVA tlb, so I > > > renamed the IOCTL and data structure to be non-svm specific. Pls > > > kindly have a review, so that we can make the opaque open closed and > > > move forward. Surely, comments and ideas are welcomed. And for the > > > scope and flags definition in struct iommu_tlb_invalidate, it's also welcomed to > > give your ideas on it. > > > > > > 1. Add a VFIO IOCTL for iommu operations from user-space > > > > > > #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24) > > > > > > Corresponding data structure: > > > struct vfio_iommu_operation_info { > > > __u32 argsz; > > > #define VFIO_IOMMU_BIND_PASIDTBL (1 << 0) /* Bind PASID Table */ > > > #define VFIO_IOMMU_BIND_PASID (1 << 1) /* Bind PASID from userspace > > driver*/ > > > #define VFIO_IOMMU_BIND_PGTABLE (1 << 2) /* Bind guest mmu page table */ > > > #define VFIO_IOMMU_INVAL_IOTLB (1 << 3) /* Invalidate iommu tlb */ > > > __u32 flag; > > > __u32 length; // length of the data[] part in byte > > > __u8 data[]; // stores the data for iommu op indicated by flag field > > > }; > > > > If we're doing a generic "Ops" ioctl, then we should have an "op" field which is > > defined by an enum. It doesn't make sense to use flags for this, for example can we > > set multiple flag bits? If not then it's not a good use for a bit field. I'm also not sure I > > understand the value of the "length" field, can't it always be calculated from argsz? > > Agreed, enum would be better. "length" field could be calculated from argsz. I used > it just to avoid offset calculations. May remove it. > > > > For iommu tlb invalidation from userspace, the "__u8 data[]" stores > > > data which would be parsed by the "struct iommu_tlb_invalidate" > > > defined below. > > > > > > 2. Definitions in include/uapi/linux/iommu.h(newly added header file) > > > > > > /* IOMMU model definition for iommu operations from userspace */ enum > > > iommu_model { > > > INTLE_IOMMU, > > > ARM_SMMU, > > > AMD_IOMMU, > > > SPAPR_IOMMU, > > > S390_IOMMU, > > > }; > > > > > > struct iommu_tlb_invalidate { > > > __u32 scope; > > > /* pasid-selective invalidation described by @pasid */ > > > #define IOMMU_INVALIDATE_PASID (1 << 0) > > > /* address-selevtive invalidation described by (@vaddr, @size) */ > > > #define IOMMU_INVALIDATE_VADDR (1 << 1) > > > > Again, is a bit field appropriate here, can a user set both bits? > > yes, user may set both bits. It would be invalidate address range > which is tagged with a PASID value. > > > > > > __u32 flags; > > > /* targets non-pasid mappings, @pasid is not valid */ > > > #define IOMMU_INVALIDATE_NO_PASID (1 << 0) > > > /* indicating that the pIOMMU doesn't need to invalidate > > > all intermediate tables cached as part of the PTE for > > > vaddr, only the last-level entry (pte). This is a hint. */ > > > #define IOMMU_INVALIDATE_VADDR_LEAF (1 << 1) > > > > Are we venturing into vendor specific attributes here? > > These two attributes are still in discussion. Jean and me synced > several rounds. But lack of comments from other vendors. > > Personally, I think both should be generic. > IOMMU_INVALIDATE_NO_PASID is to indicate no PASID used > for the invalidation. IOMMU_INVALIDATE_VADDR_LEAF indicates > only invalidate leaf mappings. > I would see if other vendor is object on it. If yes, I'm fine to move > it to vendor specific part. > > > > > > __u32 pasid; > > > __u64 vaddr; > > > __u64 size; > > > enum iommu_model model; > > > > How does a user learn which model(s) are supported by the interface? > > How do they learn which ops are supported? Perhaps a good use for one of those > > flag bits in the outer data structure is "probe". > > My initial plan to user fills it, if the underlying HW doesn't support the > model, it refuses to service it. User should get a failure and stop to use > it. But your suggestion to have a probe or kinds of query makes sense. > How about we add one more operation for such purpose? Besides the > supported model query, I'd like to add more. E.g the HW IOMMU capabilities. We also have VFIO_IOMMU_GET_INFO where the structure can be extended for missing capabilities. Depending on the capability you want to describe, this might be a better, existing interface for it. > > > /* > > > Vendor may have different HW version and thus the > > > data part of this structure differs, use sub_version > > > to indicate such difference. > > > */ > > > __u322 sub_version; > > > > Not sure I see the value of this vs creating an INTEL_IOMMUv2 entry in the model > > enum. > > Both are fine to me. Just see the opinions from other guys. > > > > __u64 length; // length of the data[] part in byte > > > > Questionably useful vs calculating from argsz again , but it certainly doesn't need to > > be a qword :-o > > Thx for the remind. 32bits would be enough. It is surely to get it from argsz. However, > I would like to leave it here. Reason is: > argsz is in vfio layer, the "length" here is actually used in vendor-specific iommu driver > layer. So would require vfio to pass argsz or the size of " struct iommu_tlb_invalidate" > to vendor-specific iommu driver layer by means of parameter or so. Personally, I prefer > to pass it in the structure. If it's better to pass it as a parameter, I would do it. Ok, then the layer that does the copy_from_user will need to validate that length is fully contained within the copied data structure, we can't let the user trick the kernel into using kernel memory for this. > > > __u8 data[]; > > > }; > > > > > > For Intel, the data structue is: > > > struct intel_iommu_invalidate_data { > > > __u64 low; > > > __u64 high; > > > } > > > > high/low what? This is a pretty weak uapi definition. Thanks, > > For this part, for Intel platform, we plan to pass a 128 bit data for the invalidation. > The structure varies from invalidation type to type. Here is my thought on it. Define > an 128 bits union. List the invalidation data details for each invalidation type. What's > your opinion on it? So far, we have 7 types for invalidation. The prq response is not > included. I want this interface to be fully defined, but at the same time I don't necessarily want to create useless data structures. I believe the intention here is to pass these directly through to a QI entry, where we must match a hardware definition. I'm tempted to suggest referencing the hardware specification, but see below... A concern for this model is that hardware may trust the iommu driver not to create QI entries that don't set reserved bits or set invalid field data. If it does those kinds of things, it's a kernel driver bug. Once exposed to the user, we cannot guarantee that. Does Intel have confidence that a user cannot maliciously interfere with other contexts or the general operation of the invalidation queue if a user is effectively given direct access? Will the invalidation data be sanitized by the iommu driver? > union intel_iommu_invalidate_data { > struct { > __u64 low; > __u64 high; > } invalidate_data; > > struct { > __u64 type: 4; > __u64 gran: 2; > __u64 rsv1: 10; > __u64 did: 16; > __u64 sid: 16; > __u64 func_mask: 2; > __u64 rsv2: 14; > __64 rsv3: 64; > } context_cache_inv; > .... Here's part of the issue with not fully defining these, we have did, sid, and func_mask. I think we're claiming that the benefit of passing through the hardware data structure is performance, but the user needs to replace these IDs to match the physical device rather than the virtual device, perhaps even entirely recreating it because there's not necessarily a 1:1 mapping of things like func_mask between virtual and physical hardware topologies (assuming I'm interpreting these fields correctly). Doesn't the kernel also need to validate any such field to prevent the user spoofing entries for other devices? Is there any actual performance benefit remaining vs defining a generic interface after multiple levels have manipulated, recreated, and sanitized these structures? We can't evaluate these sorts of risks if we don't define what we're passing through. Thanks, Alex
On 17/07/17 23:45, Alex Williamson wrote: [..] >>> >>> How does a user learn which model(s) are supported by the interface? >>> How do they learn which ops are supported? Perhaps a good use for one of those >>> flag bits in the outer data structure is "probe". >> >> My initial plan to user fills it, if the underlying HW doesn't support the >> model, it refuses to service it. User should get a failure and stop to use >> it. But your suggestion to have a probe or kinds of query makes sense. >> How about we add one more operation for such purpose? Besides the >> supported model query, I'd like to add more. E.g the HW IOMMU capabilities. > > We also have VFIO_IOMMU_GET_INFO where the structure can be extended > for missing capabilities. Depending on the capability you want to > describe, this might be a better, existing interface for it. It might become hairy when physical IOMMUs start supporting multiple formats, or when we want to describe multiple page table formats in addition to PASID tables. I was wondering if sysfs iommu_group would be better suited for this kind of hardware description with variable-length properties, but a new ioctl-based probing mechanism would work as well. Other things that we'll want to describe are fault reporting capability and PASID range, which would fit better in vfio_device_info. Thanks, Jean
On Tue, 18 Jul 2017 10:38:40 +0100 Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote: > On 17/07/17 23:45, Alex Williamson wrote: > [..] > >>> > >>> How does a user learn which model(s) are supported by the interface? > >>> How do they learn which ops are supported? Perhaps a good use for one of those > >>> flag bits in the outer data structure is "probe". > >> > >> My initial plan to user fills it, if the underlying HW doesn't support the > >> model, it refuses to service it. User should get a failure and stop to use > >> it. But your suggestion to have a probe or kinds of query makes sense. > >> How about we add one more operation for such purpose? Besides the > >> supported model query, I'd like to add more. E.g the HW IOMMU capabilities. > > > > We also have VFIO_IOMMU_GET_INFO where the structure can be extended > > for missing capabilities. Depending on the capability you want to > > describe, this might be a better, existing interface for it. > > It might become hairy when physical IOMMUs start supporting multiple > formats, or when we want to describe multiple page table formats in > addition to PASID tables. I was wondering if sysfs iommu_group would be > better suited for this kind of hardware description with variable-length > properties, but a new ioctl-based probing mechanism would work as well. Would different groups have different properties? Perhaps it's related to the iommu hardware unit supporting the device, which could host one or more groups. Each device already has a link to its iommu where we could add info (/sys/class/iommu/). > Other things that we'll want to describe are fault reporting capability > and PASID range, which would fit better in vfio_device_info. Why? The per device PASID info is in a PCI capability, so wouldn't this be iommu info? Isn't the fault reporting also via the iommu? Thanks, Alex
On 18/07/17 15:29, Alex Williamson wrote: > On Tue, 18 Jul 2017 10:38:40 +0100 > Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote: > >> On 17/07/17 23:45, Alex Williamson wrote: >> [..] >>>>> >>>>> How does a user learn which model(s) are supported by the interface? >>>>> How do they learn which ops are supported? Perhaps a good use for one of those >>>>> flag bits in the outer data structure is "probe". >>>> >>>> My initial plan to user fills it, if the underlying HW doesn't support the >>>> model, it refuses to service it. User should get a failure and stop to use >>>> it. But your suggestion to have a probe or kinds of query makes sense. >>>> How about we add one more operation for such purpose? Besides the >>>> supported model query, I'd like to add more. E.g the HW IOMMU capabilities. >>> >>> We also have VFIO_IOMMU_GET_INFO where the structure can be extended >>> for missing capabilities. Depending on the capability you want to >>> describe, this might be a better, existing interface for it. >> >> It might become hairy when physical IOMMUs start supporting multiple >> formats, or when we want to describe multiple page table formats in >> addition to PASID tables. I was wondering if sysfs iommu_group would be >> better suited for this kind of hardware description with variable-length >> properties, but a new ioctl-based probing mechanism would work as well. > > Would different groups have different properties? Perhaps it's related > to the iommu hardware unit supporting the device, which could host one > or more groups. Each device already has a link to its iommu where we > could add info (/sys/class/iommu/). Yes, /sys/class/iommu might be better than iommu_group for PASID and page table formats. >> Other things that we'll want to describe are fault reporting capability >> and PASID range, which would fit better in vfio_device_info. > > Why? The per device PASID info is in a PCI capability, so wouldn't > this be iommu info? Isn't the fault reporting also via the iommu? Ah yes, I missed that. If userspace can read the PASID and PRI capabilities it should be enough. Inspecting individual device capability might help userspace decide how to combine multiple devices in a container. For example, if it puts PASID-capable and non-PASID-capable device in the same container, the container probably wouldn't support PASID (but would still support MAP/UNMAP). I'm not sure how it will work for platform devices though, some integrated devices on ARM will support features resembling PASID and PRI. I suspect these will need ACPI/DT description anyway, that userspace would access via sysfs. Thanks, Jean
On Mon, Jul 17, 2017 at 04:45:15PM -0600, Alex Williamson wrote: > On Mon, 17 Jul 2017 10:58:41 +0000 > "Liu, Yi L" <yi.l.liu@intel.com> wrote: > > > Hi Alex, > > > > Pls refer to the response inline. > > > > > -----Original Message----- > > > From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf > > > Of Alex Williamson > > > Sent: Saturday, July 15, 2017 2:16 AM > > > To: Liu, Yi L <yi.l.liu@intel.com> > > > Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>; Tian, Kevin > > > <kevin.tian@intel.com>; Liu, Yi L <yi.l.liu@linux.intel.com>; Lan, Tianyu > > > <tianyu.lan@intel.com>; Raj, Ashok <ashok.raj@intel.com>; kvm@vger.kernel.org; > > > jasowang@redhat.com; Will Deacon <Will.Deacon@arm.com>; peterx@redhat.com; > > > qemu-devel@nongnu.org; iommu@lists.linux-foundation.org; Pan, Jacob jun > > > <jacob.jun.pan@intel.com>; Joerg Roedel <joro@8bytes.org> > > > Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB > > > invalidate propagation > > > > > > On Fri, 14 Jul 2017 08:58:02 +0000 > > > "Liu, Yi L" <yi.l.liu@intel.com> wrote: > > > > > > > Hi Alex, > > > > > > > > Against to the opaque open, I'd like to propose the following > > > > definition based on the existing comments. Pls note that I've merged > > > > the pasid table binding and iommu tlb invalidation into a single IOCTL > > > > and make different flags to indicate the iommu operations. Per Kevin's > > > > comments, there may be iommu invalidation for guest IOVA tlb, so I > > > > renamed the IOCTL and data structure to be non-svm specific. Pls > > > > kindly have a review, so that we can make the opaque open closed and > > > > move forward. Surely, comments and ideas are welcomed. And for the > > > > scope and flags definition in struct iommu_tlb_invalidate, it's also welcomed to > > > give your ideas on it. > > > > > > > > 1. Add a VFIO IOCTL for iommu operations from user-space > > > > > > > > #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24) > > > > > > > > Corresponding data structure: > > > > struct vfio_iommu_operation_info { > > > > __u32 argsz; > > > > #define VFIO_IOMMU_BIND_PASIDTBL (1 << 0) /* Bind PASID Table */ > > > > #define VFIO_IOMMU_BIND_PASID (1 << 1) /* Bind PASID from userspace > > > driver*/ > > > > #define VFIO_IOMMU_BIND_PGTABLE (1 << 2) /* Bind guest mmu page table */ > > > > #define VFIO_IOMMU_INVAL_IOTLB (1 << 3) /* Invalidate iommu tlb */ > > > > __u32 flag; > > > > __u32 length; // length of the data[] part in byte > > > > __u8 data[]; // stores the data for iommu op indicated by flag field > > > > }; > > > > > > If we're doing a generic "Ops" ioctl, then we should have an "op" field which is > > > defined by an enum. It doesn't make sense to use flags for this, for example can we > > > set multiple flag bits? If not then it's not a good use for a bit field. I'm also not sure I > > > understand the value of the "length" field, can't it always be calculated from argsz? > > > > Agreed, enum would be better. "length" field could be calculated from argsz. I used > > it just to avoid offset calculations. May remove it. > > > > > > For iommu tlb invalidation from userspace, the "__u8 data[]" stores > > > > data which would be parsed by the "struct iommu_tlb_invalidate" > > > > defined below. > > > > > > > > 2. Definitions in include/uapi/linux/iommu.h(newly added header file) > > > > > > > > /* IOMMU model definition for iommu operations from userspace */ enum > > > > iommu_model { > > > > INTLE_IOMMU, > > > > ARM_SMMU, > > > > AMD_IOMMU, > > > > SPAPR_IOMMU, > > > > S390_IOMMU, > > > > }; > > > > > > > > struct iommu_tlb_invalidate { > > > > __u32 scope; > > > > /* pasid-selective invalidation described by @pasid */ > > > > #define IOMMU_INVALIDATE_PASID (1 << 0) > > > > /* address-selevtive invalidation described by (@vaddr, @size) */ > > > > #define IOMMU_INVALIDATE_VADDR (1 << 1) > > > > > > Again, is a bit field appropriate here, can a user set both bits? > > > > yes, user may set both bits. It would be invalidate address range > > which is tagged with a PASID value. > > > > > > > > > __u32 flags; > > > > /* targets non-pasid mappings, @pasid is not valid */ > > > > #define IOMMU_INVALIDATE_NO_PASID (1 << 0) > > > > /* indicating that the pIOMMU doesn't need to invalidate > > > > all intermediate tables cached as part of the PTE for > > > > vaddr, only the last-level entry (pte). This is a hint. */ > > > > #define IOMMU_INVALIDATE_VADDR_LEAF (1 << 1) > > > > > > Are we venturing into vendor specific attributes here? > > > > These two attributes are still in discussion. Jean and me synced > > several rounds. But lack of comments from other vendors. > > > > Personally, I think both should be generic. > > IOMMU_INVALIDATE_NO_PASID is to indicate no PASID used > > for the invalidation. IOMMU_INVALIDATE_VADDR_LEAF indicates > > only invalidate leaf mappings. > > I would see if other vendor is object on it. If yes, I'm fine to move > > it to vendor specific part. > > > > > > > > > __u32 pasid; > > > > __u64 vaddr; > > > > __u64 size; > > > > enum iommu_model model; > > > > > > How does a user learn which model(s) are supported by the interface? > > > How do they learn which ops are supported? Perhaps a good use for one of those > > > flag bits in the outer data structure is "probe". > > > > My initial plan to user fills it, if the underlying HW doesn't support the > > model, it refuses to service it. User should get a failure and stop to use > > it. But your suggestion to have a probe or kinds of query makes sense. > > How about we add one more operation for such purpose? Besides the > > supported model query, I'd like to add more. E.g the HW IOMMU capabilities. > > We also have VFIO_IOMMU_GET_INFO where the structure can be extended > for missing capabilities. Depending on the capability you want to > describe, this might be a better, existing interface for it. > > > > > /* > > > > Vendor may have different HW version and thus the > > > > data part of this structure differs, use sub_version > > > > to indicate such difference. > > > > */ > > > > __u322 sub_version; > > > > > > Not sure I see the value of this vs creating an INTEL_IOMMUv2 entry in the model > > > enum. > > > > Both are fine to me. Just see the opinions from other guys. > > > > > > __u64 length; // length of the data[] part in byte > > > > > > Questionably useful vs calculating from argsz again , but it certainly doesn't need to > > > be a qword :-o > > > > Thx for the remind. 32bits would be enough. It is surely to get it from argsz. However, > > I would like to leave it here. Reason is: > > argsz is in vfio layer, the "length" here is actually used in vendor-specific iommu driver > > layer. So would require vfio to pass argsz or the size of " struct iommu_tlb_invalidate" > > to vendor-specific iommu driver layer by means of parameter or so. Personally, I prefer > > to pass it in the structure. If it's better to pass it as a parameter, I would do it. > > Ok, then the layer that does the copy_from_user will need to validate > that length is fully contained within the copied data structure, we > can't let the user trick the kernel into using kernel memory for this. VFIO is still the layer which copy_from_user. would check the length. > > > > > __u8 data[]; > > > > }; > > > > > > > > For Intel, the data structue is: > > > > struct intel_iommu_invalidate_data { > > > > __u64 low; > > > > __u64 high; > > > > } > > > > > > high/low what? This is a pretty weak uapi definition. Thanks, > > > > For this part, for Intel platform, we plan to pass a 128 bit data for the invalidation. > > The structure varies from invalidation type to type. Here is my thought on it. Define > > an 128 bits union. List the invalidation data details for each invalidation type. What's > > your opinion on it? So far, we have 7 types for invalidation. The prq response is not > > included. > > I want this interface to be fully defined, but at the same time I don't > necessarily want to create useless data structures. I believe the > intention here is to pass these directly through to a QI entry, where yes, it's a QI entry from guest. > we must match a hardware definition. I'm tempted to suggest > referencing the hardware specification, but see below... > > A concern for this model is that hardware may trust the iommu driver > not to create QI entries that don't set reserved bits or set invalid > field data. If it does those kinds of things, it's a kernel driver > bug. Once exposed to the user, we cannot guarantee that. Does Intel > have confidence that a user cannot maliciously interfere with other > contexts or the general operation of the invalidation queue if a user is > effectively given direct access? Will the invalidation data be > sanitized by the iommu driver? > > > union intel_iommu_invalidate_data { > > struct { > > __u64 low; > > __u64 high; > > } invalidate_data; > > > > struct { > > __u64 type: 4; > > __u64 gran: 2; > > __u64 rsv1: 10; > > __u64 did: 16; > > __u64 sid: 16; > > __u64 func_mask: 2; > > __u64 rsv2: 14; > > __64 rsv3: 64; > > } context_cache_inv; > > .... > > Here's part of the issue with not fully defining these, we have did, > sid, and func_mask. I think we're claiming that the benefit of passing > through the hardware data structure is performance, but the user needs > to replace these IDs to match the physical device rather than the > virtual device, perhaps even entirely recreating it because there's not > necessarily a 1:1 mapping of things like func_mask between virtual and > physical hardware topologies (assuming I'm interpreting these fields > correctly). Doesn't the kernel also need to validate any such field to > prevent the user spoofing entries for other devices? Is there any > actual performance benefit remaining vs defining a generic interface > after multiple levels have manipulated, recreated, and sanitized > these structures? We can't evaluate these sorts of risks if we don't > define what we're passing through. Thanks, > A potential proposal is to abstract the fields of the QI entry. However, here is a concern for it. Different type of QI entry would have diferent fields. It means we need to have a hyper set to include all the possible fields. Supposedly, the set would increase as more QI type is introduced. I'm not sure if it is an acceptable definition. Based on the latest spec, the vendor-specific fields may have: Global hint Drain read/write Source-ID MIP PFSID PRQ response is another topic. Not included here. Thanks, Yi L
On Wed, 19 Jul 2017 18:45:43 +0800 "Liu, Yi L" <yi.l.liu@linux.intel.com> wrote: > On Mon, Jul 17, 2017 at 04:45:15PM -0600, Alex Williamson wrote: > > On Mon, 17 Jul 2017 10:58:41 +0000 > > "Liu, Yi L" <yi.l.liu@intel.com> wrote: > > > > > Hi Alex, > > > > > > Pls refer to the response inline. > > > > > > > -----Original Message----- > > > > From: kvm-owner@vger.kernel.org > > > > [mailto:kvm-owner@vger.kernel.org] On Behalf Of Alex Williamson > > > > Sent: Saturday, July 15, 2017 2:16 AM > > > > To: Liu, Yi L <yi.l.liu@intel.com> > > > > Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>; > > > > Tian, Kevin <kevin.tian@intel.com>; Liu, Yi L > > > > <yi.l.liu@linux.intel.com>; Lan, Tianyu <tianyu.lan@intel.com>; > > > > Raj, Ashok <ashok.raj@intel.com>; kvm@vger.kernel.org; > > > > jasowang@redhat.com; Will Deacon <Will.Deacon@arm.com>; > > > > peterx@redhat.com; qemu-devel@nongnu.org; > > > > iommu@lists.linux-foundation.org; Pan, Jacob jun > > > > <jacob.jun.pan@intel.com>; Joerg Roedel <joro@8bytes.org> > > > > Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL > > > > for IOMMU TLB invalidate propagation > > > > > > > > On Fri, 14 Jul 2017 08:58:02 +0000 > > > > "Liu, Yi L" <yi.l.liu@intel.com> wrote: > > > > > > > > > Hi Alex, > > > > > > > > > > Against to the opaque open, I'd like to propose the following > > > > > definition based on the existing comments. Pls note that I've > > > > > merged the pasid table binding and iommu tlb invalidation > > > > > into a single IOCTL and make different flags to indicate the > > > > > iommu operations. Per Kevin's comments, there may be iommu > > > > > invalidation for guest IOVA tlb, so I renamed the IOCTL and > > > > > data structure to be non-svm specific. Pls kindly have a > > > > > review, so that we can make the opaque open closed and move > > > > > forward. Surely, comments and ideas are welcomed. And for the > > > > > scope and flags definition in struct iommu_tlb_invalidate, > > > > > it's also welcomed to > > > > give your ideas on it. > > > > > > > > > > 1. Add a VFIO IOCTL for iommu operations from user-space > > > > > > > > > > #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24) > > > > > > > > > > Corresponding data structure: > > > > > struct vfio_iommu_operation_info { > > > > > __u32 argsz; > > > > > #define VFIO_IOMMU_BIND_PASIDTBL (1 << 0) /* Bind > > > > > PASID Table */ #define VFIO_IOMMU_BIND_PASID (1 << > > > > > 1) /* Bind PASID from userspace > > > > driver*/ > > > > > #define VFIO_IOMMU_BIND_PGTABLE (1 << 2) /* Bind guest > > > > > mmu page table */ #define VFIO_IOMMU_INVAL_IOTLB (1 << > > > > > 3) /* Invalidate iommu tlb */ __u32 flag; > > > > > __u32 length; // length of the data[] part in > > > > > byte __u8 data[]; // stores the data for iommu op > > > > > indicated by flag field }; > > > > > > > > If we're doing a generic "Ops" ioctl, then we should have an > > > > "op" field which is defined by an enum. It doesn't make sense > > > > to use flags for this, for example can we set multiple flag > > > > bits? If not then it's not a good use for a bit field. I'm > > > > also not sure I understand the value of the "length" field, > > > > can't it always be calculated from argsz? > > > > > > Agreed, enum would be better. "length" field could be calculated > > > from argsz. I used it just to avoid offset calculations. May > > > remove it. > > > > > For iommu tlb invalidation from userspace, the "__u8 data[]" > > > > > stores data which would be parsed by the "struct > > > > > iommu_tlb_invalidate" defined below. > > > > > > > > > > 2. Definitions in include/uapi/linux/iommu.h(newly added > > > > > header file) > > > > > > > > > > /* IOMMU model definition for iommu operations from userspace > > > > > */ enum iommu_model { > > > > > INTLE_IOMMU, > > > > > ARM_SMMU, > > > > > AMD_IOMMU, > > > > > SPAPR_IOMMU, > > > > > S390_IOMMU, > > > > > }; > > > > > > > > > > struct iommu_tlb_invalidate { > > > > > __u32 scope; > > > > > /* pasid-selective invalidation described by @pasid */ > > > > > #define IOMMU_INVALIDATE_PASID (1 << 0) > > > > > /* address-selevtive invalidation described by (@vaddr, > > > > > @size) */ #define IOMMU_INVALIDATE_VADDR (1 << 1) > > > > > > > > Again, is a bit field appropriate here, can a user set both > > > > bits? > > > > > > yes, user may set both bits. It would be invalidate address range > > > which is tagged with a PASID value. > > > > > > > > > > > > __u32 flags; > > > > > /* targets non-pasid mappings, @pasid is not valid */ > > > > > #define IOMMU_INVALIDATE_NO_PASID (1 << 0) > > > > > /* indicating that the pIOMMU doesn't need to invalidate > > > > > all intermediate tables cached as part of the PTE for > > > > > vaddr, only the last-level entry (pte). This is a > > > > > hint. */ #define IOMMU_INVALIDATE_VADDR_LEAF (1 << > > > > > 1) > > > > > > > > Are we venturing into vendor specific attributes here? > > > > > > These two attributes are still in discussion. Jean and me synced > > > several rounds. But lack of comments from other vendors. > > > > > > Personally, I think both should be generic. > > > IOMMU_INVALIDATE_NO_PASID is to indicate no PASID used > > > for the invalidation. IOMMU_INVALIDATE_VADDR_LEAF indicates > > > only invalidate leaf mappings. > > > I would see if other vendor is object on it. If yes, I'm fine to > > > move it to vendor specific part. > > > > > > > > > > > > __u32 pasid; > > > > > __u64 vaddr; > > > > > __u64 size; > > > > > enum iommu_model model; > > > > > > > > How does a user learn which model(s) are supported by the > > > > interface? How do they learn which ops are supported? Perhaps > > > > a good use for one of those flag bits in the outer data > > > > structure is "probe". > > > > > > My initial plan to user fills it, if the underlying HW doesn't > > > support the model, it refuses to service it. User should get a > > > failure and stop to use it. But your suggestion to have a probe > > > or kinds of query makes sense. How about we add one more > > > operation for such purpose? Besides the supported model query, > > > I'd like to add more. E.g the HW IOMMU capabilities. > > > > We also have VFIO_IOMMU_GET_INFO where the structure can be extended > > for missing capabilities. Depending on the capability you want to > > describe, this might be a better, existing interface for it. > > > > > > > /* > > > > > Vendor may have different HW version and thus the > > > > > data part of this structure differs, use sub_version > > > > > to indicate such difference. > > > > > */ > > > > > __u322 sub_version; > > > > > > > > Not sure I see the value of this vs creating an INTEL_IOMMUv2 > > > > entry in the model enum. > > > > > > Both are fine to me. Just see the opinions from other guys. > > > > > > > > __u64 length; // length of the data[] part in byte > > > > > > > > Questionably useful vs calculating from argsz again , but it > > > > certainly doesn't need to be a qword :-o > > > > > > Thx for the remind. 32bits would be enough. It is surely to get > > > it from argsz. However, I would like to leave it here. Reason is: > > > argsz is in vfio layer, the "length" here is actually used in > > > vendor-specific iommu driver layer. So would require vfio to pass > > > argsz or the size of " struct iommu_tlb_invalidate" to > > > vendor-specific iommu driver layer by means of parameter or so. > > > Personally, I prefer to pass it in the structure. If it's better > > > to pass it as a parameter, I would do it. > > > > Ok, then the layer that does the copy_from_user will need to > > validate that length is fully contained within the copied data > > structure, we can't let the user trick the kernel into using kernel > > memory for this. > > VFIO is still the layer which copy_from_user. would check the length. > > > > > > > > __u8 data[]; > > > > > }; > > > > > > > > > > For Intel, the data structue is: > > > > > struct intel_iommu_invalidate_data { > > > > > __u64 low; > > > > > __u64 high; > > > > > } > > > > > > > > high/low what? This is a pretty weak uapi definition. > > > > Thanks, > > > > > > For this part, for Intel platform, we plan to pass a 128 bit data > > > for the invalidation. The structure varies from invalidation type > > > to type. Here is my thought on it. Define an 128 bits union. List > > > the invalidation data details for each invalidation type. What's > > > your opinion on it? So far, we have 7 types for invalidation. The > > > prq response is not included. > > > > I want this interface to be fully defined, but at the same time I > > don't necessarily want to create useless data structures. I > > believe the intention here is to pass these directly through to a > > QI entry, where > > yes, it's a QI entry from guest. > > > we must match a hardware definition. I'm tempted to suggest > > referencing the hardware specification, but see below... > > > > A concern for this model is that hardware may trust the iommu driver > > not to create QI entries that don't set reserved bits or set invalid > > field data. If it does those kinds of things, it's a kernel driver > > bug. Once exposed to the user, we cannot guarantee that. Does > > Intel have confidence that a user cannot maliciously interfere with > > other contexts or the general operation of the invalidation queue > > if a user is effectively given direct access? Will the > > invalidation data be sanitized by the iommu driver? > > > > > union intel_iommu_invalidate_data { > > > struct { > > > __u64 low; > > > __u64 high; > > > } invalidate_data; > > > > > > struct { > > > __u64 type: 4; > > > __u64 gran: 2; > > > __u64 rsv1: 10; > > > __u64 did: 16; > > > __u64 sid: 16; > > > __u64 func_mask: 2; > > > __u64 rsv2: 14; > > > __64 rsv3: 64; > > > } context_cache_inv; > > > .... > > > > Here's part of the issue with not fully defining these, we have did, > > sid, and func_mask. I think we're claiming that the benefit of > > passing through the hardware data structure is performance, but the > > user needs to replace these IDs to match the physical device rather > > than the virtual device, perhaps even entirely recreating it > > because there's not necessarily a 1:1 mapping of things like > > func_mask between virtual and physical hardware topologies > > (assuming I'm interpreting these fields correctly). Doesn't the > > kernel also need to validate any such field to prevent the user > > spoofing entries for other devices? Is there any actual > > performance benefit remaining vs defining a generic interface after > > multiple levels have manipulated, recreated, and sanitized these > > structures? We can't evaluate these sorts of risks if we don't > > define what we're passing through. Thanks, > > A potential proposal is to abstract the fields of the QI entry. > However, here is a concern for it. Different type of QI entry would > have diferent fields. It means we need to have a hyper set to include > all the possible fields. Supposedly, the set would increase as more > QI type is introduced. I'm not sure if it is an acceptable definition. > > Based on the latest spec, the vendor-specific fields may have: > > Global hint > Drain read/write > Source-ID > MIP > PFSID > My thinking was that as long as the risk of having some opaque data is limited to the device that is already exposed to the user space, it should be fine. We have model specific IOMMU driver to sanitize the data before putting the descriptor into hardware. But I agree the overhead of disassemble/assemble may not be significant. Though with vIOMMU and caching mode = 1 (requires explicit invalidation of caches regardless present or not, VT-d spec 6.1), we will see more invalidation than the native pIOMMU case. Anyway, we can do some micro benchmark to see the overhead. > PRQ response is another topic. Not included here. > > Thanks, > Yi L > [Jacob Pan]
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index 6b97987..50c51f8 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -564,6 +564,15 @@ struct vfio_device_svm { #define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE + 22) +/* For IOMMU TLB Invalidation Propagation */ +struct vfio_iommu_tlb_invalidate { + __u32 argsz; + __u32 length; + __u8 data[]; +}; + +#define VFIO_IOMMU_TLB_INVALIDATE _IO(VFIO_TYPE, VFIO_BASE + 23) + /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */ /*