Message ID | 1519900415-30314-13-git-send-email-yi.l.liu@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 01/03/2018 11:33, Liu, Yi L wrote: > + IntelPASIDNode *node; > + char name[128]; > + > + QLIST_FOREACH(node, &(s->pasid_as_list), next) { > + vtd_pasid_as = node->pasid_as; > + if (pasid == vtd_pasid_as->sva_ctx.pasid) { > + return vtd_pasid_as; > + } > + } > + > + vtd_pasid_as = g_malloc0(sizeof(*vtd_pasid_as)); > + vtd_pasid_as->iommu_state = s; > + snprintf(name, sizeof(name), "intel_iommu_pasid_%d", pasid); > + address_space_init(&vtd_pasid_as->as, NULL, "pasid"); The name is unused here. The call to address_space_init should probably use it. You also don't need the separate IntelPASIDNode, because the QLIST_ENTRY can be placed directly in VTDPASIDAddressSpace. Paolo
On Fri, Mar 02, 2018 at 03:51:53PM +0100, Paolo Bonzini wrote: > On 01/03/2018 11:33, Liu, Yi L wrote: > > + IntelPASIDNode *node; > > + char name[128]; > > + > > + QLIST_FOREACH(node, &(s->pasid_as_list), next) { > > + vtd_pasid_as = node->pasid_as; > > + if (pasid == vtd_pasid_as->sva_ctx.pasid) { > > + return vtd_pasid_as; > > + } > > + } > > + > > + vtd_pasid_as = g_malloc0(sizeof(*vtd_pasid_as)); > > + vtd_pasid_as->iommu_state = s; > > + snprintf(name, sizeof(name), "intel_iommu_pasid_%d", pasid); > > + address_space_init(&vtd_pasid_as->as, NULL, "pasid"); > > The name is unused here. The call to address_space_init should probably > use it. yes, it is. I missed it. Thanks for catching it. > You also don't need the separate IntelPASIDNode, because the > QLIST_ENTRY can be placed directly in VTDPASIDAddressSpace. Would refine it in next version. Ragards, Yi Liu
On Thu, Mar 01, 2018 at 06:33:35PM +0800, Liu, Yi L wrote: > This patch shows the idea of how a device is binded to a PASID tagged > AddressSpace. > > when Intel vIOMMU emulator detected a pasid table entry programming > from guest. Intel vIOMMU emulator firstly finds a VTDPASIDAddressSpace > with the pasid field of pasid cache invalidate request. > > * If it is to bind a device to a guest process, needs add the device > to the device list behind the VTDPASIDAddressSpace. And if the device > is assigned device, need to register sva_notfier for future tlb > flushing if any mapping changed to the process address space. > > * If it is to unbind a device from a guest process, then need to remove > the device from the device list behind the VTDPASIDAddressSpace. > And also needs to unregister the sva_notfier if the device is assigned > device. > > This patch hasn't added the unbind logic. It depends on guest pasid > table entry parsing which requires further emulation. Here just want > to show the idea for the PASID tagged AddressSpace management framework. > Full unregister logic would be included in future virt-SVA patchset. > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com> > --- > hw/i386/intel_iommu.c | 119 +++++++++++++++++++++++++++++++++++++++++ > hw/i386/intel_iommu_internal.h | 10 ++++ > 2 files changed, 129 insertions(+) > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > index b8e8dbb..ed07035 100644 > --- a/hw/i386/intel_iommu.c > +++ b/hw/i386/intel_iommu.c > @@ -1801,6 +1801,118 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc) > return true; > } > > +static VTDPASIDAddressSpace *vtd_get_pasid_as(IntelIOMMUState *s, > + uint32_t pasid) > +{ > + VTDPASIDAddressSpace *vtd_pasid_as = NULL; > + IntelPASIDNode *node; > + char name[128]; > + > + QLIST_FOREACH(node, &(s->pasid_as_list), next) { > + vtd_pasid_as = node->pasid_as; > + if (pasid == vtd_pasid_as->sva_ctx.pasid) { > + return vtd_pasid_as; > + } > + } This seems to be a per-iommu pasid table. However from the spec it looks more like that should be per-domain (I'm seeing figure 3-8). For example, each domain should be able to have its own pasid table. Then IIUC a pasid context will need a (domain, pasid) tuple to identify, not only the pasid itself? And, do we need to destroy the pasid context after it's freed by the guest? Here it seems that we'll cache it forever. > + > + vtd_pasid_as = g_malloc0(sizeof(*vtd_pasid_as)); > + vtd_pasid_as->iommu_state = s; > + snprintf(name, sizeof(name), "intel_iommu_pasid_%d", pasid); > + address_space_init(&vtd_pasid_as->as, NULL, "pasid"); I saw that this is only inited and never used. Could I ask when this will be used? > + QLIST_INIT(&vtd_pasid_as->device_list); > + > + node = g_malloc0(sizeof(*node)); > + node->pasid_as = vtd_pasid_as; > + QLIST_INSERT_HEAD(&s->pasid_as_list, node, next); > + > + return vtd_pasid_as; > +} > + > +static void vtd_bind_device_to_pasid_as(VTDPASIDAddressSpace *vtd_pasid_as, > + PCIBus *bus, uint8_t devfn) > +{ > + VTDDeviceNode *node = NULL; > + > + QLIST_FOREACH(node, &(vtd_pasid_as->device_list), next) { > + if (node->bus == bus && node->devfn == devfn) { > + return; > + } > + } > + > + node = g_malloc0(sizeof(*node)); > + node->bus = bus; > + node->devfn = devfn; > + QLIST_INSERT_HEAD(&(vtd_pasid_as->device_list), node, next); So here I have the same confusion - IIUC according to the spec two devices can have differnet pasid tables, however they can be sharing the same PASID number (e.g., pasid=1) in the table. Here since vtd_pasid_as is only per-IOMMU, could it possible that we put multiple devices under same PASID context while actually they are not sharing the same process page table? Problematic? Please correct me if needed. > + > + pci_device_sva_register_notifier(bus, devfn, &vtd_pasid_as->sva_ctx); > + > + return; > +} > + > +static bool vtd_process_pc_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc) > +{ > + > + IntelIOMMUAssignedDeviceNode *node = NULL; > + int ret = 0; > + > + uint16_t domain_id; > + uint32_t pasid; > + VTDPASIDAddressSpace *vtd_pasid_as; > + > + if ((inv_desc->lo & VTD_INV_DESC_PASIDC_RSVD_LO) || > + (inv_desc->hi & VTD_INV_DESC_PASIDC_RSVD_HI)) { > + return false; > + } > + > + domain_id = VTD_INV_DESC_PASIDC_DID(inv_desc->lo); > + > + switch (inv_desc->lo & VTD_INV_DESC_PASIDC_G) { > + case VTD_INV_DESC_PASIDC_ALL_ALL: > + /* TODO: invalidate all pasid related cache */ I think it's fine as RFC, but we'd better have this in the final version? IIUC you'll need caching-mode too for virt-sva, and here you'll possibly need to walk and scan every context entry that has the same domain ID specified in the invalidation request. Maybe further you'll need to scan the pasid entries too, register notifiers when needed. Thanks, > + break; > + > + case VTD_INV_DESC_PASIDC_PASID_SI: > + pasid = VTD_INV_DESC_PASIDC_PASID(inv_desc->lo); > + vtd_pasid_as = vtd_get_pasid_as(s, pasid); > + QLIST_FOREACH(node, &(s->assigned_device_list), next) { > + VTDAddressSpace *vtd_as = node->vtd_as; > + VTDContextEntry ce; > + uint16_t did; > + uint8_t bus = pci_bus_num(vtd_as->bus); > + ret = vtd_dev_to_context_entry(s, bus, > + vtd_as->devfn, &ce); > + if (ret != 0) { > + continue; > + } > + > + did = VTD_CONTEXT_ENTRY_DID(ce.hi); > + /* > + * If did field equals to the domain_id field of inv_descriptor, > + * then the device is affect by this invalidate request, need to > + * bind or unbind the device to the pasid tagged address space. > + * a) If it is bind, need to add the device to the device list, > + * add register tlb flush notifier for it > + * b) If it is unbind, need to remove the device from the device > + * list, and unregister the tlb flush notifier > + * TODO: add unbind logic accordingly, depends on the parsing of > + * guest pasid table entry pasrsing, here has no parsing to > + * pasid table entry. > + * > + */ > + if (did == domain_id) { > + vtd_bind_device_to_pasid_as(vtd_pasid_as, > + vtd_as->bus, vtd_as->devfn); > + } > + } > + break; > + > + default: > + return false; > + } > + > + return true; > +} > +
> From: Peter Xu [mailto:peterx@redhat.com] > Sent: Tuesday, March 6, 2018 7:44 PM > Subject: Re: [PATCH v3 12/12] intel_iommu: bind device to PASID tagged > AddressSpace > > On Thu, Mar 01, 2018 at 06:33:35PM +0800, Liu, Yi L wrote: > > This patch shows the idea of how a device is binded to a PASID tagged > > AddressSpace. > > > > when Intel vIOMMU emulator detected a pasid table entry programming > > from guest. Intel vIOMMU emulator firstly finds a VTDPASIDAddressSpace > > with the pasid field of pasid cache invalidate request. > > > > * If it is to bind a device to a guest process, needs add the device > > to the device list behind the VTDPASIDAddressSpace. And if the device > > is assigned device, need to register sva_notfier for future tlb > > flushing if any mapping changed to the process address space. > > > > * If it is to unbind a device from a guest process, then need to remove > > the device from the device list behind the VTDPASIDAddressSpace. > > And also needs to unregister the sva_notfier if the device is assigned > > device. > > > > This patch hasn't added the unbind logic. It depends on guest pasid > > table entry parsing which requires further emulation. Here just want > > to show the idea for the PASID tagged AddressSpace management framework. > > Full unregister logic would be included in future virt-SVA patchset. > > > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com> > > --- > > hw/i386/intel_iommu.c | 119 > +++++++++++++++++++++++++++++++++++++++++ > > hw/i386/intel_iommu_internal.h | 10 ++++ > > 2 files changed, 129 insertions(+) > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > > index b8e8dbb..ed07035 100644 > > --- a/hw/i386/intel_iommu.c > > +++ b/hw/i386/intel_iommu.c > > @@ -1801,6 +1801,118 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState > *s, VTDInvDesc *inv_desc) > > return true; > > } > > > > +static VTDPASIDAddressSpace *vtd_get_pasid_as(IntelIOMMUState *s, > > + uint32_t pasid) > > +{ > > + VTDPASIDAddressSpace *vtd_pasid_as = NULL; > > + IntelPASIDNode *node; > > + char name[128]; > > + > > + QLIST_FOREACH(node, &(s->pasid_as_list), next) { > > + vtd_pasid_as = node->pasid_as; > > + if (pasid == vtd_pasid_as->sva_ctx.pasid) { > > + return vtd_pasid_as; > > + } > > + } > > This seems to be a per-iommu pasid table. However from the spec it > looks more like that should be per-domain (I'm seeing figure 3-8). > For example, each domain should be able to have its own pasid table. > Then IIUC a pasid context will need a (domain, pasid) tuple to > identify, not only the pasid itself? Yes, this is a per-iommu table here. Actually, how we assemble the table here depends on the PASID namespace. You may refer to the iommu driver code. intel-svm.c, it's actually per-iommu. /* Do not use PASID 0 in caching mode (virtualised IOMMU) */ ret = idr_alloc(&iommu->pasid_idr, svm, !!cap_caching_mode(iommu->cap), pasid_max - 1, GFP_KERNEL); > > And, do we need to destroy the pasid context after it's freed by the > guest? Here it seems that we'll cache it forever. If we need to do it. A PASID can be bind to multiple devices. If there is no device binding on it, then needs to destroy it. This may be done by refcount. As I mentioned in the description, that requires further vIOMMU emulation, so I didn't include it. But it should be covered in final version. Good catch. > > > + > > + vtd_pasid_as = g_malloc0(sizeof(*vtd_pasid_as)); > > + vtd_pasid_as->iommu_state = s; > > + snprintf(name, sizeof(name), "intel_iommu_pasid_%d", pasid); > > + address_space_init(&vtd_pasid_as->as, NULL, "pasid"); > > I saw that this is only inited and never used. Could I ask when this > will be used? AddressSpace is actually introduced for future support of emulated SVA capable devices and possible 1st level paging shadowing(similar to the 2nd level page table shadowing you upstreamed). > > > + QLIST_INIT(&vtd_pasid_as->device_list); > > + > > + node = g_malloc0(sizeof(*node)); > > + node->pasid_as = vtd_pasid_as; > > + QLIST_INSERT_HEAD(&s->pasid_as_list, node, next); > > + > > + return vtd_pasid_as; > > +} > > + > > +static void vtd_bind_device_to_pasid_as(VTDPASIDAddressSpace *vtd_pasid_as, > > + PCIBus *bus, uint8_t devfn) > > +{ > > + VTDDeviceNode *node = NULL; > > + > > + QLIST_FOREACH(node, &(vtd_pasid_as->device_list), next) { > > + if (node->bus == bus && node->devfn == devfn) { > > + return; > > + } > > + } > > + > > + node = g_malloc0(sizeof(*node)); > > + node->bus = bus; > > + node->devfn = devfn; > > + QLIST_INSERT_HEAD(&(vtd_pasid_as->device_list), node, next); > > So here I have the same confusion - IIUC according to the spec two > devices can have differnet pasid tables, however they can be sharing > the same PASID number (e.g., pasid=1) in the table. Do you mean the pasid table in iommu driver? I can not say it is impossible, but you may notice that in current iommu driver, the devices behind a single iommu unit shared pasid table. > Here since > vtd_pasid_as is only per-IOMMU, could it possible that we put multiple > devices under same PASID context while actually they are not sharing > the same process page table? Problematic? You are correct, two devices may under same PASID context. For the case you described, I don't think it is allowed as it breaks the PASID concept. Software should avoid it. Does it make sense? > > Please correct me if needed. > > > + > > + pci_device_sva_register_notifier(bus, devfn, &vtd_pasid_as->sva_ctx); > > + > > + return; > > +} > > + > > +static bool vtd_process_pc_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc) > > +{ > > + > > + IntelIOMMUAssignedDeviceNode *node = NULL; > > + int ret = 0; > > + > > + uint16_t domain_id; > > + uint32_t pasid; > > + VTDPASIDAddressSpace *vtd_pasid_as; > > + > > + if ((inv_desc->lo & VTD_INV_DESC_PASIDC_RSVD_LO) || > > + (inv_desc->hi & VTD_INV_DESC_PASIDC_RSVD_HI)) { > > + return false; > > + } > > + > > + domain_id = VTD_INV_DESC_PASIDC_DID(inv_desc->lo); > > + > > + switch (inv_desc->lo & VTD_INV_DESC_PASIDC_G) { > > + case VTD_INV_DESC_PASIDC_ALL_ALL: > > + /* TODO: invalidate all pasid related cache */ > > I think it's fine as RFC, but we'd better have this in the final > version? Definitely. > > IIUC you'll need caching-mode too for virt-sva, and here you'll > possibly need to walk and scan every context entry that has the same > domain ID specified in the invalidation request. Maybe further you'll > need to scan the pasid entries too, register notifiers when needed. Sure, should be in the full virt-sva series. Thanks, Yi Liu
On Thu, Mar 08, 2018 at 09:39:18AM +0000, Liu, Yi L wrote: > > From: Peter Xu [mailto:peterx@redhat.com] > > Sent: Tuesday, March 6, 2018 7:44 PM > > Subject: Re: [PATCH v3 12/12] intel_iommu: bind device to PASID tagged > > AddressSpace > > > > On Thu, Mar 01, 2018 at 06:33:35PM +0800, Liu, Yi L wrote: > > > This patch shows the idea of how a device is binded to a PASID tagged > > > AddressSpace. > > > > > > when Intel vIOMMU emulator detected a pasid table entry programming > > > from guest. Intel vIOMMU emulator firstly finds a VTDPASIDAddressSpace > > > with the pasid field of pasid cache invalidate request. > > > > > > * If it is to bind a device to a guest process, needs add the device > > > to the device list behind the VTDPASIDAddressSpace. And if the device > > > is assigned device, need to register sva_notfier for future tlb > > > flushing if any mapping changed to the process address space. > > > > > > * If it is to unbind a device from a guest process, then need to remove > > > the device from the device list behind the VTDPASIDAddressSpace. > > > And also needs to unregister the sva_notfier if the device is assigned > > > device. > > > > > > This patch hasn't added the unbind logic. It depends on guest pasid > > > table entry parsing which requires further emulation. Here just want > > > to show the idea for the PASID tagged AddressSpace management framework. > > > Full unregister logic would be included in future virt-SVA patchset. > > > > > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com> > > > --- > > > hw/i386/intel_iommu.c | 119 > > +++++++++++++++++++++++++++++++++++++++++ > > > hw/i386/intel_iommu_internal.h | 10 ++++ > > > 2 files changed, 129 insertions(+) > > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > > > index b8e8dbb..ed07035 100644 > > > --- a/hw/i386/intel_iommu.c > > > +++ b/hw/i386/intel_iommu.c > > > @@ -1801,6 +1801,118 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState > > *s, VTDInvDesc *inv_desc) > > > return true; > > > } > > > > > > +static VTDPASIDAddressSpace *vtd_get_pasid_as(IntelIOMMUState *s, > > > + uint32_t pasid) > > > +{ > > > + VTDPASIDAddressSpace *vtd_pasid_as = NULL; > > > + IntelPASIDNode *node; > > > + char name[128]; > > > + > > > + QLIST_FOREACH(node, &(s->pasid_as_list), next) { > > > + vtd_pasid_as = node->pasid_as; > > > + if (pasid == vtd_pasid_as->sva_ctx.pasid) { > > > + return vtd_pasid_as; > > > + } > > > + } > > > > This seems to be a per-iommu pasid table. However from the spec it > > looks more like that should be per-domain (I'm seeing figure 3-8). > > For example, each domain should be able to have its own pasid table. > > Then IIUC a pasid context will need a (domain, pasid) tuple to > > identify, not only the pasid itself? > > Yes, this is a per-iommu table here. Actually, how we assemble the > table here depends on the PASID namespace. You may refer to the > iommu driver code. intel-svm.c, it's actually per-iommu. > > /* Do not use PASID 0 in caching mode (virtualised IOMMU) */ > ret = idr_alloc(&iommu->pasid_idr, svm, > !!cap_caching_mode(iommu->cap), > pasid_max - 1, GFP_KERNEL); Thanks for the pointer. However from the spec, I see that PASID table pointer is per-context, IMHO which means that the spec will allow the PASID table to be different from one device to another. Even if current Linux is sharing a single PASID table now, I don't know whether that can be expanded in the future. Also, what if we run a guest with another OS besides Linux? After all we are emulating the device, so IIUC the only thing we should follow is the spec. > > > > > And, do we need to destroy the pasid context after it's freed by the > > guest? Here it seems that we'll cache it forever. > > If we need to do it. A PASID can be bind to multiple devices. If there > is no device binding on it, then needs to destroy it. This may be done > by refcount. As I mentioned in the description, that requires further > vIOMMU emulation, so I didn't include it. But it should be covered > in final version. Good catch. > > > > > > + > > > + vtd_pasid_as = g_malloc0(sizeof(*vtd_pasid_as)); > > > + vtd_pasid_as->iommu_state = s; > > > + snprintf(name, sizeof(name), "intel_iommu_pasid_%d", pasid); > > > + address_space_init(&vtd_pasid_as->as, NULL, "pasid"); > > > > I saw that this is only inited and never used. Could I ask when this > > will be used? > > AddressSpace is actually introduced for future support of emulated > SVA capable devices and possible 1st level paging shadowing(similar > to the 2nd level page table shadowing you upstreamed). I am not sure whether that can be useful even if there will be such a device. The reason is that if you see current with-IOMMU guests, they are actually "somehow" bypassing the address space framework by calling the IOMMU MR's translate() to do the page walking. If there will be an emulated device that (for example) supports PASID, and with the 1st page table enabled, I think it'll also work naturally with current translate() interface, just that in the VT-d code we'll find that we'll need to walk a process page table this time rather than the IOMMU device page table. And no matter what, I would prefer you drop this address space until it'll be firstly used. > > > > > > + QLIST_INIT(&vtd_pasid_as->device_list); > > > + > > > + node = g_malloc0(sizeof(*node)); > > > + node->pasid_as = vtd_pasid_as; > > > + QLIST_INSERT_HEAD(&s->pasid_as_list, node, next); > > > + > > > + return vtd_pasid_as; > > > +} > > > + > > > +static void vtd_bind_device_to_pasid_as(VTDPASIDAddressSpace *vtd_pasid_as, > > > + PCIBus *bus, uint8_t devfn) > > > +{ > > > + VTDDeviceNode *node = NULL; > > > + > > > + QLIST_FOREACH(node, &(vtd_pasid_as->device_list), next) { > > > + if (node->bus == bus && node->devfn == devfn) { > > > + return; > > > + } > > > + } > > > + > > > + node = g_malloc0(sizeof(*node)); > > > + node->bus = bus; > > > + node->devfn = devfn; > > > + QLIST_INSERT_HEAD(&(vtd_pasid_as->device_list), node, next); > > > > So here I have the same confusion - IIUC according to the spec two > > devices can have differnet pasid tables, however they can be sharing > > the same PASID number (e.g., pasid=1) in the table. > > Do you mean the pasid table in iommu driver? I can not say it is impossible, > but you may notice that in current iommu driver, the devices behind a single > iommu unit shared pasid table. > > > Here since > > vtd_pasid_as is only per-IOMMU, could it possible that we put multiple > > devices under same PASID context while actually they are not sharing > > the same process page table? Problematic? > > You are correct, two devices may under same PASID context. For the case > you described, I don't think it is allowed as it breaks the PASID concept. > Software should avoid it. Yeh, so here my question would be the same as above: is it following the spec that all devices _must_ share a PASID table between devices, or it is just that we _can_ share it as a first version of Linux SVA implementation? Thanks,
> From: Peter Xu [mailto:peterx@redhat.com] > Sent: Friday, March 9, 2018 3:59 PM > > On Thu, Mar 08, 2018 at 09:39:18AM +0000, Liu, Yi L wrote: > > > From: Peter Xu [mailto:peterx@redhat.com] > > > Sent: Tuesday, March 6, 2018 7:44 PM > > > Subject: Re: [PATCH v3 12/12] intel_iommu: bind device to PASID tagged > > > AddressSpace > > > > > > On Thu, Mar 01, 2018 at 06:33:35PM +0800, Liu, Yi L wrote: > > > > This patch shows the idea of how a device is binded to a PASID tagged > > > > AddressSpace. > > > > > > > > when Intel vIOMMU emulator detected a pasid table entry > programming > > > > from guest. Intel vIOMMU emulator firstly finds a > VTDPASIDAddressSpace > > > > with the pasid field of pasid cache invalidate request. > > > > > > > > * If it is to bind a device to a guest process, needs add the device > > > > to the device list behind the VTDPASIDAddressSpace. And if the > device > > > > is assigned device, need to register sva_notfier for future tlb > > > > flushing if any mapping changed to the process address space. > > > > > > > > * If it is to unbind a device from a guest process, then need to remove > > > > the device from the device list behind the VTDPASIDAddressSpace. > > > > And also needs to unregister the sva_notfier if the device is assigned > > > > device. > > > > > > > > This patch hasn't added the unbind logic. It depends on guest pasid > > > > table entry parsing which requires further emulation. Here just want > > > > to show the idea for the PASID tagged AddressSpace management > framework. > > > > Full unregister logic would be included in future virt-SVA patchset. > > > > > > > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com> > > > > --- > > > > hw/i386/intel_iommu.c | 119 > > > +++++++++++++++++++++++++++++++++++++++++ > > > > hw/i386/intel_iommu_internal.h | 10 ++++ > > > > 2 files changed, 129 insertions(+) > > > > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > > > > index b8e8dbb..ed07035 100644 > > > > --- a/hw/i386/intel_iommu.c > > > > +++ b/hw/i386/intel_iommu.c > > > > @@ -1801,6 +1801,118 @@ static bool > vtd_process_iotlb_desc(IntelIOMMUState > > > *s, VTDInvDesc *inv_desc) > > > > return true; > > > > } > > > > > > > > +static VTDPASIDAddressSpace *vtd_get_pasid_as(IntelIOMMUState > *s, > > > > + uint32_t pasid) > > > > +{ > > > > + VTDPASIDAddressSpace *vtd_pasid_as = NULL; > > > > + IntelPASIDNode *node; > > > > + char name[128]; > > > > + > > > > + QLIST_FOREACH(node, &(s->pasid_as_list), next) { > > > > + vtd_pasid_as = node->pasid_as; > > > > + if (pasid == vtd_pasid_as->sva_ctx.pasid) { > > > > + return vtd_pasid_as; > > > > + } > > > > + } > > > > > > This seems to be a per-iommu pasid table. However from the spec it > > > looks more like that should be per-domain (I'm seeing figure 3-8). > > > For example, each domain should be able to have its own pasid table. > > > Then IIUC a pasid context will need a (domain, pasid) tuple to > > > identify, not only the pasid itself? > > > > Yes, this is a per-iommu table here. Actually, how we assemble the > > table here depends on the PASID namespace. You may refer to the > > iommu driver code. intel-svm.c, it's actually per-iommu. > > > > /* Do not use PASID 0 in caching mode (virtualised IOMMU) > */ > > ret = idr_alloc(&iommu->pasid_idr, svm, > > !!cap_caching_mode(iommu->cap), > > pasid_max - 1, GFP_KERNEL); > > Thanks for the pointer. > > However from the spec, I see that PASID table pointer is per-context, > IMHO which means that the spec will allow the PASID table to be > different from one device to another. Even if current Linux is > sharing a single PASID table now, I don't know whether that can be > expanded in the future. Also, what if we run a guest with another OS > besides Linux? > > After all we are emulating the device, so IIUC the only thing we > should follow is the spec. > > > > > > > > > And, do we need to destroy the pasid context after it's freed by the > > > guest? Here it seems that we'll cache it forever. > > > > If we need to do it. A PASID can be bind to multiple devices. If there > > is no device binding on it, then needs to destroy it. This may be done > > by refcount. As I mentioned in the description, that requires further > > vIOMMU emulation, so I didn't include it. But it should be covered > > in final version. Good catch. > > > > > > > > > + > > > > + vtd_pasid_as = g_malloc0(sizeof(*vtd_pasid_as)); > > > > + vtd_pasid_as->iommu_state = s; > > > > + snprintf(name, sizeof(name), "intel_iommu_pasid_%d", pasid); > > > > + address_space_init(&vtd_pasid_as->as, NULL, "pasid"); > > > > > > I saw that this is only inited and never used. Could I ask when this > > > will be used? > > > > AddressSpace is actually introduced for future support of emulated > > SVA capable devices and possible 1st level paging shadowing(similar > > to the 2nd level page table shadowing you upstreamed). > > I am not sure whether that can be useful even if there will be such a > device. The reason is that if you see current with-IOMMU guests, they > are actually "somehow" bypassing the address space framework by > calling the IOMMU MR's translate() to do the page walking. If there > will be an emulated device that (for example) supports PASID, and with > the 1st page table enabled, I think it'll also work naturally with > current translate() interface, just that in the VT-d code we'll find > that we'll need to walk a process page table this time rather than the > IOMMU device page table. > > And no matter what, I would prefer you drop this address space until > it'll be firstly used. > > > > > > > > > > + QLIST_INIT(&vtd_pasid_as->device_list); > > > > + > > > > + node = g_malloc0(sizeof(*node)); > > > > + node->pasid_as = vtd_pasid_as; > > > > + QLIST_INSERT_HEAD(&s->pasid_as_list, node, next); > > > > + > > > > + return vtd_pasid_as; > > > > +} > > > > + > > > > +static void vtd_bind_device_to_pasid_as(VTDPASIDAddressSpace > *vtd_pasid_as, > > > > + PCIBus *bus, uint8_t devfn) > > > > +{ > > > > + VTDDeviceNode *node = NULL; > > > > + > > > > + QLIST_FOREACH(node, &(vtd_pasid_as->device_list), next) { > > > > + if (node->bus == bus && node->devfn == devfn) { > > > > + return; > > > > + } > > > > + } > > > > + > > > > + node = g_malloc0(sizeof(*node)); > > > > + node->bus = bus; > > > > + node->devfn = devfn; > > > > + QLIST_INSERT_HEAD(&(vtd_pasid_as->device_list), node, next); > > > > > > So here I have the same confusion - IIUC according to the spec two > > > devices can have differnet pasid tables, however they can be sharing > > > the same PASID number (e.g., pasid=1) in the table. > > > > Do you mean the pasid table in iommu driver? I can not say it is > impossible, > > but you may notice that in current iommu driver, the devices behind a > single > > iommu unit shared pasid table. > > > > > Here since > > > vtd_pasid_as is only per-IOMMU, could it possible that we put multiple > > > devices under same PASID context while actually they are not sharing > > > the same process page table? Problematic? > > > > You are correct, two devices may under same PASID context. For the case > > you described, I don't think it is allowed as it breaks the PASID concept. > > Software should avoid it. > > Yeh, so here my question would be the same as above: is it following > the spec that all devices _must_ share a PASID table between devices, > or it is just that we _can_ share it as a first version of Linux SVA > implementation? > the spec defines PASID table as per device. Software may decide whether to share PASID table between devices based on its needs (e.g. with kernel drivers sharing PASID table can reduce footprint, but with user space drivers then per-device PASID table is necessary to ensure isolation). VT-d emulation code shouldn't stick to one specific software usage here... Thanks Kevin
> From: Peter Xu [mailto:peterx@redhat.com] > Sent: Friday, March 9, 2018 3:59 PM > Subject: Re: [PATCH v3 12/12] intel_iommu: bind device to PASID tagged > AddressSpace > > On Thu, Mar 08, 2018 at 09:39:18AM +0000, Liu, Yi L wrote: > > > From: Peter Xu [mailto:peterx@redhat.com] > > > Sent: Tuesday, March 6, 2018 7:44 PM > > > Subject: Re: [PATCH v3 12/12] intel_iommu: bind device to PASID > > > tagged AddressSpace > > > > > > On Thu, Mar 01, 2018 at 06:33:35PM +0800, Liu, Yi L wrote: > > > > This patch shows the idea of how a device is binded to a PASID > > > > tagged AddressSpace. > > > > > > > > when Intel vIOMMU emulator detected a pasid table entry > > > > programming from guest. Intel vIOMMU emulator firstly finds a > > > > VTDPASIDAddressSpace with the pasid field of pasid cache invalidate request. > > > > > > > > * If it is to bind a device to a guest process, needs add the device > > > > to the device list behind the VTDPASIDAddressSpace. And if the device > > > > is assigned device, need to register sva_notfier for future tlb > > > > flushing if any mapping changed to the process address space. > > > > > > > > * If it is to unbind a device from a guest process, then need to remove > > > > the device from the device list behind the VTDPASIDAddressSpace. > > > > And also needs to unregister the sva_notfier if the device is assigned > > > > device. > > > > > > > > This patch hasn't added the unbind logic. It depends on guest > > > > pasid table entry parsing which requires further emulation. Here > > > > just want to show the idea for the PASID tagged AddressSpace management > framework. > > > > Full unregister logic would be included in future virt-SVA patchset. > > > > > > > > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com> > > > > --- > > > > hw/i386/intel_iommu.c | 119 > > > +++++++++++++++++++++++++++++++++++++++++ > > > > hw/i386/intel_iommu_internal.h | 10 ++++ > > > > 2 files changed, 129 insertions(+) > > > > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index > > > > b8e8dbb..ed07035 100644 > > > > --- a/hw/i386/intel_iommu.c > > > > +++ b/hw/i386/intel_iommu.c > > > > @@ -1801,6 +1801,118 @@ static bool > > > > vtd_process_iotlb_desc(IntelIOMMUState > > > *s, VTDInvDesc *inv_desc) > > > > return true; > > > > } > > > > > > > > +static VTDPASIDAddressSpace *vtd_get_pasid_as(IntelIOMMUState *s, > > > > + uint32_t pasid) { > > > > + VTDPASIDAddressSpace *vtd_pasid_as = NULL; > > > > + IntelPASIDNode *node; > > > > + char name[128]; > > > > + > > > > + QLIST_FOREACH(node, &(s->pasid_as_list), next) { > > > > + vtd_pasid_as = node->pasid_as; > > > > + if (pasid == vtd_pasid_as->sva_ctx.pasid) { > > > > + return vtd_pasid_as; > > > > + } > > > > + } > > > > > > This seems to be a per-iommu pasid table. However from the spec it > > > looks more like that should be per-domain (I'm seeing figure 3-8). > > > For example, each domain should be able to have its own pasid table. > > > Then IIUC a pasid context will need a (domain, pasid) tuple to > > > identify, not only the pasid itself? > > > > Yes, this is a per-iommu table here. Actually, how we assemble the > > table here depends on the PASID namespace. You may refer to the iommu > > driver code. intel-svm.c, it's actually per-iommu. > > > > /* Do not use PASID 0 in caching mode (virtualised IOMMU) */ > > ret = idr_alloc(&iommu->pasid_idr, svm, > > !!cap_caching_mode(iommu->cap), > > pasid_max - 1, GFP_KERNEL); > > Thanks for the pointer. > > However from the spec, I see that PASID table pointer is per-context, IMHO which > means that the spec will allow the PASID table to be different from one device to > another. Even if current Linux is sharing a single PASID table now, I don't know > whether that can be expanded in the future. Also, what if we run a guest with > another OS besides Linux? > > After all we are emulating the device, so IIUC the only thing we should follow is the > spec. Agree. just echo Kevin's reply here. Let' me re-consider a way to maintain all the PASID tagged address space here. > > > > > > > > > And, do we need to destroy the pasid context after it's freed by the > > > guest? Here it seems that we'll cache it forever. > > > > If we need to do it. A PASID can be bind to multiple devices. If there > > is no device binding on it, then needs to destroy it. This may be done > > by refcount. As I mentioned in the description, that requires further > > vIOMMU emulation, so I didn't include it. But it should be covered in > > final version. Good catch. > > > > > > > > > + > > > > + vtd_pasid_as = g_malloc0(sizeof(*vtd_pasid_as)); > > > > + vtd_pasid_as->iommu_state = s; > > > > + snprintf(name, sizeof(name), "intel_iommu_pasid_%d", pasid); > > > > + address_space_init(&vtd_pasid_as->as, NULL, "pasid"); > > > > > > I saw that this is only inited and never used. Could I ask when > > > this will be used? > > > > AddressSpace is actually introduced for future support of emulated SVA > > capable devices and possible 1st level paging shadowing(similar to the > > 2nd level page table shadowing you upstreamed). > > I am not sure whether that can be useful even if there will be such a device. The > reason is that if you see current with-IOMMU guests, they are actually "somehow" > bypassing the address space framework by calling the IOMMU MR's translate() to do > the page walking. If there will be an emulated device that (for example) supports > PASID, and with the 1st page table enabled, I think it'll also work naturally with > current translate() interface, just that in the VT-d code we'll find that we'll need to > walk a process page table this time rather than the IOMMU device page table. > > And no matter what, I would prefer you drop this address space until it'll be firstly > used. yeah, I would. May add parameter like pasid in the existing MR translate() interface to meet the requirement. > > > > > > > > > + QLIST_INIT(&vtd_pasid_as->device_list); > > > > + > > > > + node = g_malloc0(sizeof(*node)); > > > > + node->pasid_as = vtd_pasid_as; > > > > + QLIST_INSERT_HEAD(&s->pasid_as_list, node, next); > > > > + > > > > + return vtd_pasid_as; > > > > +} > > > > + > > > > +static void vtd_bind_device_to_pasid_as(VTDPASIDAddressSpace > *vtd_pasid_as, > > > > + PCIBus *bus, uint8_t > > > > +devfn) { > > > > + VTDDeviceNode *node = NULL; > > > > + > > > > + QLIST_FOREACH(node, &(vtd_pasid_as->device_list), next) { > > > > + if (node->bus == bus && node->devfn == devfn) { > > > > + return; > > > > + } > > > > + } > > > > + > > > > + node = g_malloc0(sizeof(*node)); > > > > + node->bus = bus; > > > > + node->devfn = devfn; > > > > + QLIST_INSERT_HEAD(&(vtd_pasid_as->device_list), node, next); > > > > > > So here I have the same confusion - IIUC according to the spec two > > > devices can have differnet pasid tables, however they can be sharing > > > the same PASID number (e.g., pasid=1) in the table. > > > > Do you mean the pasid table in iommu driver? I can not say it is > > impossible, but you may notice that in current iommu driver, the > > devices behind a single iommu unit shared pasid table. > > > > > Here since > > > vtd_pasid_as is only per-IOMMU, could it possible that we put > > > multiple devices under same PASID context while actually they are > > > not sharing the same process page table? Problematic? > > > > You are correct, two devices may under same PASID context. For the > > case you described, I don't think it is allowed as it breaks the PASID concept. > > Software should avoid it. > > Yeh, so here my question would be the same as above: is it following the spec that > all devices _must_ share a PASID table between devices, or it is just that we _can_ > share it as a first version of Linux SVA implementation? Thanks, Yi Liu
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index b8e8dbb..ed07035 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -1801,6 +1801,118 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc) return true; } +static VTDPASIDAddressSpace *vtd_get_pasid_as(IntelIOMMUState *s, + uint32_t pasid) +{ + VTDPASIDAddressSpace *vtd_pasid_as = NULL; + IntelPASIDNode *node; + char name[128]; + + QLIST_FOREACH(node, &(s->pasid_as_list), next) { + vtd_pasid_as = node->pasid_as; + if (pasid == vtd_pasid_as->sva_ctx.pasid) { + return vtd_pasid_as; + } + } + + vtd_pasid_as = g_malloc0(sizeof(*vtd_pasid_as)); + vtd_pasid_as->iommu_state = s; + snprintf(name, sizeof(name), "intel_iommu_pasid_%d", pasid); + address_space_init(&vtd_pasid_as->as, NULL, "pasid"); + QLIST_INIT(&vtd_pasid_as->device_list); + + node = g_malloc0(sizeof(*node)); + node->pasid_as = vtd_pasid_as; + QLIST_INSERT_HEAD(&s->pasid_as_list, node, next); + + return vtd_pasid_as; +} + +static void vtd_bind_device_to_pasid_as(VTDPASIDAddressSpace *vtd_pasid_as, + PCIBus *bus, uint8_t devfn) +{ + VTDDeviceNode *node = NULL; + + QLIST_FOREACH(node, &(vtd_pasid_as->device_list), next) { + if (node->bus == bus && node->devfn == devfn) { + return; + } + } + + node = g_malloc0(sizeof(*node)); + node->bus = bus; + node->devfn = devfn; + QLIST_INSERT_HEAD(&(vtd_pasid_as->device_list), node, next); + + pci_device_sva_register_notifier(bus, devfn, &vtd_pasid_as->sva_ctx); + + return; +} + +static bool vtd_process_pc_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc) +{ + + IntelIOMMUAssignedDeviceNode *node = NULL; + int ret = 0; + + uint16_t domain_id; + uint32_t pasid; + VTDPASIDAddressSpace *vtd_pasid_as; + + if ((inv_desc->lo & VTD_INV_DESC_PASIDC_RSVD_LO) || + (inv_desc->hi & VTD_INV_DESC_PASIDC_RSVD_HI)) { + return false; + } + + domain_id = VTD_INV_DESC_PASIDC_DID(inv_desc->lo); + + switch (inv_desc->lo & VTD_INV_DESC_PASIDC_G) { + case VTD_INV_DESC_PASIDC_ALL_ALL: + /* TODO: invalidate all pasid related cache */ + break; + + case VTD_INV_DESC_PASIDC_PASID_SI: + pasid = VTD_INV_DESC_PASIDC_PASID(inv_desc->lo); + vtd_pasid_as = vtd_get_pasid_as(s, pasid); + QLIST_FOREACH(node, &(s->assigned_device_list), next) { + VTDAddressSpace *vtd_as = node->vtd_as; + VTDContextEntry ce; + uint16_t did; + uint8_t bus = pci_bus_num(vtd_as->bus); + ret = vtd_dev_to_context_entry(s, bus, + vtd_as->devfn, &ce); + if (ret != 0) { + continue; + } + + did = VTD_CONTEXT_ENTRY_DID(ce.hi); + /* + * If did field equals to the domain_id field of inv_descriptor, + * then the device is affect by this invalidate request, need to + * bind or unbind the device to the pasid tagged address space. + * a) If it is bind, need to add the device to the device list, + * add register tlb flush notifier for it + * b) If it is unbind, need to remove the device from the device + * list, and unregister the tlb flush notifier + * TODO: add unbind logic accordingly, depends on the parsing of + * guest pasid table entry pasrsing, here has no parsing to + * pasid table entry. + * + */ + if (did == domain_id) { + vtd_bind_device_to_pasid_as(vtd_pasid_as, + vtd_as->bus, vtd_as->devfn); + } + } + break; + + default: + return false; + } + + return true; +} + static bool vtd_process_inv_iec_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc) { @@ -1911,6 +2023,13 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s) } break; + case VTD_INV_DESC_PC: + trace_vtd_inv_desc("pc", inv_desc.hi, inv_desc.lo); + if (!vtd_process_pc_desc(s, &inv_desc)) { + return false; + } + break; + case VTD_INV_DESC_IEC: trace_vtd_inv_desc("iec", inv_desc.hi, inv_desc.lo); if (!vtd_process_inv_iec_desc(s, &inv_desc)) { diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index d084099..31d0d53 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -332,6 +332,7 @@ typedef union VTDInvDesc VTDInvDesc; #define VTD_INV_DESC_IEC 0x4 /* Interrupt Entry Cache Invalidate Descriptor */ #define VTD_INV_DESC_WAIT 0x5 /* Invalidation Wait Descriptor */ +#define VTD_INV_DESC_PC 0x7 /* PASID-cache Invalidate Desc */ #define VTD_INV_DESC_NONE 0 /* Not an Invalidate Descriptor */ /* Masks for Invalidation Wait Descriptor*/ @@ -388,6 +389,15 @@ typedef union VTDInvDesc VTDInvDesc; #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \ (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) +#define VTD_INV_DESC_PASIDC_G (3ULL << 4) +#define VTD_INV_DESC_PASIDC_PASID(val) (((val) >> 32) & 0xfffffULL) +#define VTD_INV_DESC_PASIDC_DID(val) (((val) >> 16) & VTD_DOMAIN_ID_MASK) +#define VTD_INV_DESC_PASIDC_RSVD_LO 0xfff000000000ffc0ULL +#define VTD_INV_DESC_PASIDC_RSVD_HI 0xffffffffffffffffULL + +#define VTD_INV_DESC_PASIDC_ALL_ALL (0ULL << 4) +#define VTD_INV_DESC_PASIDC_PASID_SI (1ULL << 4) + /* Information about page-selective IOTLB invalidate */ struct VTDIOTLBPageInvInfo { uint16_t domain_id;
This patch shows the idea of how a device is binded to a PASID tagged AddressSpace. when Intel vIOMMU emulator detected a pasid table entry programming from guest. Intel vIOMMU emulator firstly finds a VTDPASIDAddressSpace with the pasid field of pasid cache invalidate request. * If it is to bind a device to a guest process, needs add the device to the device list behind the VTDPASIDAddressSpace. And if the device is assigned device, need to register sva_notfier for future tlb flushing if any mapping changed to the process address space. * If it is to unbind a device from a guest process, then need to remove the device from the device list behind the VTDPASIDAddressSpace. And also needs to unregister the sva_notfier if the device is assigned device. This patch hasn't added the unbind logic. It depends on guest pasid table entry parsing which requires further emulation. Here just want to show the idea for the PASID tagged AddressSpace management framework. Full unregister logic would be included in future virt-SVA patchset. Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com> --- hw/i386/intel_iommu.c | 119 +++++++++++++++++++++++++++++++++++++++++ hw/i386/intel_iommu_internal.h | 10 ++++ 2 files changed, 129 insertions(+)