Message ID | 20160119213901.GG14080@localhost (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
> -----Original Message-----
> From: linux-pci-owner@vger.kernel.org
> [mailto:linux-pci-owner@vger.kernel.org] On Behalf Of Bjorn Helgaas
> Sent: Tuesday, January 19, 2016 10:39 PM
> To: Alex Williamson <alex.williamson@redhat.com>
> Cc: Lawrynowicz, Jacek <jacek.lawrynowicz@intel.com>;
> linux-pci@vger.kernel.org; bhelgaas@google.com; dwmw2@infradead.org;
> jroedel@suse.de
> Subject: Re: [PATCH] pci: Add support for multiple DMA aliases
>
> On Tue, Jan 19, 2016 at 02:04:31PM -0700, Alex Williamson wrote:
> > On Tue, 2016-01-19 at 14:12 -0600, Bjorn Helgaas wrote:
> > > [+cc Alex]
> > >
> > > On Mon, Jan 18, 2016 at 09:33:15PM -0600, Bjorn Helgaas wrote:
> > > > On Mon, Jan 18, 2016 at 05:07:47PM +0100, Jacek Lawrynowicz wrote:
> > > > > This patch solves IOMMU support issues with PCIe non-transparent
> > > > > bridges that use Requester ID look-up tables (LUT), e.g.
> > > > > PEX8733. Before exiting the bridge, packet's RID is rewritten
> > > > > according to LUT programmed by a driver. Modified packets are
> > > > > then passed to a destination bus and processed upstream. The
> > > > > problem is that such packets seem to come from non-existent
> > > > > nodes that are hidden behind NTB and are not discoverable by a
> > > > > destination node, so IOMMU discards them. Adding DMA alias for a
> > > > > given LUT entry allows IOMMU to create a proper mapping that
> > > > > enables inter-node communication.
> > > > >
> > > > > The current DMA alias implementation supports only single alias,
> > > > > so it's not possible to connect more than two nodes when IOMMU
> > > > > is enabled. This implementation enables all possible aliases on
> > > > > a given bus (256) that are stored in a bitset. Alias devfn is
> > > > > directly translated to a bit number. The bitset is not allocated
> > > > > for devices that have no need for DMA aliases.
> >
> > My only concern here is that pci_add_dma_alias() makes aliases seem
> > more dynamic than they really are. For instance, when we add a device
> > to an IOMMU domain, we evaluate the aliases at that point, if an NTB
> > later adds a new lookup entry and specifies a new alias, it's still
> > not going to work. Similarly, IOMMU groups are evaluated as the
> > device is added, so if an alias is to a physical device and we need
> > the cross reference to bind them together into a single group, calling
> > pci_add_dma_alias() from a driver isn't going to work.
> >
> > The existing code had this problem too, it's just more obvious now
> > that we have a helper function and that the helper is exported for use
> > outside of the PCI core. Thanks,
>
> Oh, that's a really good point. I hadn't noticed the export. Is there any
> reason pci_add_dma_alias() needs to be declared in include/linux/pci.h and
> exported to modules?
>
> I don't think the current patch requires the export, but I suppose you
> envision an NTB driver that might be a module? I guess we can easily export
> it when that driver is merged if that seems the best solution.

This export would be useful for Xeon Phi x200, which uses an NTB that
generates multiple RIDs. x200 is not yet ready for upstreaming (x100 is
already upstream), and having this export would make driver development
less painful.

--
Jacek Lawrynowicz
Intel Technology Poland sp. z o.o.
KRS 101882 - ul. Slowackiego 173, 80-298 Gdansk
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
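For readers following the thread, the bitset scheme from the patch description (one bit per devfn, 256 bits per bus, allocated only when the first alias is added) can be sketched in userspace C. This is an illustrative model, not the patch itself: struct fake_dev, fake_add_dma_alias() and the macro names are hypothetical stand-ins, and calloc() is used here in place of whatever allocator the kernel code uses.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_DEVFN     256	/* 32 devices x 8 functions on one bus */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS    ((MAX_DEVFN + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for struct pci_dev; only the alias mask is modeled. */
struct fake_dev {
	unsigned long *dma_alias_mask;	/* NULL until the first alias is added */
};

/*
 * Record devfn as a DMA alias of dev. The bitset is allocated lazily, so
 * devices that never need an alias pay no memory cost.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int fake_add_dma_alias(struct fake_dev *dev, uint8_t devfn)
{
	assert(dev != NULL);

	if (!dev->dma_alias_mask) {
		dev->dma_alias_mask = calloc(MASK_WORDS, sizeof(unsigned long));
		if (!dev->dma_alias_mask)
			return -1;
	}

	/* devfn maps directly to a bit number in the 256-bit set. */
	dev->dma_alias_mask[devfn / BITS_PER_WORD] |=
		1UL << (devfn % BITS_PER_WORD);
	return 0;
}
```

Because the index is a uint8_t, every possible devfn fits in the set without a range check, and adding the same alias twice is harmless.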
Hi Jacek,

On Wed, Jan 20, 2016 at 03:02:26PM +0000, Lawrynowicz, Jacek wrote:
> > Oh, that's a really good point. I hadn't noticed the export. Is there any
> > reason pci_add_dma_alias() needs to be declared in include/linux/pci.h and
> > exported to modules?
> >
> > I don't think the current patch requires the export, but I suppose you
> > envision an NTB driver that might be a module? I guess we can easily export
> > it when that driver is merged if that seems the best solution.
>
> This export would be useful for Xeon Phi x200, which uses an NTB that
> generates multiple RIDs. x200 is not yet ready for upstreaming (x100 is
> already upstream), and having this export would make driver development
> less painful.

I don't really want to merge things that only exist to enable
out-of-tree development, because (1) they're an extra maintenance
burden for which we get risk without benefit, and (2) we can't see
the out-of-tree code, so it's easy for people to make changes that
accidentally break that code.

Looking at the patch again, I see that even without the export,
there's no current benefit, and there are a couple things that should
be fixed up:

  - Fix the comment that references dma_alias_devfn (since you removed
    that field).

  - Add an interface that get_pci_alias_group() can use instead of
    accessing the dma_alias_mask directly.

  - Figure out the scope and exportability of pci_add_dma_alias() and
    the new boolean interface I'm suggesting.

So I'm going to drop this for now, and you can carry it along with
your driver patches. Then when we merge the driver, we should think
about whether it makes sense to export pci_add_dma_alias(), or whether
we can come up with an interface that is safer with regard to the
issues Alex mentioned.

I think this patch makes a lot of sense, so I'm definitely not
rejecting it. But I think it will make even more sense in the context
of the driver, when we can think about the lifetime of the aliases.
(*You* know that already, but I don't, so I'm operating with a lot of
missing information :))

Bjorn
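The second review item, a boolean interface that a grouping routine can call instead of reading dma_alias_mask directly, might look roughly like the userspace sketch below. All names here (struct fake_dev, fake_add_alias(), fake_devfn_is_alias(), fake_devs_are_aliases()) are hypothetical, and the grouping check is only an assumption about what a caller such as get_pci_alias_group() would want to ask; it is not taken from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_dev with a fixed 256-bit alias set. */
struct fake_dev {
	uint8_t  devfn;
	uint64_t alias_mask[4];
};

static void fake_add_alias(struct fake_dev *dev, uint8_t devfn)
{
	assert(dev != NULL);
	dev->alias_mask[devfn / 64] |= UINT64_C(1) << (devfn % 64);
}

/* Boolean interface: callers ask a question instead of reading the mask. */
static bool fake_devfn_is_alias(const struct fake_dev *dev, uint8_t devfn)
{
	assert(dev != NULL);
	return (dev->alias_mask[devfn / 64] >> (devfn % 64)) & 1;
}

/*
 * Assumed grouping rule for this sketch: two devices on the same bus belong
 * together if either one lists the other's devfn as an alias.
 */
static bool fake_devs_are_aliases(const struct fake_dev *a,
				  const struct fake_dev *b)
{
	return fake_devfn_is_alias(a, b->devfn) ||
	       fake_devfn_is_alias(b, a->devfn);
}
```

Hiding the mask behind an accessor means the storage (bitset, list, single field) can change later without touching the callers.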
On Wed, 2016-01-20 at 11:46 -0600, Bjorn Helgaas wrote:
>
> I don't really want to merge things that only exist to enable
> out-of-tree development, because (1) they're an extra maintenance
> burden for which we get risk without benefit, and (2) we can't see
> the out-of-tree code, so it's easy for people to make changes that
> accidentally break that code.
>
> Looking at the patch again, I see that even without the export,
> there's no current benefit,

This is just a PCI quirk; I'm not sure it should be considered part of
the driver code at all. With this patch, even without a Linux driver,
we could correctly handle assignment to VM guests (which *might* have a
driver), and also theoretically we should be able to handle fault
storms and shoot the right device in the head if it was left in an odd
state and misbehaves (not that I've hooked that up yet).

So I'm not sure it makes sense to tie this patch to the existence of a
driver.
> -----Original Message-----
> From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> Sent: Wednesday, January 20, 2016 6:46 PM
> To: Lawrynowicz, Jacek <jacek.lawrynowicz@intel.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>;
> linux-pci@vger.kernel.org; bhelgaas@google.com; dwmw2@infradead.org;
> jroedel@suse.de
> Subject: Re: [PATCH] pci: Add support for multiple DMA aliases
>
> I don't really want to merge things that only exist to enable out-of-tree
> development, because (1) they're an extra maintenance burden for which
> we get risk without benefit, and (2) we can't see the out-of-tree code, so
> it's easy for people to make changes that accidentally break that code.
>
> Looking at the patch again, I see that even without the export, there's no
> current benefit, and there are a couple things that should be fixed up:
>
>   - Fix the comment that references dma_alias_devfn (since you removed
>     that field).
>
>   - Add an interface that get_pci_alias_group() can use instead of
>     accessing the dma_alias_mask directly.
>
>   - Figure out the scope and exportability of pci_add_dma_alias() and
>     the new boolean interface I'm suggesting.
>
> So I'm going to drop this for now, and you can carry it along with your
> driver patches. Then when we merge the driver, we should think about
> whether it makes sense to export pci_add_dma_alias(), or whether we can
> come up with an interface that is safer with regard to the issues Alex
> mentioned.
>
> I think this patch makes a lot of sense, so I'm definitely not rejecting
> it. But I think it will make even more sense in the context of the driver,
> when we can think about the lifetime of the aliases.
> (*You* know that already, but I don't, so I'm operating with a lot of
> missing information :))

I would understand rejecting the patch if it were specific to an
out-of-tree driver. That makes perfect sense from a kernel development
perspective, but this patch is not device-specific, and everyone
previously agreed it improves the current DMA alias handling. It's also
small, and without the export it has very little maintenance impact.
Please reconsider merging the patch.

--
Jacek Lawrynowicz
Intel Technology Poland sp. z o.o.
KRS 101882 - ul. Slowackiego 173, 80-298 Gdansk
On Thu, Jan 21, 2016 at 09:39:01AM +0000, David Woodhouse wrote:
> On Wed, 2016-01-20 at 11:46 -0600, Bjorn Helgaas wrote:
> >
> > I don't really want to merge things that only exist to enable
> > out-of-tree development, because (1) they're an extra maintenance
> > burden for which we get risk without benefit, and (2) we can't see
> > the out-of-tree code, so it's easy for people to make changes that
> > accidentally break that code.
> >
> > Looking at the patch again, I see that even without the export,
> > there's no current benefit,
>
> This is just a PCI quirk; I'm not sure it should be considered part of
> the driver code at all. With this patch, even without a Linux driver,
> we could correctly handle assignment to VM guests (which *might* have a
> driver), and also theoretically we should be able to handle fault
> storms and shoot the right device in the head if it was left in an odd
> state and misbehaves (not that I've hooked that up yet).
>
> So I'm not sure it makes sense to tie this patch to the existence of a
> driver.

This definitely isn't part of the driver code; I didn't mean to
suggest that. I'd like to see this as a separate patch, but as part
of a series that adds a user of the multiple-alias functionality.

All I'm saying is that as-is, this patch makes the quirks easier to
read but doesn't actually change any behavior: we set up at most one
alias, and we do it as a header quirk at enumeration-time, so there
are no new lifetime issues. Normally we merge things when they're
needed, and multiple alias support is a bit of infrastructure that
isn't used yet.

I already said I'm not rejecting the patch. Alex and I raised a few
questions. Usually that leads to a little discussion and possibly a
v2 of the patch, but so far, I haven't seen any answers.

Here are a couple more questions/concerns:

  - If we export an "add" function, do we need a corresponding
    "remove"? This depends on how the alias lifetimes are managed,
    and I haven't seen that yet.

  - Changing pci_dev_flags and struct pci_dev changes the ABI and
    makes work for distros, with no current benefit.

I'm not sure why we're even having this discussion. The merge window
opened Jan 10, and IIRC, I first saw the patch Jan 11 and it first
appeared on linux-pci Jan 13, so this is just late for v4.5 to begin
with.

Bjorn
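The lifetime concern discussed in this thread (aliases are evaluated once, when a device is added to an IOMMU domain, so an alias added later is not picked up) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct fake_domain, fake_attach() and fake_domain_accepts() are hypothetical names, and copying the mask at attach time is a simplification of whatever the real IOMMU code does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device model: a fixed 256-bit set of alias devfns. */
struct fake_dev {
	uint64_t alias_mask[4];
};

/* Hypothetical domain model: the requester IDs it will accept. */
struct fake_domain {
	uint64_t mapped[4];
};

static void fake_add_alias(struct fake_dev *dev, uint8_t devfn)
{
	assert(dev != NULL);
	dev->alias_mask[devfn / 64] |= UINT64_C(1) << (devfn % 64);
}

/* Attaching copies the aliases known at this moment, and only those. */
static void fake_attach(struct fake_domain *dom, const struct fake_dev *dev)
{
	assert(dom != NULL && dev != NULL);
	memcpy(dom->mapped, dev->alias_mask, sizeof(dom->mapped));
}

static bool fake_domain_accepts(const struct fake_domain *dom, uint8_t devfn)
{
	assert(dom != NULL);
	return (dom->mapped[devfn / 64] >> (devfn % 64)) & 1;
}
```

In this model, an alias added before fake_attach() is accepted by the domain, while one added afterwards is not until the device is attached again. That is the reason an "add" call made from a driver at an arbitrary time is a questionable interface, and why a matching "remove" raises the same question.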
On Thu, 2016-01-21 at 09:22 -0600, Bjorn Helgaas wrote:
>
> This definitely isn't part of the driver code; I didn't mean to
> suggest that. I'd like to see this as a separate patch, but as part
> of a series that adds a user of the multiple-alias functionality.
>
> All I'm saying is that as-is, this patch makes the quirks easier to
> read but doesn't actually change any behavior: we set up at most one
> alias, and we do it as a header quirk at enumeration-time, so there
> are no new lifetime issues. Normally we merge things when they're
> needed, and multiple alias support is a bit of infrastructure that
> isn't used yet.

Ah, right. I see your point.

I don't actually see why pci_add_dma_alias() should be exported at all.

I suspect the best approach is for Jacek to add a second patch in this
series, adding the required quirk to drivers/pci/quirks.c for the
device in question, and then resubmit them to you when Linus releases
4.5-rc1.

This is completely independent of any native driver for the device, of
course.
> -----Original Message-----
> From: David Woodhouse [mailto:dwmw2@infradead.org]
> Sent: Thursday, January 21, 2016 4:33 PM
> To: Bjorn Helgaas <helgaas@kernel.org>
> Cc: Lawrynowicz, Jacek <jacek.lawrynowicz@intel.com>; Alex Williamson
> <alex.williamson@redhat.com>; linux-pci@vger.kernel.org;
> bhelgaas@google.com; jroedel@suse.de
> Subject: Re: [PATCH] pci: Add support for multiple DMA aliases
>
> On Thu, 2016-01-21 at 09:22 -0600, Bjorn Helgaas wrote:
> >
> > This definitely isn't part of the driver code; I didn't mean to
> > suggest that. I'd like to see this as a separate patch, but as part
> > of a series that adds a user of the multiple-alias functionality.
> >
> > All I'm saying is that as-is, this patch makes the quirks easier to
> > read but doesn't actually change any behavior: we set up at most one
> > alias, and we do it as a header quirk at enumeration-time, so there
> > are no new lifetime issues. Normally we merge things when they're
> > needed, and multiple alias support is a bit of infrastructure that
> > isn't used yet.
>
> Ah, right. I see your point.
>
> I don't actually see why pci_add_dma_alias() should be exported at all.
>
> I suspect the best approach is for Jacek to add a second patch in this
> series, adding the required quirk to drivers/pci/quirks.c for the
> device in question, and then resubmit them to you when Linus releases
> 4.5-rc1.
>
> This is completely independent of any native driver for the device, of
> course.

OK guys, I've prepared a second version of the patch that addresses all
of your review comments. I will post it to linux-pci today. It also
includes a quirk for the x200 DMA driver, which is undergoing internal
review and will probably be posted to the dma tree early next week.

It would be great if v2 could be included in 4.5, but I understand that
it's a bit late and that you may prefer to wait for the x200 DMA driver
to be posted.

Thanks for all your feedback. The patch now makes a lot more sense.

Regards,
Jacek

--
Jacek Lawrynowicz
Intel Technology Poland sp. z o.o.
KRS 101882 - ul. Slowackiego 173, 80-298 Gdansk
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 29cfe1a..b6434c0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4587,7 +4587,6 @@ void pci_add_dma_alias(struct pci_dev *dev, u8 devfn)
 	set_bit(devfn, dev->dma_alias_mask);
 }
-EXPORT_SYMBOL_GPL(pci_add_dma_alias);
 
 bool pci_device_is_present(struct pci_dev *pdev)
 {
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index fd2f03f..1aad757 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -339,4 +339,6 @@ static inline int pci_dev_specific_reset(struct pci_dev *dev, int probe)
 struct pci_host_bridge *pci_find_host_bridge(struct pci_bus *bus);
 
+void pci_add_dma_alias(struct pci_dev *dev, u8 devfn);
+
 #endif /* DRIVERS_PCI_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 66c07d0..00d0862 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1228,8 +1228,6 @@ resource_size_t pcibios_iov_resource_alignment(struct pci_dev *dev, int resno);
 int pci_set_vga_state(struct pci_dev *pdev, bool decode,
 		      unsigned int command_bits, u32 flags);
 
-void pci_add_dma_alias(struct pci_dev *dev, u8 devfn);
-
 /* kmem_cache style wrapper around pci_alloc_consistent() */
 
 #include <linux/pci-dma.h>