Message ID | 20221111132706.500733944@linutronix.de (mailing list archive)
---|---
State | Superseded
Series | genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 2 API rework
On Fri, Nov 11, 2022 at 02:56:50PM +0100, Thomas Gleixner wrote:
> To support multiple MSI interrupt domains per device it is necessary to
> segment the xarray MSI descriptor storage. Each domain gets up to
> MSI_MAX_INDEX entries.

This kind of suggests that the new per-device MSI domains should hold
this storage instead of the per-device xarray?

I suppose the reason to avoid this is because a lot of the driver
facing API is now built on vector index numbers that index this xarray?

But on the other hand can we just say drivers using multiple domains
are "new" and they should use some new style pointer based interface
so we don't have to have arrays of things?

At least, I'd like to understand a bit better the motivation for using
a domain ID instead of a pointer. It feels like we are baking in
several hard coded limits with this choice.

> +static int msi_get_domain_base_index(struct device *dev, unsigned int domid)
> +{
> +	lockdep_assert_held(&dev->msi.data->mutex);
> +
> +	if (WARN_ON_ONCE(domid >= MSI_MAX_DEVICE_IRQDOMAINS))
> +		return -ENODEV;
> +
> +	if (WARN_ON_ONCE(!dev->msi.data->__irqdomains[domid]))
> +		return -ENODEV;
> +
> +	return domid * MSI_XA_DOMAIN_SIZE;
> +}
> +
> +
>  /**

Extra new line

Jason
On Wed, Nov 16 2022 at 14:36, Jason Gunthorpe wrote:
> On Fri, Nov 11, 2022 at 02:56:50PM +0100, Thomas Gleixner wrote:
>> To support multiple MSI interrupt domains per device it is necessary to
>> segment the xarray MSI descriptor storage. Each domain gets up to
>> MSI_MAX_INDEX entries.
>
> This kind of suggests that the new per-device MSI domains should hold
> this storage instead of the per-device xarray?

No, really not. This would create random storage in random driver
places instead of having a central storage place which is managed by
the core code. We had that back in the days when every architecture
had its own magic place to store and manage interrupt descriptors.
Seen that, mopped it up and never want to go back.

> I suppose the reason to avoid this is because a lot of the driver
> facing API is now built on vector index numbers that index this
> xarray?

That's one aspect, but as I demonstrate later, even the IMS domains
which do not have a real requirement for an 'index' still need a place
to store the MSI descriptor and to allocate storage space for it.

I really don't want random places doing that, because then I can't
provide implicit MSI descriptor management, e.g. automatic alloc/free,
anymore, and everything has to happen at the driver side. The only
reason why I still need to do that for PCI/MSI is to be able to support
the museum architectures which still depend on the arch_....()
interfaces from 20 years ago.

So if an IMS domain, which e.g. stores the MSI message in queue memory,
wants a new interrupt, then it allocates it with MSI_ANY_INDEX, which
gives it the next free slot in the xarray section of the MSI domain.
This avoids having IDA, bitmap allocators or whatever at the driver
side, and having a virtual index number to track things does not affect
the flexibility of the driver side in any way. All the driver needs at
the very end is the interrupt number and the message itself.
> But on the other hand can we just say drivers using multiple domains
> are "new" and they should use some new style pointer based interface
> so we don't have to have arrays of things?

Then driver writers have to provide storage for the domain pointer and
care about teardown etc. Seriously? NO!

> At least, I'd like to understand a bit better the motivation for using
> a domain ID instead of a pointer.

The main motivation was to avoid device specific storage for the irq
domain pointers. It would have started with PCI/MSI[X]: I'd have had to
add an irqdomain pointer to struct pci_dev and then have the PCI core
care about it. Then we'd have to add the same to everything in the
world which utilizes per device MSI domains, which is quite a few
places outside of PCI in the ARM64 world, and growing.

The msi_device_data struct, which is allocated on demand for MSI usage,
is the obvious place to store _and_ manage these things, i.e. managed
teardown etc.

Giving this up makes any change to the core code hard because you have
to chase all usage sites and mop them up. Just look at the ARM part of
this series, which is by now 40+ patches just to mop up the irqchip
core. There are still 25 global PCI/MSI irqdomains left.

> It feels like we are baking in several hard coded limits with this
> choice

Which ones?

The chosen array section size per domain is arbitrary and can be
changed at any given time. Though you have to exhaust 64k vectors per
domain first before we start debating that.

The number of irqdomains is not really hard limited either. It's
trivial enough to extend that number, and once we hit 32 we can just
stash them away in the xarray. I pondered doing that right away, but it
wastes too much memory for now.

It really does not matter whether the domain creation results in a
number or in a pointer. Pointers are required for the inner workings of
the domain hierarchy, but are absolutely uninteresting for endpoint
domains.
All you need there is a convenient way to create the domain and then
allocate/free interrupts as you see fit.

We agreed a year ago that we want to abstract most of these things away
for driver writers: all they need is a simple way to create the
domains, and the corresponding interrupt chip is mostly about writing
the MSI message to implementation defined storage and eventually
providing an implementation specific mask/unmask operation.

So what are you concerned about?

Thanks,

        tglx
On Wed, Nov 16, 2022 at 11:32:15PM +0100, Thomas Gleixner wrote:
> On Wed, Nov 16 2022 at 14:36, Jason Gunthorpe wrote:
> > On Fri, Nov 11, 2022 at 02:56:50PM +0100, Thomas Gleixner wrote:
> > > To support multiple MSI interrupt domains per device it is necessary to
> > > segment the xarray MSI descriptor storage. Each domain gets up to
> > > MSI_MAX_INDEX entries.
> >
> > This kind of suggests that the new per-device MSI domains should hold
> > this storage instead of the per-device xarray?
>
> No, really not. This would create random storage in random driver
> places instead of having a central storage place which is managed by
> the core code. We had that back in the days when every architecture
> had its own magic place to store and manage interrupt descriptors.
> Seen that, mopped it up and never want to go back.

I don't mean shifting it into the msi_domain driver logic, I just mean
sticking an xarray in the struct msi_domain that the core code, and
only the core code, manages.

But I suppose, on reflection, the strong reason not to do this is that
the msi_descriptor storage is per-device, and while it would work OK
with per-device msi_domains we still have the legacy of global msi
domains and thus still need a per-device place to store the global msi
domain's per-device descriptors.

> > At least, I'd like to understand a bit better the motivation for using
> > a domain ID instead of a pointer.
>
> The main motivation was to avoid device specific storage for the irq
> domain pointers. It would have started with PCI/MSI[X]: I'd have had
> to add an irqdomain pointer to struct pci_dev and then have the PCI
> core care about it. Then we'd have to add the same to everything in
> the world which utilizes per device MSI domains, which is quite a few
> places outside of PCI in the ARM64 world, and growing.
I was thinking more that the "default" domain (e.g. domain ID 0 as this
series has it) would remain as a domain pointer in the device data, as
it is here, but any secondary domains would be handled with a pointer
that the driver owns.

You could have as many secondary domains as is required this way. Few
drivers would ever use a secondary domain, so it's not really a big
deal for them to hold the pointer lifetime.

> So what are you concerned about?

Mostly API clarity. I find it very un-kernel-like to swap a clear
pointer for an ID number. We lose typing, the APIs become less clear,
and we now have to worry about ID allocation policy if we ever need
more than 2.

Thanks,
Jason
On Wed, Nov 16 2022 at 20:37, Jason Gunthorpe wrote:
> On Wed, Nov 16, 2022 at 11:32:15PM +0100, Thomas Gleixner wrote:
>> On Wed, Nov 16 2022 at 14:36, Jason Gunthorpe wrote:
>> > On Fri, Nov 11, 2022 at 02:56:50PM +0100, Thomas Gleixner wrote:
>> >> To support multiple MSI interrupt domains per device it is necessary to
>> >> segment the xarray MSI descriptor storage. Each domain gets up to
>> >> MSI_MAX_INDEX entries.
>> >
>> > This kind of suggests that the new per-device MSI domains should hold
>> > this storage instead of the per-device xarray?
>>
>> No, really not. This would create random storage in random driver
>> places instead of having a central storage place which is managed by
>> the core code. We had that back in the days when every architecture
>> had its own magic place to store and manage interrupt descriptors.
>> Seen that, mopped it up and never want to go back.
>
> I don't mean shifting it into the msi_domain driver logic, I just mean
> sticking an xarray in the struct msi_domain that the core code, and
> only the core code, manages.
>
> But I suppose, on reflection, the strong reason not to do this is that
> the msi_descriptor storage is per-device, and while it would work OK
> with per-device msi_domains we still have the legacy of global msi
> domains and thus still need a per-device place to store the global msi
> domain's per-device descriptors.

I tried several approaches, but all of them ended up having slightly
different code paths, so I decided to keep everything the same across
the legacy arch model, global MSI and the modern per device MSI models.
Due to that, some of the constructs are slightly awkward, but the
important outcome for me was that I ended up with as many shared code
paths as possible. Having separate code paths for all variants causes
code bloat, and what's worse, it's a guarantee for divergence and
maintenance nightmares.
As this is setup/teardown management code, and not the fancy hotpath
where we really want to spare cycles, I went for the unified model.

> You could have as many secondary domains as is required this way. Few
> drivers would ever use a secondary domain, so it's not really a big
> deal for them to hold the pointer lifetime.
>
>> So what are you concerned about?
>
> Mostly API clarity. I find it very un-kernel-like to swap a clear
> pointer for an ID number. We lose typing, the APIs become less clear,
> and we now have to worry about ID allocation policy if we ever need
> more than 2.

I don't see an issue with that.

     id = msi_create_device_domain(dev, &template, ...);

is not much different from:

     ptr = msi_create_device_domain(dev, &template, ...);

But it makes a massive difference vs. encapsulation and pointer
leakage. If you have a stale ID then you can't do harm; with a stale
pointer you very much can. Aside of that, once pointers are available,
people insist on fiddling in the guts.

As I've been mopping up behind driver writers for the last twenty years
now, my confidence in them is pretty close to zero. So I'd rather be
defensive and work towards encapsulation wherever it's possible.
Interrupts are a source of hard to debug subtle bugs, so taking away
the tools the tinkerers would use to cause them is a good thing IMO.

Thanks,

        tglx
> From: Thomas Gleixner <tglx@linutronix.de>
> Sent: Friday, November 11, 2022 9:57 PM
>
> +/* Invalid XA index which is outside of any searchable range */
> +#define MSI_XA_MAX_INDEX	(ULONG_MAX - 1)
> +#define MSI_XA_DOMAIN_SIZE	(MSI_MAX_INDEX + 1)
> +

Out of curiosity: other places treat MSI_MAX_INDEX - 1 as the upper
bound of a valid range. This size definition implies that the last ID
is wasted for every domain. Is that intended?
On Fri, Nov 18 2022 at 07:57, Kevin Tian wrote:
>> From: Thomas Gleixner <tglx@linutronix.de>
>> Sent: Friday, November 11, 2022 9:57 PM
>>
>> +/* Invalid XA index which is outside of any searchable range */
>> +#define MSI_XA_MAX_INDEX	(ULONG_MAX - 1)
>> +#define MSI_XA_DOMAIN_SIZE	(MSI_MAX_INDEX + 1)
>> +
>
> Out of curiosity: other places treat MSI_MAX_INDEX - 1 as the upper
> bound of a valid range. This size definition implies that the last ID
> is wasted for every domain. Is that intended?

Bah. MSI_MAX_INDEX is inclusive, so the size must be + 1. I obviously
missed that in the other places which use it as an upper bound. Not
that it matters, but yes. Let me fix that.
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -177,6 +177,7 @@ enum msi_desc_filter {
  * @mutex:		Mutex protecting the MSI descriptor store
  * @__store:		Xarray for storing MSI descriptor pointers
  * @__iter_idx:		Index to search the next entry for iterators
+ * @__iter_max:		Index to limit the search
  * @__irqdomains:	Per device interrupt domains
  */
 struct msi_device_data {
@@ -185,6 +186,7 @@ struct msi_device_data {
 	struct mutex		mutex;
 	struct xarray		__store;
 	unsigned long		__iter_idx;
+	unsigned long		__iter_max;
 	struct irq_domain	*__irqdomains[MSI_MAX_DEVICE_IRQDOMAINS];
 };
 
@@ -193,14 +195,34 @@ int msi_setup_device_data(struct device
 void msi_lock_descs(struct device *dev);
 void msi_unlock_descs(struct device *dev);
 
-struct msi_desc *msi_first_desc(struct device *dev, enum msi_desc_filter filter);
+struct msi_desc *msi_domain_first_desc(struct device *dev, unsigned int domid,
+				       enum msi_desc_filter filter);
+
+/**
+ * msi_first_desc - Get the first MSI descriptor of the default irqdomain
+ * @dev:	Device to operate on
+ * @filter:	Descriptor state filter
+ *
+ * Must be called with the MSI descriptor mutex held, i.e. msi_lock_descs()
+ * must be invoked before the call.
+ *
+ * Return: Pointer to the first MSI descriptor matching the search
+ *	   criteria, NULL if none found.
+ */
+static inline struct msi_desc *msi_first_desc(struct device *dev,
+					      enum msi_desc_filter filter)
+{
+	return msi_domain_first_desc(dev, MSI_DEFAULT_DOMAIN, filter);
+}
+
 struct msi_desc *msi_next_desc(struct device *dev, enum msi_desc_filter filter);
 
 /**
- * msi_for_each_desc - Iterate the MSI descriptors
+ * msi_domain_for_each_desc - Iterate the MSI descriptors in a specific domain
  *
  * @desc:	struct msi_desc pointer used as iterator
  * @dev:	struct device pointer - device to iterate
+ * @domid:	The id of the interrupt domain which should be walked.
  * @filter:	Filter for descriptor selection
  *
  * Notes:
@@ -208,10 +230,25 @@ struct msi_desc *msi_next_desc(struct de
  *   pair.
  * - It is safe to remove a retrieved MSI descriptor in the loop.
  */
-#define msi_for_each_desc(desc, dev, filter)			\
-	for ((desc) = msi_first_desc((dev), (filter)); (desc);	\
+#define msi_domain_for_each_desc(desc, dev, domid, filter)			\
+	for ((desc) = msi_domain_first_desc((dev), (domid), (filter)); (desc);	\
 	     (desc) = msi_next_desc((dev), (filter)))
 
+/**
+ * msi_for_each_desc - Iterate the MSI descriptors in the default irqdomain
+ *
+ * @desc:	struct msi_desc pointer used as iterator
+ * @dev:	struct device pointer - device to iterate
+ * @filter:	Filter for descriptor selection
+ *
+ * Notes:
+ * - The loop must be protected with a msi_lock_descs()/msi_unlock_descs()
+ *   pair.
+ * - It is safe to remove a retrieved MSI descriptor in the loop.
+ */
+#define msi_for_each_desc(desc, dev, filter)				\
+	msi_domain_for_each_desc((desc), (dev), MSI_DEFAULT_DOMAIN, (filter))
+
 #define msi_desc_to_dev(desc)		((desc)->dev)
 
 #ifdef CONFIG_IRQ_MSI_IOMMU
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -21,6 +21,10 @@
 
 static inline int msi_sysfs_create_group(struct device *dev);
 
+/* Invalid XA index which is outside of any searchable range */
+#define MSI_XA_MAX_INDEX	(ULONG_MAX - 1)
+#define MSI_XA_DOMAIN_SIZE	(MSI_MAX_INDEX + 1)
+
 static inline void msi_setup_default_irqdomain(struct device *dev, struct msi_device_data *md)
 {
 	if (!dev->msi.domain)
@@ -33,6 +37,20 @@ static inline void msi_setup_default_irq
 	md->__irqdomains[MSI_DEFAULT_DOMAIN] = dev->msi.domain;
 }
 
+static int msi_get_domain_base_index(struct device *dev, unsigned int domid)
+{
+	lockdep_assert_held(&dev->msi.data->mutex);
+
+	if (WARN_ON_ONCE(domid >= MSI_MAX_DEVICE_IRQDOMAINS))
+		return -ENODEV;
+
+	if (WARN_ON_ONCE(!dev->msi.data->__irqdomains[domid]))
+		return -ENODEV;
+
+	return domid * MSI_XA_DOMAIN_SIZE;
+}
+
+
 /**
  * msi_alloc_desc - Allocate an initialized msi_desc
  * @dev:	Pointer to the device for which this is allocated
@@ -229,6 +247,7 @@ int msi_setup_device_data(struct device
 	xa_init(&md->__store);
 	mutex_init(&md->mutex);
+	md->__iter_idx = MSI_XA_MAX_INDEX;
 	dev->msi.data = md;
 	devres_add(dev, md);
 	return 0;
@@ -251,7 +270,7 @@ EXPORT_SYMBOL_GPL(msi_lock_descs);
 void msi_unlock_descs(struct device *dev)
 {
 	/* Invalidate the index wich was cached by the iterator */
-	dev->msi.data->__iter_idx = MSI_MAX_INDEX;
+	dev->msi.data->__iter_idx = MSI_XA_MAX_INDEX;
 	mutex_unlock(&dev->msi.data->mutex);
 }
 EXPORT_SYMBOL_GPL(msi_unlock_descs);
@@ -260,17 +279,18 @@ static struct msi_desc *msi_find_desc(st
 {
 	struct msi_desc *desc;
 
-	xa_for_each_start(&md->__store, md->__iter_idx, desc, md->__iter_idx) {
+	xa_for_each_range(&md->__store, md->__iter_idx, desc, md->__iter_idx, md->__iter_max) {
 		if (msi_desc_match(desc, filter))
 			return desc;
 	}
-	md->__iter_idx = MSI_MAX_INDEX;
+	md->__iter_idx = MSI_XA_MAX_INDEX;
 	return NULL;
 }
 
 /**
- * msi_first_desc - Get the first MSI descriptor of a device
+ * msi_domain_first_desc - Get the first MSI descriptor of an irqdomain associated to a device
  * @dev:	Device to operate on
+ * @domid:	The id of the interrupt domain which should be walked.
  * @filter:	Descriptor state filter
  *
  * Must be called with the MSI descriptor mutex held, i.e. msi_lock_descs()
@@ -279,19 +299,26 @@ static struct msi_desc *msi_find_desc(st
  * Return: Pointer to the first MSI descriptor matching the search
  *	   criteria, NULL if none found.
  */
-struct msi_desc *msi_first_desc(struct device *dev, enum msi_desc_filter filter)
+struct msi_desc *msi_domain_first_desc(struct device *dev, unsigned int domid,
+				       enum msi_desc_filter filter)
 {
 	struct msi_device_data *md = dev->msi.data;
+	int baseidx;
 
 	if (WARN_ON_ONCE(!md))
 		return NULL;
 
 	lockdep_assert_held(&md->mutex);
 
-	md->__iter_idx = 0;
+	baseidx = msi_get_domain_base_index(dev, domid);
+	if (baseidx < 0)
+		return NULL;
+
+	md->__iter_idx = baseidx;
+	md->__iter_max = baseidx + MSI_MAX_INDEX - 1;
 	return msi_find_desc(md, filter);
 }
-EXPORT_SYMBOL_GPL(msi_first_desc);
+EXPORT_SYMBOL_GPL(msi_domain_first_desc);
 
 /**
  * msi_next_desc - Get the next MSI descriptor of a device
@@ -315,7 +342,7 @@ struct msi_desc *msi_next_desc(struct de
 
 	lockdep_assert_held(&md->mutex);
 
-	if (md->__iter_idx >= (unsigned long)MSI_MAX_INDEX)
+	if (md->__iter_idx >= md->__iter_max)
 		return NULL;
 
 	md->__iter_idx++;
To support multiple MSI interrupt domains per device it is necessary to
segment the xarray MSI descriptor storage. Each domain gets up to
MSI_MAX_INDEX entries.

Change the iterators so they operate with domain ids and take the
domain offsets into account.

The publicly available iterators, which are mostly used in legacy
implementations and the PCI/MSI core, default to MSI_DEFAULT_DOMAIN
(0), which is the id for the existing "global" domains.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/msi.h |   45 +++++++++++++++++++++++++++++++++++++++++----
 kernel/irq/msi.c    |   43 +++++++++++++++++++++++++++++++++++--------
 2 files changed, 76 insertions(+), 12 deletions(-)