Message ID | 1561162778-12669-6-git-send-email-megha.dey@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | Introduce dynamic allocation/freeing of MSI-X vectors | expand |
Megha, On Fri, 21 Jun 2019, Megha Dey wrote: > +static int free_msi_irqs_grp(struct pci_dev *dev, int group_id) > +{ > + > + for_each_pci_msi_entry(entry, dev) { > + if (entry->group_id == group_id && entry->irq) > + for (i = 0; i < entry->nvec_used; i++) > + BUG_ON(irq_has_action(entry->irq + i)); BUG_ON is wrong here. This can and must be handled gracefully. > + } > + > + pci_msi_teardown_msi_irqs_grp(dev, group_id); > + > + list_for_each_entry_safe(entry, tmp, msi_list, list) { > + if (entry->group_id == group_id) { > + clear_bit(entry->msi_attrib.entry_nr, dev->entry); > + list_del(&entry->list); > + free_msi_entry(entry); > + } > + } > + > + list_for_each_entry_safe(msix_sysfs_entry, tmp_msix, pci_msix, list) { > + if (msix_sysfs_entry->group_id == group_id) { Again. Proper group management makes all of that just straight forward and not yet another special case. > + msi_attrs = msix_sysfs_entry->msi_irq_group->attrs; > > +static void pci_msix_shutdown_grp(struct pci_dev *dev, int group_id) > +{ > + struct msi_desc *entry; > + int grp_present = 0; > + > + if (pci_dev_is_disconnected(dev)) { > + dev->msix_enabled = 0; Huch? What's that? I can't figure out why this is needed and of course it completely lacks a comment explaining this. > + return; > + } > + > + /* Return the device with MSI-X masked as initial states */ > + for_each_pci_msi_entry(entry, dev) { > + if (entry->group_id == group_id) { > + /* Keep cached states to be restored */ > + __pci_msix_desc_mask_irq(entry, 1); > + grp_present = 1; > + } > + } > + > + if (!grp_present) { > + pci_err(dev, "Group to be disabled not present\n"); > + return; So you print an error and silently return > + } > +} > + > +int pci_disable_msix_grp(struct pci_dev *dev, int group_id) > +{ > + int num_vecs; > + > + if (!pci_msi_enable || !dev) > + return -EINVAL; > + > + pci_msix_shutdown_grp(dev, group_id); > + num_vecs = free_msi_irqs_grp(dev, group_id); just to call in another function which has to do the same group_id lookup muck again. > + > + return num_vecs; > +} > +EXPORT_SYMBOL(pci_disable_msix_grp); Why is this exposed ? Thanks, tglx
On Sat, 2019-06-29 at 10:08 +0200, Thomas Gleixner wrote: > Megha, > > On Fri, 21 Jun 2019, Megha Dey wrote: > > > > +static int free_msi_irqs_grp(struct pci_dev *dev, int group_id) > > +{ > > > > + > > + for_each_pci_msi_entry(entry, dev) { > > + if (entry->group_id == group_id && entry->irq) > > + for (i = 0; i < entry->nvec_used; i++) > > + BUG_ON(irq_has_action(entry->irq + > > i)); > BUG_ON is wrong here. This can and must be handled gracefully. > Hmm, I reused this code from the 'free_msi_irqs' function. I am not sure why it is wrong to use BUG_ON here but ok to use it there, please let me know. > > > > + } > > + > > + pci_msi_teardown_msi_irqs_grp(dev, group_id); > > + > > + list_for_each_entry_safe(entry, tmp, msi_list, list) { > > + if (entry->group_id == group_id) { > > + clear_bit(entry->msi_attrib.entry_nr, dev- > > >entry); > > + list_del(&entry->list); > > + free_msi_entry(entry); > > + } > > + } > > + > > + list_for_each_entry_safe(msix_sysfs_entry, tmp_msix, > > pci_msix, list) { > > + if (msix_sysfs_entry->group_id == group_id) { > Again. Proper group management makes all of that just straight > forward and > not yet another special case. > Yeah, the new proposal of having a group_list would get rid of this. > > > > + msi_attrs = msix_sysfs_entry- > > >msi_irq_group->attrs; > > > > +static void pci_msix_shutdown_grp(struct pci_dev *dev, int > > group_id) > > +{ > > + struct msi_desc *entry; > > + int grp_present = 0; > > + > > + if (pci_dev_is_disconnected(dev)) { > > + dev->msix_enabled = 0; > Huch? What's that? I can't figure out why this is needed and of > course it > completely lacks a comment explaining this. > Again, I have reused this code from the pci_msix_shutdown() function. So for the group case, this is not required? > > > > + return; > > + } > > + > > + /* Return the device with MSI-X masked as initial states > > */ > > + for_each_pci_msi_entry(entry, dev) { > > + if (entry->group_id == group_id) { > > + /* Keep cached states to be restored */ > > + __pci_msix_desc_mask_irq(entry, 1); > > + grp_present = 1; > > + } > > + } > > + > > + if (!grp_present) { > > + pci_err(dev, "Group to be disabled not > > present\n"); > > + return; > So you print an error and silently return > This is a void function, hence no error value can be returned. What do you think is the right thing to do if someone wants to delete a group which is not present? > > > > + } > > +} > > + > > +int pci_disable_msix_grp(struct pci_dev *dev, int group_id) > > +{ > > + int num_vecs; > > + > > + if (!pci_msi_enable || !dev) > > + return -EINVAL; > > + > > + pci_msix_shutdown_grp(dev, group_id); > > + num_vecs = free_msi_irqs_grp(dev, group_id); > just to call in another function which has to do the same group_id > lookup > muck again. Even with the new proposal, we are to have 2 sets of functions: one to delete all the msic_desc entries associated with the device, and the other to delete those only belonging a 'user specified' group. So we do need to pass a group_id to these functions right? Yes, internally the deletion would be straightforward with the new approach. > > > > > + > > + return num_vecs; > > +} > > +EXPORT_SYMBOL(pci_disable_msix_grp); > Why is this exposed ? > As before, I just followed what pci_disable_msix() did <sigh>. Looks like pci_disable_msix is called from a variety of drivers, thus it is exposed. Currently, no one will use the pci_disable_msix_grp() externally, thus it need not be exposed for now. > Thanks, > > tglx
On Tue, 6 Aug 2019, Megha Dey wrote: > On Sat, 2019-06-29 at 10:08 +0200, Thomas Gleixner wrote: > > Megha, > > > > On Fri, 21 Jun 2019, Megha Dey wrote: > > > > > > +static int free_msi_irqs_grp(struct pci_dev *dev, int group_id) > > > +{ > > > > > > + > > > + for_each_pci_msi_entry(entry, dev) { > > > + if (entry->group_id == group_id && entry->irq) > > > + for (i = 0; i < entry->nvec_used; i++) > > > + BUG_ON(irq_has_action(entry->irq + > > > i)); > > BUG_ON is wrong here. This can and must be handled gracefully. > > > > Hmm, I reused this code from the 'free_msi_irqs' function. I am not > sure why it is wrong to use BUG_ON here but ok to use it there, please > let me know. We are not adding BUG_ON() anymore except for situations where there is absolutely no way out. Just because there is still older code having BUG_ON() does not make it any better. Copying it surely is no justification. If there is really no way out, then you need to explain it. > > > +static void pci_msix_shutdown_grp(struct pci_dev *dev, int > > > group_id) > > > +{ > > > + struct msi_desc *entry; > > > + int grp_present = 0; > > > + > > > + if (pci_dev_is_disconnected(dev)) { > > > + dev->msix_enabled = 0; > > Huch? What's that? I can't figure out why this is needed and of > > course it > > completely lacks a comment explaining this. > > > > Again, I have reused this code from the pci_msix_shutdown() function. > So for the group case, this is not required? Copy and paste is not an argument, really. Can this happen here? If so, then please add a comment. > > > > > > + return; > > > + } > > > + > > > + /* Return the device with MSI-X masked as initial states > > > */ > > > + for_each_pci_msi_entry(entry, dev) { > > > + if (entry->group_id == group_id) { > > > + /* Keep cached states to be restored */ > > > + __pci_msix_desc_mask_irq(entry, 1); > > > + grp_present = 1; > > > + } > > > + } > > > + > > > + if (!grp_present) { > > > + pci_err(dev, "Group to be disabled not > > > present\n"); > > > + return; > > So you print an error and silently return > > > > This is a void function, hence no error value can be returned. What do > you think is the right thing to do if someone wants to delete a group > which is not present? Well, you made it a void function. > > > > > > + } > > > +} > > > + > > > +int pci_disable_msix_grp(struct pci_dev *dev, int group_id) > > > +{ > > > + int num_vecs; > > > + > > > + if (!pci_msi_enable || !dev) > > > + return -EINVAL; > > > + > > > + pci_msix_shutdown_grp(dev, group_id); > > > + num_vecs = free_msi_irqs_grp(dev, group_id); > > just to call in another function which has to do the same group_id > > lookup > > muck again. > > Even with the new proposal, we are to have 2 sets of functions: one to > delete all the msic_desc entries associated with the device, and the > other to delete those only belonging a 'user specified' group. So we do > need to pass a group_id to these functions right? Yes, internally the > deletion would be straightforward with the new approach. That does not matter. If pci_msix_shutdown_grp() does not find a group, why proceeding instead of having a proper error return and telling the caller? Thanks, tglx
On Sun, 2019-08-11 at 09:18 +0200, Thomas Gleixner wrote: > On Tue, 6 Aug 2019, Megha Dey wrote: > > > > > On Sat, 2019-06-29 at 10:08 +0200, Thomas Gleixner wrote: > > > > > > Megha, > > > > > > On Fri, 21 Jun 2019, Megha Dey wrote: > > > > > > > > > > > > +static int free_msi_irqs_grp(struct pci_dev *dev, int > > > > group_id) > > > > +{ > > > > > > > > + > > > > + for_each_pci_msi_entry(entry, dev) { > > > > + if (entry->group_id == group_id && entry->irq) > > > > + for (i = 0; i < entry->nvec_used; i++) > > > > + BUG_ON(irq_has_action(entry- > > > > >irq + > > > > i)); > > > BUG_ON is wrong here. This can and must be handled gracefully. > > > > > Hmm, I reused this code from the 'free_msi_irqs' function. I am not > > sure why it is wrong to use BUG_ON here but ok to use it there, > > please > > let me know. > We are not adding BUG_ON() anymore except for situations where there > is > absolutely no way out. Just because there is still older code having > BUG_ON() does not make it any better. Copying it surely is no > justification. > > If there is really no way out, then you need to explain it. > Ok, got it. I will look into it closely to see if the BUG_ON is absolutely required. > > > > > > > > > > > > > > +static void pci_msix_shutdown_grp(struct pci_dev *dev, int > > > > group_id) > > > > +{ > > > > + struct msi_desc *entry; > > > > + int grp_present = 0; > > > > + > > > > + if (pci_dev_is_disconnected(dev)) { > > > > + dev->msix_enabled = 0; > > > Huch? What's that? I can't figure out why this is needed and of > > > course it > > > completely lacks a comment explaining this. > > > > > Again, I have reused this code from the pci_msix_shutdown() > > function. > > So for the group case, this is not required? > Copy and paste is not an argument, really. Can this happen here? If > so, > then please add a comment. > Ok, will do. > > > > > > > > > > > > > > > > > + return; > > > > + } > > > > + > > > > + /* Return the device with MSI-X masked as initial > > > > states > > > > */ > > > > + for_each_pci_msi_entry(entry, dev) { > > > > + if (entry->group_id == group_id) { > > > > + /* Keep cached states to be restored > > > > */ > > > > + __pci_msix_desc_mask_irq(entry, 1); > > > > + grp_present = 1; > > > > + } > > > > + } > > > > + > > > > + if (!grp_present) { > > > > + pci_err(dev, "Group to be disabled not > > > > present\n"); > > > > + return; > > > So you print an error and silently return > > > > > This is a void function, hence no error value can be returned. What > > do > > you think is the right thing to do if someone wants to delete a > > group > > which is not present? > Well, you made it a void function. ok sure, got your point. > > > > > > > > > > > > > > > > > > + } > > > > +} > > > > + > > > > +int pci_disable_msix_grp(struct pci_dev *dev, int group_id) > > > > +{ > > > > + int num_vecs; > > > > + > > > > + if (!pci_msi_enable || !dev) > > > > + return -EINVAL; > > > > + > > > > + pci_msix_shutdown_grp(dev, group_id); > > > > + num_vecs = free_msi_irqs_grp(dev, group_id); > > > just to call in another function which has to do the same > > > group_id > > > lookup > > > muck again. > > Even with the new proposal, we are to have 2 sets of functions: one > > to > > delete all the msic_desc entries associated with the device, and > > the > > other to delete those only belonging a 'user specified' group. So > > we do > > need to pass a group_id to these functions right? Yes, internally > > the > > deletion would be straightforward with the new approach. > That does not matter. If pci_msix_shutdown_grp() does not find a > group, why > proceeding instead of having a proper error return and telling the > caller? > Oh ok, I got it now, I will return a proper error code in the pci_msi_shutdown_grp and do the free_msi_irqs_grp only if the group is found. > Thanks, > > tglx
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index e947243..342e267 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -53,9 +53,23 @@ static void pci_msi_teardown_msi_irqs(struct pci_dev *dev) else arch_teardown_msi_irqs(dev); } + +static void pci_msi_teardown_msi_irqs_grp(struct pci_dev *dev, int group_id) +{ + struct irq_domain *domain; + + domain = dev_get_msi_domain(&dev->dev); + + if (domain && irq_domain_is_hierarchy(domain)) + msi_domain_free_irqs_grp(domain, &dev->dev, group_id); + else + arch_teardown_msi_irqs_grp(dev, group_id); +} + #else #define pci_msi_setup_msi_irqs arch_setup_msi_irqs #define pci_msi_teardown_msi_irqs arch_teardown_msi_irqs +#define pci_msi_teardown_msi_irqs_grp default_teardown_msi_irqs_grp #endif /* Arch hooks */ @@ -373,6 +387,7 @@ static void free_msi_irqs(struct pci_dev *dev) list_for_each_entry_safe(entry, tmp, msi_list, list) { if (entry->msi_attrib.is_msix) { + clear_bit(entry->msi_attrib.entry_nr, dev->entry); if (list_is_last(&entry->list, msi_list)) iounmap(entry->mask_base); } @@ -381,6 +396,8 @@ static void free_msi_irqs(struct pci_dev *dev) free_msi_entry(entry); } + idr_destroy(dev->grp_idr); + if (dev->msi_irq_groups) { sysfs_remove_groups(&dev->dev.kobj, dev->msi_irq_groups); msi_attrs = dev->msi_irq_groups[0]->attrs; @@ -398,6 +415,60 @@ static void free_msi_irqs(struct pci_dev *dev) } } +static const char msix_sysfs_grp[] = "msi_irqs"; + +static int free_msi_irqs_grp(struct pci_dev *dev, int group_id) +{ + struct list_head *msi_list = dev_to_msi_list(&dev->dev); + struct msi_desc *entry, *tmp; + struct attribute **msi_attrs; + struct device_attribute *dev_attr; + int i; + long vec; + struct msix_sysfs *msix_sysfs_entry, *tmp_msix; + struct list_head *pci_msix = &dev->msix_sysfs; + int num_vec = 0; + + for_each_pci_msi_entry(entry, dev) { + if (entry->group_id == group_id && entry->irq) + for (i = 0; i < entry->nvec_used; i++) + BUG_ON(irq_has_action(entry->irq + i)); + } + + pci_msi_teardown_msi_irqs_grp(dev, group_id); + + list_for_each_entry_safe(entry, tmp, msi_list, list) { + if (entry->group_id == group_id) { + clear_bit(entry->msi_attrib.entry_nr, dev->entry); + list_del(&entry->list); + free_msi_entry(entry); + } + } + + list_for_each_entry_safe(msix_sysfs_entry, tmp_msix, pci_msix, list) { + if (msix_sysfs_entry->group_id == group_id) { + msi_attrs = msix_sysfs_entry->msi_irq_group->attrs; + for (i = 0; i < msix_sysfs_entry->vecs_in_grp; i++) { + if (!i) + num_vec = msix_sysfs_entry->vecs_in_grp; + dev_attr = container_of(msi_attrs[i], + struct device_attribute, attr); + sysfs_remove_file_from_group(&dev->dev.kobj, + &dev_attr->attr, msix_sysfs_grp); + if (kstrtol(dev_attr->attr.name, 10, &vec)) + return -EINVAL; + kfree(dev_attr->attr.name); + kfree(dev_attr); + } + msix_sysfs_entry->msi_irq_group = NULL; + list_del(&msix_sysfs_entry->list); + idr_remove(dev->grp_idr, group_id); + kfree(msix_sysfs_entry); + } + } + return num_vec; +} + static void pci_intx_for_msi(struct pci_dev *dev, int enable) { if (!(dev->dev_flags & PCI_DEV_FLAGS_MSI_INTX_DISABLE_BUG)) @@ -1052,6 +1123,45 @@ void pci_disable_msix(struct pci_dev *dev) } EXPORT_SYMBOL(pci_disable_msix); +static void pci_msix_shutdown_grp(struct pci_dev *dev, int group_id) +{ + struct msi_desc *entry; + int grp_present = 0; + + if (pci_dev_is_disconnected(dev)) { + dev->msix_enabled = 0; + return; + } + + /* Return the device with MSI-X masked as initial states */ + for_each_pci_msi_entry(entry, dev) { + if (entry->group_id == group_id) { + /* Keep cached states to be restored */ + __pci_msix_desc_mask_irq(entry, 1); + grp_present = 1; + } + } + + if (!grp_present) { + pci_err(dev, "Group to be disabled not present\n"); + return; + } +} + +int pci_disable_msix_grp(struct pci_dev *dev, int group_id) +{ + int num_vecs; + + if (!pci_msi_enable || !dev) + return -EINVAL; + + pci_msix_shutdown_grp(dev, group_id); + num_vecs = free_msi_irqs_grp(dev, group_id); + + return num_vecs; +} +EXPORT_SYMBOL(pci_disable_msix_grp); + void pci_no_msi(void) { pci_msi_enable = 0; @@ -1356,6 +1466,26 @@ void pci_free_irq_vectors(struct pci_dev *dev) EXPORT_SYMBOL(pci_free_irq_vectors); /** + * pci_free_irq_vectors_grp - free previously allocated IRQs for a + * device associated with a group + * @dev: PCI device to operate on + * @group_id: group to be freed + * + * Undoes the allocations and enabling in pci_alloc_irq_vectors_dyn(). + * Can be only called for MSIx vectors. + */ +int pci_free_irq_vectors_grp(struct pci_dev *dev, int group_id) +{ + if (group_id < 0) { + pci_err(dev, "Group should be > 0\n"); + return -EINVAL; + } + + return pci_disable_msix_grp(dev, group_id); +} +EXPORT_SYMBOL(pci_free_irq_vectors_grp); + +/** * pci_irq_vector - return Linux IRQ number of a device vector * @dev: PCI device to operate on * @nr: device-relative interrupt vector index (0-based). diff --git a/include/linux/msi.h b/include/linux/msi.h index e61ba24..78929ad 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -333,6 +333,8 @@ struct irq_domain *msi_create_irq_domain(struct fwnode_handle *fwnode, int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, int nvec); void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev); +void msi_domain_free_irqs_grp(struct irq_domain *domain, struct device *dev, + int group_id); struct msi_domain_info *msi_get_domain_info(struct irq_domain *domain); struct irq_domain *platform_msi_create_irq_domain(struct fwnode_handle *fwnode, diff --git a/include/linux/pci.h b/include/linux/pci.h index 73385c0..944e539 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1404,6 +1404,7 @@ int pci_msi_vec_count(struct pci_dev *dev); void pci_disable_msi(struct pci_dev *dev); int pci_msix_vec_count(struct pci_dev *dev); void pci_disable_msix(struct pci_dev *dev); +int pci_disable_msix_grp(struct pci_dev *dev, int group_id); void pci_restore_msi_state(struct pci_dev *dev); int pci_msi_enabled(void); int pci_enable_msi(struct pci_dev *dev); @@ -1428,6 +1429,7 @@ int pci_alloc_irq_vectors_affinity_dyn(struct pci_dev *dev, int *group_id, bool one_shot); void pci_free_irq_vectors(struct pci_dev *dev); +int pci_free_irq_vectors_grp(struct pci_dev *dev, int group_id); int pci_irq_vector(struct pci_dev *dev, unsigned int nr); int pci_irq_vector_group(struct pci_dev *dev, unsigned int nr, unsigned int group_id); @@ -1439,6 +1441,8 @@ static inline int pci_msi_vec_count(struct pci_dev *dev) { return -ENOSYS; } static inline void pci_disable_msi(struct pci_dev *dev) { } static inline int pci_msix_vec_count(struct pci_dev *dev) { return -ENOSYS; } static inline void pci_disable_msix(struct pci_dev *dev) { } +static inline int pci_disable_msix_grp(struct pci_dev *dev, int group_id) + { return -ENOSYS; } static inline void pci_restore_msi_state(struct pci_dev *dev) { } static inline int pci_msi_enabled(void) { return 0; } static inline int pci_enable_msi(struct pci_dev *dev) @@ -1475,6 +1479,11 @@ static inline void pci_free_irq_vectors(struct pci_dev *dev) { } +static inline void pci_free_irq_vectors_grp(struct pci_dev *dev, int group_id) +{ + return 0; +} + static inline int pci_irq_vector(struct pci_dev *dev, unsigned int nr) { if (WARN_ON_ONCE(nr > 0)) diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c index 5cfa931..d73a5dc 100644 --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -511,6 +511,32 @@ void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev) } /** + * msi_domain_free_irqs_grp - Free interrupts belonging to a group from + * a MSI interrupt @domain associated to @dev + * @domain: The domain to managing the interrupts + * @dev: Pointer to device struct of the device for which the interrupt + * should be freed + * @group_id: The group ID to be freed + */ +void msi_domain_free_irqs_grp(struct irq_domain *domain, struct device *dev, + int group_id) +{ + struct msi_desc *desc; + + for_each_msi_entry(desc, dev) { + /* + * We might have failed to allocate an MSI early + * enough that there is no IRQ associated to this + * entry. If that's the case, don't do anything. + */ + if (desc->group_id == group_id && desc->irq) { + irq_domain_free_irqs(desc->irq, desc->nvec_used); + desc->irq = 0; + } + } +} + +/** * msi_get_domain_info - Get the MSI interrupt domain info for @domain * @domain: The interrupt domain to retrieve data from *
Currently, the pci_free_irq_vectors() frees all the allocated resources associated with a PCIe device when the device is being shut down. With the introduction of dynamic allocation of MSI-X vectors by group ID, there should exist an API which can free the resources allocated only to a particular group, which can be called even if the device is not being shut down. The pci_free_irq_vectors_grp() function provides this type of interface. The existing pci_free_irq_vectors() can be called along side this API. Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Megha Dey <megha.dey@linux.intel.com> --- drivers/pci/msi.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/msi.h | 2 + include/linux/pci.h | 9 ++++ kernel/irq/msi.c | 26 +++++++++++ 4 files changed, 167 insertions(+)