Message ID | 1388707565-16535-3-git-send-email-yinghai@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Yinghai, On Thu, 2 Jan 2014, Yinghai Lu wrote: > For ioapic hot-add support, it would be easy if we have continuous > irq numbers for hot added ioapic controller. I really don't care about easy. Easy to solve problems are for wimps. What you really want to say is, that ioapic hot-add support requires a contiguous irq number space for a hotplugged ioapic to avoid expensive translations in the ioapic hotplug code. That's a proper reason for making that change to the core code. > We can reserve irq range at first, and later allocate desc for those > pre-reserved irqs when they are needed. > > The reasons for not allocating them during reserving: > 1. only several pins of one ioapic are used, allocate for all pins, will > waste memory for not used pins. > 2. allocate later when is needed could make sure irq_desc is allocated > on local node ram, as dev->node is set at that point. > > -v2: update changelog by adding reasons, requested by Konrad. > -v3: according to tglx: > separate core code change with arch code change. Thanks for splitting the patches! Now the scope of this change becomes more obvious and what I already suspected before becomes crystal clear. The initial intention of irq_reserve_irqs() was to cope with the legacy interrupts to prevent the dynamic allocator from giving them out, but it was at least a misnomer if not even a misconception. Did you notice that? No! Did you even think why irq_reserve_irqs() exists? No! You just hacked it into submission for your purpose. As usual, sigh! What prevents a user of __irq_alloc_reserved_desc() to request something completely out of its range? Nothing as you happily return an existing interrupt via: + if (irq_to_desc(irq)) + return irq; which is true for all already existing interrupts. So some random off by one is going to cause a spurious and extremly hard to debug issue. Brilliant. No, we are not going to play the "it works for Yinghai" game again. I wasted enough time with that already. There is a clear step by step approach to get this done proper: 1) Get rid of the existing misconception/misnomer of irq_reserve_irqs(). Make it explicit that this is dealing with legacy irq spaces. It's not that hard as there are only two users in tree which are both trivial to fix. 2) Provide a proper reservation mechanism which does not piggypack blindly on the allocation bitmap. So what you want is a reservation which: A) Marks the irq range in the allocation bitmap This prevents other code pathes to stomp on that range. B) Stores a unique generated ID in a separate radix tree for that particular irq range. The generated ID is returned to the caller as it is required for actually allocating an interrupt from that range. We don't have to bother with making this conditional as the initial memory consumption of the radix tree is minimal and we only expand it when we actually use that hotplug feature. 3) Provide a proper alloc_reserved_irqdesc() function This function verifies against the reservation ID which was handed out by the reservation function. It's questionable whether we want to allow the reuse of already allocated irq descriptors. I'm leaning to avoid that. See #4 4) Provide a proper mechanism to free the registered irq descriptors and the reservation range when the physical device is removed from the system. So you don't have to preserve state in the ioapic code. Physical hotplug is not a high frequency hotpath operation. 5) Modify the x86 ioapic code to always use the reserve first and allocate later mechanism to avoid ifdeffery and pointless conditional code pathes. That also ensures proper test coverage. TBH, I could not be bothered to look at your x86 related changes, but I expect they are from the "make it work for Yinghai" departement as well. I'll review them once the core code changes are in an acceptable shape. Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 22, 2014 at 4:03 PM, Thomas Gleixner <tglx@linutronix.de> wrote: > > There is a clear step by step approach to get this done proper: > > 1) Get rid of the existing misconception/misnomer of > irq_reserve_irqs(). > > Make it explicit that this is dealing with legacy irq spaces. It's > not that hard as there are only two users in tree which are both > trivial to fix. Hi, Thomas, While going through the code for kill irq_reserve_irqs(), I found that there is irq_reserve_irq(). in include/linux/irq.h static inline int irq_reserve_irq(unsigned int irq) { return irq_reserve_irqs(irq, 1); } it is called via kernel/irq/chip.c::irq_set_chip(). /* * For !CONFIG_SPARSE_IRQ make the irq show up in * allocated_irqs. For the CONFIG_SPARSE_IRQ case, it is * already marked, and this call is harmless. */ irq_reserve_irq(irq); There are tens of irq_set_chip... calling for arches that does not support SPARSE_IRQ yet, and they does not use irq_alloc_desc() anywhere. so how about change those irq_reserve_irq to irq_set_allocated_irqs() and leave them there? Thanks Yinghai -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2014/1/3 8:05, Yinghai Lu wrote: > For ioapic hot-add support, it would be easy if we have continuous > irq numbers for hot added ioapic controller. > > We can reserve irq range at first, and later allocate desc for those > pre-reserved irqs when they are needed. > > The reasons for not allocating them during reserving: > 1. only several pins of one ioapic are used, allocate for all pins, will > waste memory for not used pins. > 2. allocate later when is needed could make sure irq_desc is allocated > on local node ram, as dev->node is set at that point. > > -v2: update changelog by adding reasons, requested by Konrad. > -v3: according to tglx: > separate core code change with arch code change. > change function name to irq_alloc_reserved_desc. > kill __irq_is_reserved(). > remove not need exports. > according to Sebastian: > spare one comments by put two functions together. > > Signed-off-by: Yinghai Lu <yinghai@kernel.org> > Cc: Joerg Roedel <joro@8bytes.org> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> > --- > include/linux/irq.h | 3 +++ > kernel/irq/irqdesc.c | 23 +++++++++++++++++++++++ > 2 files changed, 26 insertions(+) > > diff --git a/include/linux/irq.h b/include/linux/irq.h > index 0229caf..e5f6493 100644 > --- a/include/linux/irq.h > +++ b/include/linux/irq.h > @@ -595,10 +595,13 @@ static inline u32 irq_get_trigger_type(unsigned int irq) > > int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node, > struct module *owner); > +int __irq_alloc_reserved_desc(int at, int node, struct module *owner); > > /* use macros to avoid needing export.h for THIS_MODULE */ > #define irq_alloc_descs(irq, from, cnt, node) \ > __irq_alloc_descs(irq, from, cnt, node, THIS_MODULE) > +#define irq_alloc_reserved_desc_at(at, node) \ > + __irq_alloc_reserved_desc(at, node, THIS_MODULE) > > #define irq_alloc_desc(node) \ > irq_alloc_descs(-1, 0, 1, node) > diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c > index a151db6..1166545 100644 > --- a/kernel/irq/irqdesc.c > +++ b/kernel/irq/irqdesc.c > @@ -410,6 +410,29 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node, > EXPORT_SYMBOL_GPL(__irq_alloc_descs); > > /** > + * __irq_alloc_reserved_desc - allocate irq descriptor for irq that is already reserved > + * @irq: Allocate for specific irq number if irq >= 0 Hi Yinghai Should we skip "if irq >= 0" here because irq should have already been reserved? Or should we add range check for irq? Thanks! Gerry > + * @node: Preferred node on which the irq descriptor should be allocated > + * @owner: Owning module (can be NULL) > + * > + * Returns the irq number or error code > + */ > +int __ref __irq_alloc_reserved_desc(int irq, int node, struct module *owner) > +{ > + mutex_lock(&sparse_irq_lock); > + if (!test_bit(irq, allocated_irqs)) { > + mutex_unlock(&sparse_irq_lock); > + return -EINVAL; > + } > + mutex_unlock(&sparse_irq_lock); > + > + if (irq_to_desc(irq)) > + return irq; > + > + return alloc_descs(irq, 1, node, owner); > +} > + > +/** > * irq_reserve_irqs - mark irqs allocated > * @from: mark from irq number > * @cnt: number of irqs to mark > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 17 Feb 2014, Yinghai Lu wrote: > On Wed, Jan 22, 2014 at 4:03 PM, Thomas Gleixner <tglx@linutronix.de> wrote: > > > > There is a clear step by step approach to get this done proper: > > > > 1) Get rid of the existing misconception/misnomer of > > irq_reserve_irqs(). > > > > Make it explicit that this is dealing with legacy irq spaces. It's > > not that hard as there are only two users in tree which are both > > trivial to fix. > > Hi, Thomas, > > While going through the code for kill irq_reserve_irqs(), I found that > there is irq_reserve_irq(). > > in include/linux/irq.h > > static inline int irq_reserve_irq(unsigned int irq) > { > return irq_reserve_irqs(irq, 1); > } > > it is called via kernel/irq/chip.c::irq_set_chip(). > > /* > * For !CONFIG_SPARSE_IRQ make the irq show up in > * allocated_irqs. For the CONFIG_SPARSE_IRQ case, it is > * already marked, and this call is harmless. > */ > irq_reserve_irq(irq); > > There are tens of irq_set_chip... calling for arches that does not > support SPARSE_IRQ yet, and they does not use irq_alloc_desc() > anywhere. > > so how about change those irq_reserve_irq to irq_set_allocated_irqs() and > leave them there? As I said before irq_reserve_irq() is a misnomer and a misconception. Of course this needs to be fixed as well. And you cannot just blindly change it because !SPARSE can use the allocation. We are not creating stupid corner cases just to support your sloppyness. Its not rocket science to do it the right way. That said, it might be worthwhile to get rid of the !SPARSE case completely. That would probably make quite some stuff simpler. Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Feb 22, 2014 at 2:08 AM, Thomas Gleixner <tglx@linutronix.de> wrote: > > As I said before irq_reserve_irq() is a misnomer and a > misconception. Of course this needs to be fixed as well. > > And you cannot just blindly change it because !SPARSE can use the > allocation. We are not creating stupid corner cases just to support > your sloppyness. Its not rocket science to do it the right way. > > That said, it might be worthwhile to get rid of the !SPARSE case > completely. That would probably make quite some stuff simpler. So we need to make all arches support SPARSE_IRQ at first? Now we have arm, arm64, c6x, metag, powerpc, sh, x86 support SPARSE_IRQ. The following are not with SPARSE_IRQ yet: alpha, arc, avr32, blackfin, cris, frv, hexagon, m32r, m68k, microblaze, mips, mn10300, openrisc, parisc, s390, score, sparc, tile, um, unicore32, xtensa. Thanks Yinghai Thanks Yinghai -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/irq.h b/include/linux/irq.h index 0229caf..e5f6493 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -595,10 +595,13 @@ static inline u32 irq_get_trigger_type(unsigned int irq) int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node, struct module *owner); +int __irq_alloc_reserved_desc(int at, int node, struct module *owner); /* use macros to avoid needing export.h for THIS_MODULE */ #define irq_alloc_descs(irq, from, cnt, node) \ __irq_alloc_descs(irq, from, cnt, node, THIS_MODULE) +#define irq_alloc_reserved_desc_at(at, node) \ + __irq_alloc_reserved_desc(at, node, THIS_MODULE) #define irq_alloc_desc(node) \ irq_alloc_descs(-1, 0, 1, node) diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c index a151db6..1166545 100644 --- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -410,6 +410,29 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node, EXPORT_SYMBOL_GPL(__irq_alloc_descs); /** + * __irq_alloc_reserved_desc - allocate irq descriptor for irq that is already reserved + * @irq: Allocate for specific irq number if irq >= 0 + * @node: Preferred node on which the irq descriptor should be allocated + * @owner: Owning module (can be NULL) + * + * Returns the irq number or error code + */ +int __ref __irq_alloc_reserved_desc(int irq, int node, struct module *owner) +{ + mutex_lock(&sparse_irq_lock); + if (!test_bit(irq, allocated_irqs)) { + mutex_unlock(&sparse_irq_lock); + return -EINVAL; + } + mutex_unlock(&sparse_irq_lock); + + if (irq_to_desc(irq)) + return irq; + + return alloc_descs(irq, 1, node, owner); +} + +/** * irq_reserve_irqs - mark irqs allocated * @from: mark from irq number * @cnt: number of irqs to mark
For ioapic hot-add support, it would be easy if we have continuous irq numbers for hot added ioapic controller. We can reserve irq range at first, and later allocate desc for those pre-reserved irqs when they are needed. The reasons for not allocating them during reserving: 1. only several pins of one ioapic are used, allocate for all pins, will waste memory for not used pins. 2. allocate later when is needed could make sure irq_desc is allocated on local node ram, as dev->node is set at that point. -v2: update changelog by adding reasons, requested by Konrad. -v3: according to tglx: separate core code change with arch code change. change function name to irq_alloc_reserved_desc. kill __irq_is_reserved(). remove not need exports. according to Sebastian: spare one comments by put two functions together. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> --- include/linux/irq.h | 3 +++ kernel/irq/irqdesc.c | 23 +++++++++++++++++++++++ 2 files changed, 26 insertions(+)