Message ID | 1309855755-6261-15-git-send-email-marc.zyngier@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Jul 05, 2011 at 09:49:15AM +0100, Marc Zyngier wrote: > It is sometimes usefull to compute the mapping of a PPI on a CPU > that is not the one handling the interrupt (in the probe function > of a driver, for example). This is unsafe. Firstly, smp_processor_id() in preemptible contexts is not valid as you can be migrated to another CPU at any moment. Secondly, if you have a driver probe function, and you want to register a PPI interrupt for a different CPU, while you can call the request function, the resulting ->unmask will be called on the calling CPU (assuming preempt is disabled and we don't schedule). You might get lucky and it may be called on your intended CPU... That means the mask registers you access are on the local CPU, not the target CPU, so you end up with the wrong interrupt enabled. As request_threaded_irq() contains a kzalloc() which can schedule, I can't see how you can safely request a particular per-CPU interrupt and guarantee that you'll enable it on the appropriate core. Finally, note that using request_irq() from the secondary_startup() thread is probably a very bad idea - should it require to schedule, there is nothing for the secondary CPU to switch to - we're the secondary CPUs idle thread. I think you'll need to use setup_irq() to register the PPI interrupts, especially for the TWD. However, that then gives you a problem when you come to tear it down on CPU hot unplug, as free_irq() can't be used - it'll try to kfree() the static irqaction structure. So I think more thought is required on this entire patch set. I have thought (before this patch set appeared) about making the PPI stuff entirely GIC specific, having an array of 16 function pointers in the GIC code to allow things like TWD to hook into. No preemption or sleep problems, no genirq overhead to their handling, etc. Then again, I think we do need irq_enter()..irq_exit().
On Fri, 8 Jul 2011 20:57:08 +0100 Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Tue, Jul 05, 2011 at 09:49:15AM +0100, Marc Zyngier wrote: > > It is sometimes usefull to compute the mapping of a PPI on a CPU > > that is not the one handling the interrupt (in the probe function > > of a driver, for example). > > This is unsafe. Firstly, smp_processor_id() in preemptible contexts > is not valid as you can be migrated to another CPU at any moment. Point taken. I think I can solve this by having a per-cpu variable instead of hijacking the gic_chip_data structure. > Secondly, if you have a driver probe function, and you want to > register a PPI interrupt for a different CPU, while you can call the > request function, the resulting ->unmask will be called on the > calling CPU (assuming preempt is disabled and we don't schedule). > You might get lucky and it may be called on your intended CPU... I think you're going way beyond than the intended use of this function. It is only there to be able to compute the PPI->IRQ mapping, *not* to register the IRQ, which of course only make sense on the target CPU. > That means the mask registers you access are on the local CPU, not > the target CPU, so you end up with the wrong interrupt enabled. > > As request_threaded_irq() contains a kzalloc() which can schedule, > I can't see how you can safely request a particular per-CPU interrupt > and guarantee that you'll enable it on the appropriate core. > > Finally, note that using request_irq() from the secondary_startup() > thread is probably a very bad idea - should it require to schedule, > there is nothing for the secondary CPU to switch to - we're the > secondary CPUs idle thread. > > I think you'll need to use setup_irq() to register the PPI interrupts, > especially for the TWD. However, that then gives you a problem when > you come to tear it down on CPU hot unplug, as free_irq() can't be > used - it'll try to kfree() the static irqaction structure. Actually, you wouldn't need to to do anything on CPU hot unplug (just as it is now, actually). You could directly do another setup_irq() once the CPU is hot plugged back. I'll have a look at this. > So I think more thought is required on this entire patch set. > > I have thought (before this patch set appeared) about making the PPI > stuff entirely GIC specific, having an array of 16 function pointers > in the GIC code to allow things like TWD to hook into. No preemption > or sleep problems, no genirq overhead to their handling, etc. Then > again, I think we do need irq_enter()..irq_exit(). Wouldn't that be another interrupt mechanism? I agree that it would be simpler, but it would make the PPI handling different from normal interrupts. genirq already has a percpu flow that is relatively cheap. And being able to rely on the usual kernel semantics is probably the best we can do while undergoing drastic changes such as DT. If in the end we cannot efficiently cast PPI handling into genirq, then I agree that your proposal is a sensible alternative. M.
On Sun, Jul 10, 2011 at 04:10:38PM +0100, Marc Zyngier wrote: > On Fri, 8 Jul 2011 20:57:08 +0100 > Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > > > On Tue, Jul 05, 2011 at 09:49:15AM +0100, Marc Zyngier wrote: > > > It is sometimes usefull to compute the mapping of a PPI on a CPU > > > that is not the one handling the interrupt (in the probe function > > > of a driver, for example). > > > > This is unsafe. Firstly, smp_processor_id() in preemptible contexts > > is not valid as you can be migrated to another CPU at any moment. > > Point taken. I think I can solve this by having a per-cpu variable > instead of hijacking the gic_chip_data structure. I think you miss the point. global_irq = gic_ppi_map(ppi_irq); is broken, no matter what kind of implementation you have behind it. To illustrate it, lets define gic_ppi_map() as the following: (ppi_irq * NR_CPUS + cpu_nr) And you have this code: irq = gic_ppi_map(2); ret = request_irq(irq, ...); Now, lets say that when you call this, you're running on CPU1, you have 4 CPUs and you want PPI 2. global_irq will be 2 * 4 + 1 = 9. You intend to pass this to request_irq() to claim the PPI 2 for the current CPU. However, you're preempted by another thread, so you stop executing on CPU1. CPU3 is about to idle, and so you're migrated to CPU3 which now starts executing your code. You continue into request_irq(), still with IRQ9. You request the interrupt, and the unmask function is called. The GIC now unmasks PPI 2 on CPU 3 - but this is IRQ11. However, your IRQ handler is registered in IRQ9's slot but IRQ11 is enabled instead. Now, lets say you decide to disable preemption over this code section: preempt_disable(); irq = gic_ppi_map(2); ret = request_irq(irq, ...); preempt_enable(); This is _still_ unsafe. Instead of the explicit preemption, consider what happens if the attempt to allocate memory inside request_irq() blocks: action = kzalloc(sizeof(struct irqaction), GFP_KERNEL); There is _still_ no guarantee that you will continue execution on the same CPU as the one you were originally executing on. > > Secondly, if you have a driver probe function, and you want to > > register a PPI interrupt for a different CPU, while you can call the > > request function, the resulting ->unmask will be called on the > > calling CPU (assuming preempt is disabled and we don't schedule). > > You might get lucky and it may be called on your intended CPU... > > I think you're going way beyond than the intended use of this function. > It is only there to be able to compute the PPI->IRQ mapping, *not* to > register the IRQ, which of course only make sense on the target CPU. No, my point is that this is not a useful function, period. It should not even exist, because it is *dangerous* for anyone to use it. See the examples above. However, my point is more than that: the whole idea behind this patch series is unsafe: it is _unsafe_ to assume that calling request_irq() on any particular CPU will result in execution being bound solely to that IRQ. The only way to ensure that this code works as you've intended is: 1. save the threads affinity 2. set a new affinity for a single CPU for the thread 3. force execution to switch to that CPU 4. translate the IRQ and request it 5. restore the threads affinity And to ensure that drivers do that safely, that would _have_ to be a separate function which takes care of all that. The code would look something like this: cpumask_t cpus_allowed = current->cpus_allowed; set_cpus_allowed(current, cpumask_of_cpu(cpu)); BUG_ON(cpu != smp_processor_id()); irq = gic_ppi_map(ppi); ret = request_irq(irq, ...); set_cpus_allowed(current, cpus_allowed); (But not this: this depends on CPUMASK_OFFSTACK not being set.) In any case, this is _not_ what we want drivers to get involved with - we'll get different implementations of this which are likely to be buggy. We need an interface which doesn't have this complexity. So we need something like this: int request_ppi_irq(int ppi, irq_handler_t handler, void *dev); which contains all this complexity. At that point, we've _already_ moved away from the standard genirq interfaces to a GIC specific PPI request interface. So, all in all, I see _zero_ reason to use genirq for this ARM specific oddity.
diff --git a/arch/arm/common/gic.c b/arch/arm/common/gic.c index 14ab235..70ef6a8 100644 --- a/arch/arm/common/gic.c +++ b/arch/arm/common/gic.c @@ -269,7 +269,7 @@ void __init gic_cascade_irq(unsigned int gic_nr, unsigned int irq) } #ifdef CONFIG_ARM_GIC_PPI_MAP -unsigned int gic_ppi_map(unsigned int irq) +unsigned int gic_ppi_map_on_cpu(unsigned int irq, int cpu) { struct gic_chip_data *chip_data = irq_get_chip_data(irq); unsigned int vppi_irq; @@ -278,12 +278,17 @@ unsigned int gic_ppi_map(unsigned int irq) WARN_ON(!chip_data->vppi_base); ppi = irq - chip_data->ppi_base; - vppi_irq = ppi + chip_data->nrppis * smp_processor_id(); + vppi_irq = ppi + chip_data->nrppis * cpu; vppi_irq += chip_data->vppi_base; return vppi_irq; } +unsigned int gic_ppi_map(unsigned int irq) +{ + return gic_ppi_map_on_cpu(irq, smp_processor_id()); +} + static void gic_handle_ppi(unsigned int irq, struct irq_desc *desc) { unsigned int vppi_irq; diff --git a/arch/arm/include/asm/hardware/gic.h b/arch/arm/include/asm/hardware/gic.h index bad26c2..52dc4dd 100644 --- a/arch/arm/include/asm/hardware/gic.h +++ b/arch/arm/include/asm/hardware/gic.h @@ -44,11 +44,10 @@ void gic_cascade_irq(unsigned int gic_nr, unsigned int irq); void gic_raise_softirq(const struct cpumask *mask, unsigned int irq); #ifdef CONFIG_ARM_GIC_PPI_MAP unsigned int gic_ppi_map(unsigned int irq); +unsigned int gic_ppi_map_on_cpu(unsigned int irq, int cpu); #else -static inline unsigned int gic_ppi_map(unsigned int irq) -{ - return irq; -} +#define gic_ppi_map(irq) (irq) +#define gic_ppi_map_on_cpu(irq, cpu) (irq) #endif #endif
It is sometimes usefull to compute the mapping of a PPI on a CPU that is not the one handling the interrupt (in the probe function of a driver, for example). The new gic_ppi_map_on_cpu() function does just that. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> --- arch/arm/common/gic.c | 9 +++++++-- arch/arm/include/asm/hardware/gic.h | 7 +++---- 2 files changed, 10 insertions(+), 6 deletions(-)