diff mbox

[v8,14/14] ARM: gic: add gic_ppi_map_on_cpu()

Message ID 1309855755-6261-15-git-send-email-marc.zyngier@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Marc Zyngier July 5, 2011, 8:49 a.m. UTC
It is sometimes usefull to compute the mapping of a PPI on a CPU
that is not the one handling the interrupt (in the probe function
of a driver, for example).

The new gic_ppi_map_on_cpu() function does just that.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/common/gic.c               |    9 +++++++--
 arch/arm/include/asm/hardware/gic.h |    7 +++----
 2 files changed, 10 insertions(+), 6 deletions(-)

Comments

Russell King - ARM Linux July 8, 2011, 7:57 p.m. UTC | #1
On Tue, Jul 05, 2011 at 09:49:15AM +0100, Marc Zyngier wrote:
> It is sometimes usefull to compute the mapping of a PPI on a CPU
> that is not the one handling the interrupt (in the probe function
> of a driver, for example).

This is unsafe.  Firstly, smp_processor_id() in preemptible contexts
is not valid as you can be migrated to another CPU at any moment.

Secondly, if you have a driver probe function, and you want to
register a PPI interrupt for a different CPU, while you can call the
request function, the resulting ->unmask will be called on the
calling CPU (assuming preempt is disabled and we don't schedule).
You might get lucky and it may be called on your intended CPU...

That means the mask registers you access are on the local CPU, not
the target CPU, so you end up with the wrong interrupt enabled.

As request_threaded_irq() contains a kzalloc() which can schedule,
I can't see how you can safely request a particular per-CPU interrupt
and guarantee that you'll enable it on the appropriate core.

Finally, note that using request_irq() from the secondary_startup()
thread is probably a very bad idea - should it require to schedule,
there is nothing for the secondary CPU to switch to - we're the
secondary CPUs idle thread.

I think you'll need to use setup_irq() to register the PPI interrupts,
especially for the TWD.  However, that then gives you a problem when
you come to tear it down on CPU hot unplug, as free_irq() can't be
used - it'll try to kfree() the static irqaction structure.

So I think more thought is required on this entire patch set.

I have thought (before this patch set appeared) about making the PPI
stuff entirely GIC specific, having an array of 16 function pointers
in the GIC code to allow things like TWD to hook into.  No preemption
or sleep problems, no genirq overhead to their handling, etc.  Then
again, I think we do need irq_enter()..irq_exit().
Marc Zyngier July 10, 2011, 3:10 p.m. UTC | #2
On Fri, 8 Jul 2011 20:57:08 +0100
Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:

> On Tue, Jul 05, 2011 at 09:49:15AM +0100, Marc Zyngier wrote:
> > It is sometimes usefull to compute the mapping of a PPI on a CPU
> > that is not the one handling the interrupt (in the probe function
> > of a driver, for example).
> 
> This is unsafe.  Firstly, smp_processor_id() in preemptible contexts
> is not valid as you can be migrated to another CPU at any moment.

Point taken. I think I can solve this by having a per-cpu variable
instead of hijacking the gic_chip_data structure.

> Secondly, if you have a driver probe function, and you want to
> register a PPI interrupt for a different CPU, while you can call the
> request function, the resulting ->unmask will be called on the
> calling CPU (assuming preempt is disabled and we don't schedule).
> You might get lucky and it may be called on your intended CPU...

I think you're going way beyond than the intended use of this function.
It is only there to be able to compute the PPI->IRQ mapping, *not* to
register the IRQ, which of course only make sense on the target CPU.

> That means the mask registers you access are on the local CPU, not
> the target CPU, so you end up with the wrong interrupt enabled.
> 
> As request_threaded_irq() contains a kzalloc() which can schedule,
> I can't see how you can safely request a particular per-CPU interrupt
> and guarantee that you'll enable it on the appropriate core.
> 
> Finally, note that using request_irq() from the secondary_startup()
> thread is probably a very bad idea - should it require to schedule,
> there is nothing for the secondary CPU to switch to - we're the
> secondary CPUs idle thread.
> 
> I think you'll need to use setup_irq() to register the PPI interrupts,
> especially for the TWD.  However, that then gives you a problem when
> you come to tear it down on CPU hot unplug, as free_irq() can't be
> used - it'll try to kfree() the static irqaction structure.

Actually, you wouldn't need to to do anything on CPU hot unplug (just
as it is now, actually). You could directly do another setup_irq() once
the CPU is hot plugged back. I'll have a look at this.

> So I think more thought is required on this entire patch set.
> 
> I have thought (before this patch set appeared) about making the PPI
> stuff entirely GIC specific, having an array of 16 function pointers
> in the GIC code to allow things like TWD to hook into.  No preemption
> or sleep problems, no genirq overhead to their handling, etc.  Then
> again, I think we do need irq_enter()..irq_exit().

Wouldn't that be another interrupt mechanism? I agree that it would be
simpler, but it would make the PPI handling different from normal
interrupts. genirq already has a percpu flow that is relatively cheap.
And being able to rely on the usual kernel semantics is probably the
best we can do while undergoing drastic changes such as DT.

If in the end we cannot efficiently cast PPI handling into genirq, then
I agree that your proposal is a sensible alternative.

	M.
Russell King - ARM Linux July 10, 2011, 3:37 p.m. UTC | #3
On Sun, Jul 10, 2011 at 04:10:38PM +0100, Marc Zyngier wrote:
> On Fri, 8 Jul 2011 20:57:08 +0100
> Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> 
> > On Tue, Jul 05, 2011 at 09:49:15AM +0100, Marc Zyngier wrote:
> > > It is sometimes usefull to compute the mapping of a PPI on a CPU
> > > that is not the one handling the interrupt (in the probe function
> > > of a driver, for example).
> > 
> > This is unsafe.  Firstly, smp_processor_id() in preemptible contexts
> > is not valid as you can be migrated to another CPU at any moment.
> 
> Point taken. I think I can solve this by having a per-cpu variable
> instead of hijacking the gic_chip_data structure.

I think you miss the point.

	global_irq = gic_ppi_map(ppi_irq);

is broken, no matter what kind of implementation you have behind it.

To illustrate it, lets define gic_ppi_map() as the following:

	(ppi_irq * NR_CPUS + cpu_nr)

And you have this code:

	irq = gic_ppi_map(2);
	ret = request_irq(irq, ...);

Now, lets say that when you call this, you're running on CPU1, you
have 4 CPUs and you want PPI 2.  global_irq will be 2 * 4 + 1 = 9.

You intend to pass this to request_irq() to claim the PPI 2 for the
current CPU.  However, you're preempted by another thread, so you
stop executing on CPU1.

CPU3 is about to idle, and so you're migrated to CPU3 which now starts
executing your code.  You continue into request_irq(), still with
IRQ9.  You request the interrupt, and the unmask function is called.
The GIC now unmasks PPI 2 on CPU 3 - but this is IRQ11.

However, your IRQ handler is registered in IRQ9's slot but IRQ11 is
enabled instead.

Now, lets say you decide to disable preemption over this code section:

	preempt_disable();
	irq = gic_ppi_map(2);
	ret = request_irq(irq, ...);
	preempt_enable();

This is _still_ unsafe.  Instead of the explicit preemption, consider
what happens if the attempt to allocate memory inside request_irq()
blocks:

        action = kzalloc(sizeof(struct irqaction), GFP_KERNEL);

There is _still_ no guarantee that you will continue execution on the
same CPU as the one you were originally executing on.

> > Secondly, if you have a driver probe function, and you want to
> > register a PPI interrupt for a different CPU, while you can call the
> > request function, the resulting ->unmask will be called on the
> > calling CPU (assuming preempt is disabled and we don't schedule).
> > You might get lucky and it may be called on your intended CPU...
> 
> I think you're going way beyond than the intended use of this function.
> It is only there to be able to compute the PPI->IRQ mapping, *not* to
> register the IRQ, which of course only make sense on the target CPU.

No, my point is that this is not a useful function, period.  It should
not even exist, because it is *dangerous* for anyone to use it.  See
the examples above.

However, my point is more than that: the whole idea behind this patch
series is unsafe: it is _unsafe_ to assume that calling request_irq()
on any particular CPU will result in execution being bound solely to
that IRQ.

The only way to ensure that this code works as you've intended is:

1. save the threads affinity
2. set a new affinity for a single CPU for the thread
3. force execution to switch to that CPU
4. translate the IRQ and request it
5. restore the threads affinity

And to ensure that drivers do that safely, that would _have_ to be a
separate function which takes care of all that.  The code would look
something like this:

        cpumask_t cpus_allowed = current->cpus_allowed;
        set_cpus_allowed(current, cpumask_of_cpu(cpu));
        BUG_ON(cpu != smp_processor_id());
	irq = gic_ppi_map(ppi);
	ret = request_irq(irq, ...);
	set_cpus_allowed(current, cpus_allowed);

(But not this: this depends on CPUMASK_OFFSTACK not being set.)

In any case, this is _not_ what we want drivers to get involved with -
we'll get different implementations of this which are likely to be
buggy.  We need an interface which doesn't have this complexity.  So
we need something like this:

int request_ppi_irq(int ppi, irq_handler_t handler, void *dev);

which contains all this complexity.  At that point, we've _already_
moved away from the standard genirq interfaces to a GIC specific PPI
request interface.

So, all in all, I see _zero_ reason to use genirq for this ARM specific
oddity.
diff mbox

Patch

diff --git a/arch/arm/common/gic.c b/arch/arm/common/gic.c
index 14ab235..70ef6a8 100644
--- a/arch/arm/common/gic.c
+++ b/arch/arm/common/gic.c
@@ -269,7 +269,7 @@  void __init gic_cascade_irq(unsigned int gic_nr, unsigned int irq)
 }
 
 #ifdef CONFIG_ARM_GIC_PPI_MAP
-unsigned int gic_ppi_map(unsigned int irq)
+unsigned int gic_ppi_map_on_cpu(unsigned int irq, int cpu)
 {
 	struct gic_chip_data *chip_data = irq_get_chip_data(irq);
 	unsigned int vppi_irq;
@@ -278,12 +278,17 @@  unsigned int gic_ppi_map(unsigned int irq)
 	WARN_ON(!chip_data->vppi_base);
 
 	ppi = irq - chip_data->ppi_base;
-	vppi_irq = ppi + chip_data->nrppis * smp_processor_id();
+	vppi_irq = ppi + chip_data->nrppis * cpu;
 	vppi_irq += chip_data->vppi_base;
 
 	return vppi_irq;
 }
 
+unsigned int gic_ppi_map(unsigned int irq)
+{
+	return gic_ppi_map_on_cpu(irq, smp_processor_id());
+}
+
 static void gic_handle_ppi(unsigned int irq, struct irq_desc *desc)
 {
 	unsigned int vppi_irq;
diff --git a/arch/arm/include/asm/hardware/gic.h b/arch/arm/include/asm/hardware/gic.h
index bad26c2..52dc4dd 100644
--- a/arch/arm/include/asm/hardware/gic.h
+++ b/arch/arm/include/asm/hardware/gic.h
@@ -44,11 +44,10 @@  void gic_cascade_irq(unsigned int gic_nr, unsigned int irq);
 void gic_raise_softirq(const struct cpumask *mask, unsigned int irq);
 #ifdef CONFIG_ARM_GIC_PPI_MAP
 unsigned int gic_ppi_map(unsigned int irq);
+unsigned int gic_ppi_map_on_cpu(unsigned int irq, int cpu);
 #else
-static inline unsigned int gic_ppi_map(unsigned int irq)
-{
-	return irq;
-}
+#define gic_ppi_map(irq)		(irq)
+#define gic_ppi_map_on_cpu(irq, cpu)	(irq)
 #endif
 
 #endif