diff mbox series

[V2,04/46] genirq/chip: Use the first chip in irq_chip_compose_msi_msg()

Message ID 20200826112331.047917603@linutronix.de (mailing list archive)
State Not Applicable, archived
Headers show
Series x86, PCI, XEN, genirq ...: Prepare for device MSI | expand

Commit Message

Thomas Gleixner Aug. 26, 2020, 11:16 a.m. UTC
The documentation of irq_chip_compose_msi_msg() claims that with
hierarchical irq domains the first chip in the hierarchy which has an
irq_compose_msi_msg() callback is chosen. But the code just keeps
iterating after it finds a chip with a compose callback.

The x86 HPET MSI implementation relies on that behaviour, but that does not
make it more correct.

The message should always be composed at the domain which manages the
underlying resource (e.g. APIC or remap table) because that domain knows
about the required layout of the message.

On X86 the following hierarchies exist:

1)   vector -------- PCI/MSI
2)   vector -- IR -- PCI/MSI

The vector domain has a different message format than the IR (remapping)
domain. So obviously the PCI/MSI domain can't compose the message without
having knowledge about the parent domain, which is exactly the opposite of
what hierarchical domains want to achieve.

X86 actually has two different PCI/MSI chips where #1 has a compose
callback and #2 does not. #2 delegates the composition to the remap domain
where it belongs, but #1 does it at the PCI/MSI level.

For the upcoming device MSI support it's necessary to change this and just
let the first domain which can compose the message take care of it. That
way the top level chip does not have to worry about it and the device MSI
code does not need special knowledge about topologies. It just sets the
compose callback to NULL and lets the hierarchy pick the first chip which
has one.

Due to that the attempt to move the compose callback from the direct
delivery PCI/MSI domain to the vector domain made the system fail to boot
with interrupt remapping enabled because in the remapping case
irq_chip_compose_msi_msg() keeps iterating and choses the compose callback
of the vector domain which obviously creates the wrong format for the remap
table.

Break out of the loop when the first irq chip with a compose callback is
found and fixup the HPET code temporarily. That workaround will be removed
once the direct delivery compose callback is moved to the place where it
belongs in the vector domain.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: New patch. Note, that this might break other stuff which relies on the
    current behaviour, but the hierarchy composition of DT based chips is
    really hard to follow.
---
 arch/x86/kernel/apic/msi.c |    7 +++++--
 kernel/irq/chip.c          |   12 +++++++++---
 2 files changed, 14 insertions(+), 5 deletions(-)

Comments

Marc Zyngier Aug. 26, 2020, 7:50 p.m. UTC | #1
On Wed, 26 Aug 2020 12:16:32 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> The documentation of irq_chip_compose_msi_msg() claims that with
> hierarchical irq domains the first chip in the hierarchy which has an
> irq_compose_msi_msg() callback is chosen. But the code just keeps
> iterating after it finds a chip with a compose callback.
> 
> The x86 HPET MSI implementation relies on that behaviour, but that does not
> make it more correct.
> 
> The message should always be composed at the domain which manages the
> underlying resource (e.g. APIC or remap table) because that domain knows
> about the required layout of the message.
> 
> On X86 the following hierarchies exist:
> 
> 1)   vector -------- PCI/MSI
> 2)   vector -- IR -- PCI/MSI
> 
> The vector domain has a different message format than the IR (remapping)
> domain. So obviously the PCI/MSI domain can't compose the message without
> having knowledge about the parent domain, which is exactly the opposite of
> what hierarchical domains want to achieve.
> 
> X86 actually has two different PCI/MSI chips where #1 has a compose
> callback and #2 does not. #2 delegates the composition to the remap domain
> where it belongs, but #1 does it at the PCI/MSI level.
> 
> For the upcoming device MSI support it's necessary to change this and just
> let the first domain which can compose the message take care of it. That
> way the top level chip does not have to worry about it and the device MSI
> code does not need special knowledge about topologies. It just sets the
> compose callback to NULL and lets the hierarchy pick the first chip which
> has one.
> 
> Due to that the attempt to move the compose callback from the direct
> delivery PCI/MSI domain to the vector domain made the system fail to boot
> with interrupt remapping enabled because in the remapping case
> irq_chip_compose_msi_msg() keeps iterating and choses the compose callback
> of the vector domain which obviously creates the wrong format for the remap
> table.
> 
> Break out of the loop when the first irq chip with a compose callback is
> found and fixup the HPET code temporarily. That workaround will be removed
> once the direct delivery compose callback is moved to the place where it
> belongs in the vector domain.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: New patch. Note, that this might break other stuff which relies on the
>     current behaviour, but the hierarchy composition of DT based chips is
>     really hard to follow.

Grepping around, I don't think there is any occurrence of two irqchips
providing irq_compose_msi() that can share a hierarchy on any real
system, so we should be fine. Famous last words.

> ---
>  arch/x86/kernel/apic/msi.c |    7 +++++--
>  kernel/irq/chip.c          |   12 +++++++++---
>  2 files changed, 14 insertions(+), 5 deletions(-)
> 
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -479,10 +479,13 @@ struct irq_domain *hpet_create_irq_domai
>  	info.type = X86_IRQ_ALLOC_TYPE_HPET;
>  	info.hpet_id = hpet_id;
>  	parent = irq_remapping_get_ir_irq_domain(&info);
> -	if (parent == NULL)
> +	if (parent == NULL) {
>  		parent = x86_vector_domain;
> -	else
> +	} else {
>  		hpet_msi_controller.name = "IR-HPET-MSI";
> +		/* Temporary fix: Will go away */
> +		hpet_msi_controller.irq_compose_msi_msg = NULL;
> +	}
>  
>  	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
>  					      hpet_id);
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -1544,10 +1544,16 @@ int irq_chip_compose_msi_msg(struct irq_
>  	struct irq_data *pos = NULL;
>  
>  #ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
> -	for (; data; data = data->parent_data)
> -#endif
> -		if (data->chip && data->chip->irq_compose_msi_msg)
> +	for (; data; data = data->parent_data) {
> +		if (data->chip && data->chip->irq_compose_msi_msg) {
>  			pos = data;
> +			break;
> +		}
> +	}
> +#else
> +	if (data->chip && data->chip->irq_compose_msi_msg)
> +		pos = data;
> +#endif
>  	if (!pos)
>  		return -ENOSYS;
>  
> 
> 

Is it just me, or is this last change more complex than it ought to be?

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 857f5f4c8098..25e18b73699c 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1544,7 +1544,7 @@ int irq_chip_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	struct irq_data *pos = NULL;
 
 #ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
-	for (; data; data = data->parent_data)
+	for (; data && !pos; data = data->parent_data)
 #endif
 		if (data->chip && data->chip->irq_compose_msi_msg)
 			pos = data;

Though the for loop in a #ifdef in admittedly an acquired taste...

Reviewed-by: Marc Zyngier <maz@kernel.org>

	M.
Thomas Gleixner Aug. 26, 2020, 9:19 p.m. UTC | #2
On Wed, Aug 26 2020 at 20:50, Marc Zyngier wrote:
> On Wed, 26 Aug 2020 12:16:32 +0100,
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> ---
>> V2: New patch. Note, that this might break other stuff which relies on the
>>     current behaviour, but the hierarchy composition of DT based chips is
>>     really hard to follow.
>
> Grepping around, I don't think there is any occurrence of two irqchips
> providing irq_compose_msi() that can share a hierarchy on any real
> system, so we should be fine. Famous last words.

Knocking on wood :)

>>  #ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
>> -	for (; data; data = data->parent_data)
>> -#endif
>> -		if (data->chip && data->chip->irq_compose_msi_msg)
>> +	for (; data; data = data->parent_data) {
>> +		if (data->chip && data->chip->irq_compose_msi_msg) {
>>  			pos = data;
>> +			break;
>> +		}
>> +	}
>> +#else
>> +	if (data->chip && data->chip->irq_compose_msi_msg)
>> +		pos = data;
>> +#endif
>>  	if (!pos)
>>  		return -ENOSYS;
>
> Is it just me, or is this last change more complex than it ought to
> be?

Kinda.

> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 857f5f4c8098..25e18b73699c 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -1544,7 +1544,7 @@ int irq_chip_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
>  	struct irq_data *pos = NULL;
>  
>  #ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
> -	for (; data; data = data->parent_data)
> +	for (; data && !pos; data = data->parent_data)
>  #endif
>  		if (data->chip && data->chip->irq_compose_msi_msg)
>  			pos = data;
>
> Though the for loop in a #ifdef in admittedly an acquired taste...

Checking !pos is simpler obviously. That doesn't make me hate the loop
in the #ifdef less. :)

What about the below?

Thanks,

        tglx
---
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -473,6 +473,15 @@ static inline void irq_domain_deactivate
 }
 #endif
 
+static inline struct irq_data *irqd_get_parent_data(struct irq_data *irqd)
+{
+#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
+	return irqd->parent_data;
+#else
+	return NULL;
+#endif
+}
+
 #ifdef CONFIG_GENERIC_IRQ_DEBUGFS
 #include <linux/debugfs.h>
 
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1541,18 +1541,17 @@ EXPORT_SYMBOL_GPL(irq_chip_release_resou
  */
 int irq_chip_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 {
-	struct irq_data *pos = NULL;
+	struct irq_data *pos;
 
-#ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
-	for (; data; data = data->parent_data)
-#endif
+	for (pos = NULL; !pos && data; data = irqd_get_parent_data(data)) {
 		if (data->chip && data->chip->irq_compose_msi_msg)
 			pos = data;
+	}
+
 	if (!pos)
 		return -ENOSYS;
 
 	pos->chip->irq_compose_msi_msg(pos, msg);
-
 	return 0;
 }
Marc Zyngier Aug. 26, 2020, 9:32 p.m. UTC | #3
On Wed, 26 Aug 2020 22:19:56 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Wed, Aug 26 2020 at 20:50, Marc Zyngier wrote:
> > On Wed, 26 Aug 2020 12:16:32 +0100,
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> >> ---
> >> V2: New patch. Note, that this might break other stuff which relies on the
> >>     current behaviour, but the hierarchy composition of DT based chips is
> >>     really hard to follow.
> >

[...]

> What about the below?
> 
> Thanks,
> 
>         tglx
> ---
> --- a/kernel/irq/internals.h
> +++ b/kernel/irq/internals.h
> @@ -473,6 +473,15 @@ static inline void irq_domain_deactivate
>  }
>  #endif
>  
> +static inline struct irq_data *irqd_get_parent_data(struct irq_data *irqd)
> +{
> +#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
> +	return irqd->parent_data;
> +#else
> +	return NULL;
> +#endif
> +}
> +

We obviously should have had this forever.

>  #ifdef CONFIG_GENERIC_IRQ_DEBUGFS
>  #include <linux/debugfs.h>
>  
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -1541,18 +1541,17 @@ EXPORT_SYMBOL_GPL(irq_chip_release_resou
>   */
>  int irq_chip_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
>  {
> -	struct irq_data *pos = NULL;
> +	struct irq_data *pos;
>  
> -#ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
> -	for (; data; data = data->parent_data)
> -#endif
> +	for (pos = NULL; !pos && data; data = irqd_get_parent_data(data)) {
>  		if (data->chip && data->chip->irq_compose_msi_msg)
>  			pos = data;
> +	}
> +
>  	if (!pos)
>  		return -ENOSYS;
>  
>  	pos->chip->irq_compose_msi_msg(pos, msg);
> -
>  	return 0;
>  }

Perfect, ship it! ;-)

	M.
diff mbox series

Patch

--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -479,10 +479,13 @@  struct irq_domain *hpet_create_irq_domai
 	info.type = X86_IRQ_ALLOC_TYPE_HPET;
 	info.hpet_id = hpet_id;
 	parent = irq_remapping_get_ir_irq_domain(&info);
-	if (parent == NULL)
+	if (parent == NULL) {
 		parent = x86_vector_domain;
-	else
+	} else {
 		hpet_msi_controller.name = "IR-HPET-MSI";
+		/* Temporary fix: Will go away */
+		hpet_msi_controller.irq_compose_msi_msg = NULL;
+	}
 
 	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
 					      hpet_id);
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1544,10 +1544,16 @@  int irq_chip_compose_msi_msg(struct irq_
 	struct irq_data *pos = NULL;
 
 #ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
-	for (; data; data = data->parent_data)
-#endif
-		if (data->chip && data->chip->irq_compose_msi_msg)
+	for (; data; data = data->parent_data) {
+		if (data->chip && data->chip->irq_compose_msi_msg) {
 			pos = data;
+			break;
+		}
+	}
+#else
+	if (data->chip && data->chip->irq_compose_msi_msg)
+		pos = data;
+#endif
 	if (!pos)
 		return -ENOSYS;