
ARM64: kernel: implement ACPI parking protocol

Message ID 1436959990-32054-1-git-send-email-lorenzo.pieralisi@arm.com (mailing list archive)
State New, archived

Commit Message

Lorenzo Pieralisi July 15, 2015, 11:33 a.m. UTC
The SBBR and ACPI specifications allow ACPI based systems that do not
implement PSCI (eg systems with no EL3) to boot through the ACPI parking
protocol specification[1].

This patch implements the ACPI parking protocol CPU operations, and adds
code that parses the parking protocol data structures during the ARM64
SMP initialization carried out at CPU enumeration time.

To wake up the CPUs from the parked state, this patch implements a
wakeup IPI for ARM64 (ie arch_send_wakeup_ipi_mask()) that mirrors the
ARM one, so that a dedicated IPI is sent for wake-up purposes and can
be distinguished from other IPI sources.

Given the current ACPI MADT parsing API, the patch implements a glue
layer that helps pass the MADT GICC data structure from the SMP
initialization code to the parking protocol implementation, somewhat
overriding the CPU operations interfaces. This avoids creating a
completely transparent DT/ACPI CPU operations layer, which would require
opaque structure handling for CPU data (DT represents CPUs through DT
nodes, ACPI through static MADT table entries); that seems overkill
given that ACPI on ARM64 mandates only two booting protocols (PSCI and
the parking protocol), so there is no need for further protocol additions.

Based on the original work by Mark Salter <msalter@redhat.com>

[1] https://acpica.org/sites/acpica/files/MP%20Startup%20for%20ARM%20platforms.docx

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Al Stone <ahs3@redhat.com>
---
 arch/arm64/Kconfig                        |   4 +
 arch/arm64/include/asm/acpi.h             |  19 +++-
 arch/arm64/include/asm/hardirq.h          |   2 +-
 arch/arm64/include/asm/smp.h              |   9 ++
 arch/arm64/kernel/Makefile                |   1 +
 arch/arm64/kernel/acpi_parking_protocol.c | 153 ++++++++++++++++++++++++++++++
 arch/arm64/kernel/cpu_ops.c               |  27 +++++-
 arch/arm64/kernel/smp.c                   |  13 +++
 8 files changed, 222 insertions(+), 6 deletions(-)
 create mode 100644 arch/arm64/kernel/acpi_parking_protocol.c

Comments

Mark Salter July 16, 2015, 4:17 p.m. UTC | #1
On Wed, 2015-07-15 at 12:33 +0100, Lorenzo Pieralisi wrote:

> The SBBR and ACPI specifications allow ACPI based systems that do not
> implement PSCI (eg systems with no EL3) to boot through the ACPI parking
> protocol specification[1].
> 
> This patch implements the ACPI parking protocol CPU operations, and adds
> code that eases parsing the parking protocol data structures to the
> ARM64 SMP initializion carried out at the same time as cpus enumeration.
> 
> To wake-up the CPUs from the parked state, this patch implements a
> wakeup IPI for ARM64 (ie arch_send_wakeup_ipi_mask()) that mirrors the
> ARM one, so that a specific IPI is sent for wake-up purpose in order
> to distinguish it from other IPI sources.
> 
> Given the current ACPI MADT parsing API, the patch implements a glue
> layer that helps passing MADT GICC data structure from SMP initialization

Somewhat off topic, but this reminds me once again that it might be
better to generalize the ACPI_MADT_TYPE_GENERIC_INTERRUPT parsing so
that it could be done in one pass. Currently, the SMP code and the GIC
code need boot-time info from ACPI_MADT_TYPE_GENERIC_INTERRUPT tables.
This patch adds the parking protocol, and this patch:

 https://lkml.org/lkml/2015/5/1/203

needs to get the PMU irq from the same table. I've been thinking of
something like a single loop through the table in setup.c with
callouts to registered users of the various bits of data. Those
users could register a handler function with something like an
ACPI_MADT_GIC_DECLARE() macro which would add a handler to a
special linker section.

I could work up a separate patch if others think it a worthwhile
thing to do.

> code to the parking protocol implementation somewhat overriding the CPU
> operations interfaces. This to avoid creating a completely trasparent
                                                             ^^^ transparent
> DT/ACPI CPU operations layer that would require creating opaque
> structure handling for CPUs data (DT represents CPU through DT nodes, ACPI
> through static MADT table entries), which seems overkill given that ACPI
> on ARM64 mandates only two booting protocols (PSCI and parking protocol),
> so there is no need for further protocol additions.
> 
> Based on the original work by Mark Salter <msalter@redhat.com>
> 
> [1] https://acpica.org/sites/acpica/files/MP%20Startup%20for%20ARM%20platforms.docx
> 
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Hanjun Guo <hanjun.guo@linaro.org>
> Cc: Sudeep Holla <sudeep.holla@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Mark Salter <msalter@redhat.com>
> Cc: Al Stone <ahs3@redhat.com>
> ---
>  arch/arm64/Kconfig                        |   4 +
>  arch/arm64/include/asm/acpi.h             |  19 +++-
>  arch/arm64/include/asm/hardirq.h          |   2 +-
>  arch/arm64/include/asm/smp.h              |   9 ++
>  arch/arm64/kernel/Makefile                |   1 +
>  arch/arm64/kernel/acpi_parking_protocol.c | 153 ++++++++++++++++++++++++++++++
>  arch/arm64/kernel/cpu_ops.c               |  27 +++++-
>  arch/arm64/kernel/smp.c                   |  13 +++
>  8 files changed, 222 insertions(+), 6 deletions(-)
>  create mode 100644 arch/arm64/kernel/acpi_parking_protocol.c
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 318175f..b01891e 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -517,6 +517,10 @@ config HOTPLUG_CPU
>  	  Say Y here to experiment with turning CPUs off and on.  CPUs
>  	  can be controlled through /sys/devices/system/cpu.
>  
> +config ARM64_ACPI_PARKING_PROTOCOL
> +	def_bool y
> +	depends on ACPI && SMP
> +
>  source kernel/Kconfig.preempt
>  
>  config UP_LATE_INIT
> diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
> index 406485e..7127db8 100644
> --- a/arch/arm64/include/asm/acpi.h
> +++ b/arch/arm64/include/asm/acpi.h
> @@ -88,8 +88,25 @@ void __init acpi_init_cpus(void);
>  static inline void acpi_init_cpus(void) { }
>  #endif /* CONFIG_ACPI */
>  
> +#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
> +bool acpi_parking_protocol_valid(int cpu);
> +void __init
> +acpi_set_mailbox_entry(int cpu, struct acpi_madt_generic_interrupt *processor);
> +#else
> +static inline bool acpi_parking_protocol_valid(int cpu) { return false; }
> +static inline void
> +acpi_set_mailbox_entry(int cpu, struct acpi_madt_generic_interrupt *processor)
> +{}
> +#endif
> +
>  static inline const char *acpi_get_enable_method(int cpu)
>  {
> -	return acpi_psci_present() ? "psci" : NULL;
> +	if (acpi_psci_present())
> +		return "psci";
> +
> +	if (acpi_parking_protocol_valid(cpu))
> +		return "parking-protocol";
> +
> +	return NULL;
>  }
>  #endif /*_ASM_ACPI_H*/
> diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
> index 6aae421..e8a3268 100644
> --- a/arch/arm64/include/asm/hardirq.h
> +++ b/arch/arm64/include/asm/hardirq.h
> @@ -20,7 +20,7 @@
>  #include <linux/threads.h>
>  #include <asm/irq.h>
>  
> -#define NR_IPI	5
> +#define NR_IPI	6
>  
>  typedef struct {
>  	unsigned int __softirq_pending;
> diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> index db02be8..b73ca99 100644
> --- a/arch/arm64/include/asm/smp.h
> +++ b/arch/arm64/include/asm/smp.h
> @@ -68,6 +68,15 @@ extern void secondary_entry(void);
>  extern void arch_send_call_function_single_ipi(int cpu);
>  extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
>  
> +#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
> +extern void arch_send_wakeup_ipi_mask(const struct cpumask *mask);
> +#else
> +static inline void arch_send_wakeup_ipi_mask(const struct cpumask *mask)
> +{
> +	BUILD_BUG();
> +}
> +#endif
> +
>  extern int __cpu_disable(void);
>  
>  extern void __cpu_die(unsigned int cpu);
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index 426d076..a766566 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -36,6 +36,7 @@ arm64-obj-$(CONFIG_EFI)			+= efi.o efi-stub.o efi-entry.o
>  arm64-obj-$(CONFIG_PCI)			+= pci.o
>  arm64-obj-$(CONFIG_ARMV8_DEPRECATED)	+= armv8_deprecated.o
>  arm64-obj-$(CONFIG_ACPI)		+= acpi.o
> +arm64-obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
>  
>  obj-y					+= $(arm64-obj-y) vdso/
>  obj-m					+= $(arm64-obj-m)
> diff --git a/arch/arm64/kernel/acpi_parking_protocol.c b/arch/arm64/kernel/acpi_parking_protocol.c
> new file mode 100644
> index 0000000..531c3ad
> --- /dev/null
> +++ b/arch/arm64/kernel/acpi_parking_protocol.c
> @@ -0,0 +1,153 @@
> +/*
> + * ARM64 ACPI Parking Protocol implementation
> + *
> + * Authors: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> + *	    Mark Salter <msalter@redhat.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +#include <linux/acpi.h>
> +#include <linux/types.h>
> +
> +#include <asm/cpu_ops.h>
> +
> +struct cpu_mailbox_entry {
> +	phys_addr_t mailbox_addr;
> +	u8 version;
> +	u8 gic_cpu_id;
> +};
> +
> +static struct cpu_mailbox_entry cpu_mailbox_entries[NR_CPUS];
> +
> +void __init acpi_set_mailbox_entry(int cpu,
> +				   struct acpi_madt_generic_interrupt *p)
> +{
> +	struct cpu_mailbox_entry *cpu_entry = &cpu_mailbox_entries[cpu];
> +
> +	cpu_entry->mailbox_addr = p->parked_address;
> +	cpu_entry->version = p->parking_version;
> +	cpu_entry->gic_cpu_id = p->cpu_interface_number;
> +}
> +
> +bool __init acpi_parking_protocol_valid(int cpu)
> +{
> +	struct cpu_mailbox_entry *cpu_entry = &cpu_mailbox_entries[cpu];
> +
> +	return cpu_entry->mailbox_addr && cpu_entry->version;
> +}
> +
> +static int acpi_parking_protocol_cpu_init(unsigned int cpu)
> +{
> +	pr_debug("%s: ACPI parked addr=%llx\n", __func__,
> +		  cpu_mailbox_entries[cpu].mailbox_addr);
> +
> +	return 0;
> +}
> +
> +static int acpi_parking_protocol_cpu_prepare(unsigned int cpu)
> +{
> +	return 0;
> +}
> +
> +struct parking_protocol_mailbox {
> +	__le32 cpu_id;
> +	__le32 reserved;
> +	__le64 entry_point;
> +};
> +
> +static int acpi_parking_protocol_cpu_boot(unsigned int cpu)
> +{
> +	struct cpu_mailbox_entry *cpu_entry = &cpu_mailbox_entries[cpu];
> +	struct parking_protocol_mailbox __iomem *mailbox;
> +	__le32 cpu_id;
> +
> +	/*
> +	 * Map mailbox memory with attribute device nGnRE (ie ioremap -
> +	 * this deviates from the parking protocol specifications since
> +	 * the mailboxes are required to be mapped nGnRnE; the attribute
> +	 * discrepancy is harmless insofar as the protocol specification
> +	 * is concerned).
> +	 * If the mailbox is mistakenly allocated in the linear mapping
> +	 * by FW ioremap will fail since the mapping will be prevented
> +	 * by the kernel (it clashes with the linear mapping attributes
> +	 * specifications).
> +	 */
> +	mailbox = ioremap(cpu_entry->mailbox_addr, sizeof(*mailbox));
> +	if (!mailbox)
> +		return -EIO;
> +
> +	cpu_id = readl_relaxed(&mailbox->cpu_id);
> +	/*
> +	 * Check if firmware has set-up the mailbox entry properly
> +	 * before kickstarting the respective cpu.
> +	 */
> +	if (cpu_id != ~0U) {
> +		iounmap(mailbox);
> +		return -ENXIO;
> +	}
> +
> +	/*
> +	 * We write the entry point and cpu id as LE regardless of the
> +	 * native endianness of the kernel. Therefore, any boot-loaders
> +	 * that read this address need to convert this address to the
> +	 * Boot-Loader's endianness before jumping.
> +	 */
> +	writeq_relaxed(__pa(secondary_entry), &mailbox->entry_point);
> +	writel_relaxed(cpu_entry->gic_cpu_id, &mailbox->cpu_id);
> +
> +	arch_send_wakeup_ipi_mask(cpumask_of(cpu));
> +
> +	iounmap(mailbox);
> +
> +	return 0;
> +}
> +
> +static void acpi_parking_protocol_cpu_postboot(void)
> +{
> +	int cpu = smp_processor_id();
> +	struct cpu_mailbox_entry *cpu_entry = &cpu_mailbox_entries[cpu];
> +	struct parking_protocol_mailbox __iomem *mailbox;
> +	__le64 entry_point;
> +
> +	/*
> +	 * Map mailbox memory with attribute device nGnRE (ie ioremap -
> +	 * this deviates from the parking protocol specifications since
> +	 * the mailboxes are required to be mapped nGnRnE; the attribute

Where is the nGnRnE requirement? I couldn't find it in the protocol doc.
Just curious.

> +	 * discrepancy is harmless insofar as the protocol specification
> +	 * is concerned).
> +	 * If the mailbox is mistakenly allocated in the linear mapping
> +	 * by FW ioremap will fail since the mapping will be prevented
> +	 * by the kernel (it clashes with the linear mapping attributes
> +	 * specifications).

The kernel will only add cached memory regions to linear mapping and
presumably, the FW will mark the mailboxes as uncached. Otherwise, it
is a FW bug. But I suppose we could run into problems with kernels
using 64K pagesize since firmware assumes 4k.

> +	 */
> +	mailbox = ioremap(cpu_entry->mailbox_addr, sizeof(*mailbox));
> +	if (!mailbox)
> +		return;
> +
> +	entry_point = readl_relaxed(&mailbox->entry_point);
> +	/*
> +	 * Check if firmware has cleared the entry_point as expected
> +	 * by the protocol specification.
> +	 */
> +	WARN_ON(entry_point);
> +
> +	iounmap(mailbox);
> +}
> +
> +const struct cpu_operations acpi_parking_protocol_ops = {
> +	.name		= "parking-protocol",
> +	.cpu_init	= acpi_parking_protocol_cpu_init,
> +	.cpu_prepare	= acpi_parking_protocol_cpu_prepare,
> +	.cpu_boot	= acpi_parking_protocol_cpu_boot,
> +	.cpu_postboot	= acpi_parking_protocol_cpu_postboot
> +};
> diff --git a/arch/arm64/kernel/cpu_ops.c b/arch/arm64/kernel/cpu_ops.c
> index 5ea337d..db31991 100644
> --- a/arch/arm64/kernel/cpu_ops.c
> +++ b/arch/arm64/kernel/cpu_ops.c
> @@ -25,11 +25,12 @@
>  #include <asm/smp_plat.h>
>  
>  extern const struct cpu_operations smp_spin_table_ops;
> +extern const struct cpu_operations acpi_parking_protocol_ops;
>  extern const struct cpu_operations cpu_psci_ops;
>  
>  const struct cpu_operations *cpu_ops[NR_CPUS];
>  
> -static const struct cpu_operations *supported_cpu_ops[] __initconst = {
> +static const struct cpu_operations *dt_supported_cpu_ops[] __initconst = {
>  #ifdef CONFIG_SMP
>  	&smp_spin_table_ops,
>  #endif
> @@ -37,9 +38,19 @@ static const struct cpu_operations *supported_cpu_ops[] __initconst = {
>  	NULL,
>  };
>  
> +static const struct cpu_operations *acpi_supported_cpu_ops[] __initconst = {
> +#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
> +	&acpi_parking_protocol_ops,
> +#endif
> +	&cpu_psci_ops,
> +	NULL,
> +};
> +
>  static const struct cpu_operations * __init cpu_get_ops(const char *name)
>  {
> -	const struct cpu_operations **ops = supported_cpu_ops;
> +	const struct cpu_operations **ops;
> +
> +	ops = acpi_disabled ? dt_supported_cpu_ops : acpi_supported_cpu_ops;
>  
>  	while (*ops) {
>  		if (!strcmp(name, (*ops)->name))
> @@ -77,8 +88,16 @@ static const char *__init cpu_read_enable_method(int cpu)
>  		}
>  	} else {
>  		enable_method = acpi_get_enable_method(cpu);
> -		if (!enable_method)
> -			pr_err("Unsupported ACPI enable-method\n");
> +		if (!enable_method) {
> +			/*
> +			 * In ACPI systems the boot CPU does not require
> +			 * checking the enable method since for some
> +			 * boot protocol (ie parking protocol) it need not
> +			 * be initialized. Don't warn spuriously.
> +			 */
> +			if (cpu != 0)
> +				pr_err("Unsupported ACPI enable-method\n");
> +		}
>  	}
>  
>  	return enable_method;
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 50fb469..1d98f2d 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -69,6 +69,7 @@ enum ipi_msg_type {
>  	IPI_CPU_STOP,
>  	IPI_TIMER,
>  	IPI_IRQ_WORK,
> +	IPI_WAKEUP
>  };
>  
>  /*
> @@ -428,6 +429,8 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
>  	/* map the logical cpu id to cpu MPIDR */
>  	cpu_logical_map(cpu_count) = hwid;
>  
> +	acpi_set_mailbox_entry(cpu_count, processor);
> +
>  	cpu_count++;
>  }
>  
> @@ -610,6 +613,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
>  	S(IPI_CPU_STOP, "CPU stop interrupts"),
>  	S(IPI_TIMER, "Timer broadcast interrupts"),
>  	S(IPI_IRQ_WORK, "IRQ work interrupts"),
> +	S(IPI_WAKEUP, "CPU wakeup interrupts"),
>  };
>  
>  static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
> @@ -653,6 +657,13 @@ void arch_send_call_function_single_ipi(int cpu)
>  	smp_cross_call(cpumask_of(cpu), IPI_CALL_FUNC);
>  }
>  
> +#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
> +void arch_send_wakeup_ipi_mask(const struct cpumask *mask)
> +{
> +	smp_cross_call(mask, IPI_WAKEUP);
> +}
> +#endif
> +
>  #ifdef CONFIG_IRQ_WORK
>  void arch_irq_work_raise(void)
>  {
> @@ -729,6 +740,8 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
>  		irq_exit();
>  		break;
>  #endif
> +	case IPI_WAKEUP:
> +		break;
>  
>  	default:
>  		pr_crit("CPU%u: Unknown IPI message 0x%x\n", cpu, ipinr);
Lorenzo Pieralisi July 16, 2015, 5:12 p.m. UTC | #2
On Thu, Jul 16, 2015 at 05:17:11PM +0100, Mark Salter wrote:
> On Wed, 2015-07-15 at 12:33 +0100, Lorenzo Pieralisi wrote:
> 
> > The SBBR and ACPI specifications allow ACPI based systems that do not
> > implement PSCI (eg systems with no EL3) to boot through the ACPI parking
> > protocol specification[1].
> >
> > This patch implements the ACPI parking protocol CPU operations, and adds
> > code that eases parsing the parking protocol data structures to the
> > ARM64 SMP initializion carried out at the same time as cpus enumeration.
> >
> > To wake-up the CPUs from the parked state, this patch implements a
> > wakeup IPI for ARM64 (ie arch_send_wakeup_ipi_mask()) that mirrors the
> > ARM one, so that a specific IPI is sent for wake-up purpose in order
> > to distinguish it from other IPI sources.
> >
> > Given the current ACPI MADT parsing API, the patch implements a glue
> > layer that helps passing MADT GICC data structure from SMP initialization
> 
> Somewhat off topic, but this reminds once again, that it might be
> better to generalize the ACPI_MADT_TYPE_GENERIC_INTERRUPT so that it
> could be done in one pass. Currently, the SMP code and the GIC code
> need boot-time info from ACPI_MADT_TYPE_GENERIC_INTERRUPT tables. This
> patch adds parking protocol, and this patch:
> 
>  https://lkml.org/lkml/2015/5/1/203
> 
> need to get the PMU irq from the same table. I've been thinking of
> something like a single loop through the table in setup.c with
> callouts to registered users of the various bits of data.

It is not off topic at all, it is bang on topic. I hate the code as it
stands, forcing the MADT to be parsed in multiple places at different
times; that's why I added hooks to set the parking protocol entries
from smp.c. I know that's ugly, and I posted it like this on purpose
to get feedback.

> Those users could register a handler function with something like an
> ACPI_MADT_GIC_DECLARE() macro which would add a handler to a
> special linker section.
> 
> I could work up a separate patch if others think it a worthwhile
> thing to do.

Something simpler? Like stashing the GICC entries (I know we need
permanent table mappings for that to work, unless we create data
structures out of the MADT entries with the fields we are interested in)
for possible CPUs?

> > code to the parking protocol implementation somewhat overriding the CPU
> > operations interfaces. This to avoid creating a completely trasparent
>                                                              ^^^ transparent

Ok.

[...]

> > +static int acpi_parking_protocol_cpu_boot(unsigned int cpu)
> > +{
> > +     struct cpu_mailbox_entry *cpu_entry = &cpu_mailbox_entries[cpu];
> > +     struct parking_protocol_mailbox __iomem *mailbox;
> > +     __le32 cpu_id;
> > +
> > +     /*
> > +      * Map mailbox memory with attribute device nGnRE (ie ioremap -
> > +      * this deviates from the parking protocol specifications since
> > +      * the mailboxes are required to be mapped nGnRnE; the attribute
> > +      * discrepancy is harmless insofar as the protocol specification
> > +      * is concerned).
> > +      * If the mailbox is mistakenly allocated in the linear mapping
> > +      * by FW ioremap will fail since the mapping will be prevented
> > +      * by the kernel (it clashes with the linear mapping attributes
> > +      * specifications).
> > +      */
> > +     mailbox = ioremap(cpu_entry->mailbox_addr, sizeof(*mailbox));
> > +     if (!mailbox)
> > +             return -EIO;
> > +
> > +     cpu_id = readl_relaxed(&mailbox->cpu_id);
> > +     /*
> > +      * Check if firmware has set-up the mailbox entry properly
> > +      * before kickstarting the respective cpu.
> > +      */
> > +     if (cpu_id != ~0U) {
> > +             iounmap(mailbox);
> > +             return -ENXIO;
> > +     }
> > +
> > +     /*
> > +      * We write the entry point and cpu id as LE regardless of the
> > +      * native endianness of the kernel. Therefore, any boot-loaders
> > +      * that read this address need to convert this address to the
> > +      * Boot-Loader's endianness before jumping.
> > +      */
> > +     writeq_relaxed(__pa(secondary_entry), &mailbox->entry_point);
> > +     writel_relaxed(cpu_entry->gic_cpu_id, &mailbox->cpu_id);
> > +
> > +     arch_send_wakeup_ipi_mask(cpumask_of(cpu));
> > +
> > +     iounmap(mailbox);
> > +
> > +     return 0;
> > +}
> > +
> > +static void acpi_parking_protocol_cpu_postboot(void)
> > +{
> > +     int cpu = smp_processor_id();
> > +     struct cpu_mailbox_entry *cpu_entry = &cpu_mailbox_entries[cpu];
> > +     struct parking_protocol_mailbox __iomem *mailbox;
> > +     __le64 entry_point;
> > +
> > +     /*
> > +      * Map mailbox memory with attribute device nGnRE (ie ioremap -
> > +      * this deviates from the parking protocol specifications since
> > +      * the mailboxes are required to be mapped nGnRnE; the attribute
> 
> Where is the nGnRnE requirement? I couldn't find it in the protocol doc.
> Just curious.

Page 11 (3.5 Mailbox Access Rules), in the Note
"...On ARM v8 Systems, the OS must map the memory as Device-nGnRnE".

> > +      * discrepancy is harmless insofar as the protocol specification
> > +      * is concerned).
> > +      * If the mailbox is mistakenly allocated in the linear mapping
> > +      * by FW ioremap will fail since the mapping will be prevented
> > +      * by the kernel (it clashes with the linear mapping attributes
> > +      * specifications).
> 
> The kernel will only add cached memory regions to linear mapping and
> presumably, the FW will mark the mailboxes as uncached. Otherwise, it
> is a FW bug. But I suppose we could run into problems with kernels
> using 64K pagesize since firmware assumes 4k.

Nope, ioremap takes care of that, everything should be fine.

Did you give this patch a go ?

Thanks,
Lorenzo

> > +      */
> > +     mailbox = ioremap(cpu_entry->mailbox_addr, sizeof(*mailbox));
> > +     if (!mailbox)
> > +             return;
> > +
> > +     entry_point = readl_relaxed(&mailbox->entry_point);
> > +     /*
> > +      * Check if firmware has cleared the entry_point as expected
> > +      * by the protocol specification.
> > +      */
> > +     WARN_ON(entry_point);
> > +
> > +     iounmap(mailbox);
> > +}
> > +
> > +const struct cpu_operations acpi_parking_protocol_ops = {
> > +     .name           = "parking-protocol",
> > +     .cpu_init       = acpi_parking_protocol_cpu_init,
> > +     .cpu_prepare    = acpi_parking_protocol_cpu_prepare,
> > +     .cpu_boot       = acpi_parking_protocol_cpu_boot,
> > +     .cpu_postboot   = acpi_parking_protocol_cpu_postboot
> > +};
> > diff --git a/arch/arm64/kernel/cpu_ops.c b/arch/arm64/kernel/cpu_ops.c
> > index 5ea337d..db31991 100644
> > --- a/arch/arm64/kernel/cpu_ops.c
> > +++ b/arch/arm64/kernel/cpu_ops.c
> > @@ -25,11 +25,12 @@
> >  #include <asm/smp_plat.h>
> >
> >  extern const struct cpu_operations smp_spin_table_ops;
> > +extern const struct cpu_operations acpi_parking_protocol_ops;
> >  extern const struct cpu_operations cpu_psci_ops;
> >
> >  const struct cpu_operations *cpu_ops[NR_CPUS];
> >
> > -static const struct cpu_operations *supported_cpu_ops[] __initconst = {
> > +static const struct cpu_operations *dt_supported_cpu_ops[] __initconst = {
> >  #ifdef CONFIG_SMP
> >       &smp_spin_table_ops,
> >  #endif
> > @@ -37,9 +38,19 @@ static const struct cpu_operations *supported_cpu_ops[] __initconst = {
> >       NULL,
> >  };
> >
> > +static const struct cpu_operations *acpi_supported_cpu_ops[] __initconst = {
> > +#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
> > +     &acpi_parking_protocol_ops,
> > +#endif
> > +     &cpu_psci_ops,
> > +     NULL,
> > +};
> > +
> >  static const struct cpu_operations * __init cpu_get_ops(const char *name)
> >  {
> > -     const struct cpu_operations **ops = supported_cpu_ops;
> > +     const struct cpu_operations **ops;
> > +
> > +     ops = acpi_disabled ? dt_supported_cpu_ops : acpi_supported_cpu_ops;
> >
> >       while (*ops) {
> >               if (!strcmp(name, (*ops)->name))
> > @@ -77,8 +88,16 @@ static const char *__init cpu_read_enable_method(int cpu)
> >               }
> >       } else {
> >               enable_method = acpi_get_enable_method(cpu);
> > -             if (!enable_method)
> > -                     pr_err("Unsupported ACPI enable-method\n");
> > +             if (!enable_method) {
> > +                     /*
> > +                      * In ACPI systems the boot CPU does not require
> > +                      * checking the enable method since for some
> > +                      * boot protocol (ie parking protocol) it need not
> > +                      * be initialized. Don't warn spuriously.
> > +                      */
> > +                     if (cpu != 0)
> > +                             pr_err("Unsupported ACPI enable-method\n");
> > +             }
> >       }
> >
> >       return enable_method;
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index 50fb469..1d98f2d 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -69,6 +69,7 @@ enum ipi_msg_type {
> >       IPI_CPU_STOP,
> >       IPI_TIMER,
> >       IPI_IRQ_WORK,
> > +     IPI_WAKEUP
> >  };
> >
> >  /*
> > @@ -428,6 +429,8 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
> >       /* map the logical cpu id to cpu MPIDR */
> >       cpu_logical_map(cpu_count) = hwid;
> >
> > +     acpi_set_mailbox_entry(cpu_count, processor);
> > +
> >       cpu_count++;
> >  }
> >
> > @@ -610,6 +613,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
> >       S(IPI_CPU_STOP, "CPU stop interrupts"),
> >       S(IPI_TIMER, "Timer broadcast interrupts"),
> >       S(IPI_IRQ_WORK, "IRQ work interrupts"),
> > +     S(IPI_WAKEUP, "CPU wakeup interrupts"),
> >  };
> >
> >  static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
> > @@ -653,6 +657,13 @@ void arch_send_call_function_single_ipi(int cpu)
> >       smp_cross_call(cpumask_of(cpu), IPI_CALL_FUNC);
> >  }
> >
> > +#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
> > +void arch_send_wakeup_ipi_mask(const struct cpumask *mask)
> > +{
> > +     smp_cross_call(mask, IPI_WAKEUP);
> > +}
> > +#endif
> > +
> >  #ifdef CONFIG_IRQ_WORK
> >  void arch_irq_work_raise(void)
> >  {
> > @@ -729,6 +740,8 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
> >               irq_exit();
> >               break;
> >  #endif
> > +     case IPI_WAKEUP:
> > +             break;
> >
> >       default:
> >               pr_crit("CPU%u: Unknown IPI message 0x%x\n", cpu, ipinr);
>
Mark Salter July 16, 2015, 5:40 p.m. UTC | #3
On Thu, 2015-07-16 at 18:12 +0100, Lorenzo Pieralisi wrote:

> On Thu, Jul 16, 2015 at 05:17:11PM +0100, Mark Salter wrote:

> > On Wed, 2015-07-15 at 12:33 +0100, Lorenzo Pieralisi wrote:
> > 

> > > The SBBR and ACPI specifications allow ACPI based systems that do not
> > > implement PSCI (eg systems with no EL3) to boot through the ACPI parking
> > > protocol specification[1].
> > > 
> > > This patch implements the ACPI parking protocol CPU operations, and adds
> > > code that eases parsing the parking protocol data structures to the
> > > ARM64 SMP initializion carried out at the same time as cpus enumeration.
> > > 
> > > To wake-up the CPUs from the parked state, this patch implements a
> > > wakeup IPI for ARM64 (ie arch_send_wakeup_ipi_mask()) that mirrors the
> > > ARM one, so that a specific IPI is sent for wake-up purpose in order
> > > to distinguish it from other IPI sources.
> > > 
> > > Given the current ACPI MADT parsing API, the patch implements a glue
> > > layer that helps passing MADT GICC data structure from SMP initialization
> > 
> > Somewhat off topic, but this reminds once again, that it might be
> > better to generalize the ACPI_MADT_TYPE_GENERIC_INTERRUPT so that it
> > could be done in one pass. Currently, the SMP code and the GIC code
> > need boot-time info from ACPI_MADT_TYPE_GENERIC_INTERRUPT tables. This
> > patch adds parking protocol, and this patch:
> > 
> >  https://lkml.org/lkml/2015/5/1/203
> > 
> > need to get the PMU irq from the same table. I've been thinking of
> > something like a single loop through the table in setup.c with
> > callouts to registered users of the various bits of data.
> 
> It is not off topic at all, it is bang on topic. I hate the code
> as it stands forcing parsing the MADT in multiple places at different
> times, that's why I added hooks to set the parking protocol entries
> from smp.c and I know that's ugly, I posted it like this on purpose
> to get feedback.
> 

> > Those users could register a handler function with something like an
> > ACPI_MADT_GIC_DECLARE() macro which would add a handler to a
> > special linker section.
> > 
> > I could work up a separate patch if others think it a worthwhile
> > thing to do.
> 
> Something simpler ? Like stashing the GICC entries (I know we need
> permanent table mappings for that to work unless we create data
> structures out of the MADT entries with the fields we are interested in)
> for possible CPUS ?
> 

I thought about that too. Permanent mapping doesn't save you from
having to parse more than once (validating the entries over and
over). Parsing once and saving the info is better but if done
generically, could end up wasting memory. I thought the handler
callout would work best so that the user of the info could decide
what needed saving and how to save it most efficiently. I'll put
together a patch and we can go from there to improve it.

> > > code to the parking protocol implementation somewhat overriding the CPU
> > > operations interfaces. This to avoid creating a completely trasparent
> >                                                              ^^^ transparent
> 
> Ok.
> 
> [...]
> 


> > > +static int acpi_parking_protocol_cpu_boot(unsigned int cpu)
> > > +{
> > > +     struct cpu_mailbox_entry *cpu_entry = &cpu_mailbox_entries[cpu];
> > > +     struct parking_protocol_mailbox __iomem *mailbox;
> > > +     __le32 cpu_id;
> > > +
> > > +     /*
> > > +      * Map mailbox memory with attribute device nGnRE (ie ioremap -
> > > +      * this deviates from the parking protocol specifications since
> > > +      * the mailboxes are required to be mapped nGnRnE; the attribute
> > > +      * discrepancy is harmless insofar as the protocol specification
> > > +      * is concerned).
> > > +      * If the mailbox is mistakenly allocated in the linear mapping
> > > +      * by FW ioremap will fail since the mapping will be prevented
> > > +      * by the kernel (it clashes with the linear mapping attributes
> > > +      * specifications).
> > > +      */
> > > +     mailbox = ioremap(cpu_entry->mailbox_addr, sizeof(*mailbox));
> > > +     if (!mailbox)
> > > +             return -EIO;
> > > +
> > > +     cpu_id = readl_relaxed(&mailbox->cpu_id);
> > > +     /*
> > > +      * Check if firmware has set-up the mailbox entry properly
> > > +      * before kickstarting the respective cpu.
> > > +      */
> > > +     if (cpu_id != ~0U) {
> > > +             iounmap(mailbox);
> > > +             return -ENXIO;
> > > +     }
> > > +
> > > +     /*
> > > +      * We write the entry point and cpu id as LE regardless of the
> > > +      * native endianness of the kernel. Therefore, any boot-loaders
> > > +      * that read this address need to convert this address to the
> > > +      * Boot-Loader's endianness before jumping.
> > > +      */
> > > +     writeq_relaxed(__pa(secondary_entry), &mailbox->entry_point);
> > > +     writel_relaxed(cpu_entry->gic_cpu_id, &mailbox->cpu_id);
> > > +
> > > +     arch_send_wakeup_ipi_mask(cpumask_of(cpu));
> > > +
> > > +     iounmap(mailbox);
> > > +
> > > +     return 0;
> > > +}
> > > +
> > > +static void acpi_parking_protocol_cpu_postboot(void)
> > > +{
> > > +     int cpu = smp_processor_id();
> > > +     struct cpu_mailbox_entry *cpu_entry = &cpu_mailbox_entries[cpu];
> > > +     struct parking_protocol_mailbox __iomem *mailbox;
> > > +     __le64 entry_point;
> > > +
> > > +     /*
> > > +      * Map mailbox memory with attribute device nGnRE (ie ioremap -
> > > +      * this deviates from the parking protocol specifications since
> > > +      * the mailboxes are required to be mapped nGnRnE; the attribute
> > 
> > Where is the nGnRnE requirement? I couldn't find it in the protocol doc.
> > Just curious.
> 
> Page 11 (3.5 Mailbox Access Rules), in the Note
> "...On ARM v8 Systems, the OS must map the memory as Device-nGnRnE".
> 

That explains it. The document you linked to only has 10 pages and no
mention of specific attributes. Maybe you have one which is not public
yet.

> > > +      * discrepancy is harmless insofar as the protocol specification
> > > +      * is concerned).
> > > +      * If the mailbox is mistakenly allocated in the linear mapping
> > > +      * by FW ioremap will fail since the mapping will be prevented
> > > +      * by the kernel (it clashes with the linear mapping attributes
> > > +      * specifications).
> > 
> > The kernel will only add cached memory regions to linear mapping and
> > presumably, the FW will mark the mailboxes as uncached. Otherwise, it
> > is a FW bug. But I suppose we could run into problems with kernels
> > using 64K pagesize since firmware assumes 4k.
> 
> Nope, ioremap takes care of that, everything should be fine.

The mailbox is 4K. If it is next to a cached UEFI region, the kernel may
have to overlap the mailbox with a cached 64K mapping in order to include
the adjoining UEFI region in the linear map. Then the ioremap would fail
because the mailbox is included in the linear mapping.

> 
> Did you give this patch a go ?

No. I have nothing to try it on. All firmware implementations I have
access to mark the mailboxes as cached memory. And one requires an
event rather than irq for wakeup because it tries to satisfy both
spin-table and parking protocol at the same time. There's a sad amount
of necessary hackery right now. I'm very keen on psci saving us from
all that on future platforms. And non-compliant existing firmwares
will need to get compliant whenever parking protocol gets pulled in
upstream.

> 
> Thanks,
> Lorenzo
> 


> > > +      */
> > > +     mailbox = ioremap(cpu_entry->mailbox_addr, sizeof(*mailbox));
> > > +     if (!mailbox)
> > > +             return;
> > > +
> > > +     entry_point = readl_relaxed(&mailbox->entry_point);
> > > +     /*
> > > +      * Check if firmware has cleared the entry_point as expected
> > > +      * by the protocol specification.
> > > +      */
> > > +     WARN_ON(entry_point);
> > > +
> > > +     iounmap(mailbox);
> > > +}
> > > +
> > > +const struct cpu_operations acpi_parking_protocol_ops = {
> > > +     .name           = "parking-protocol",
> > > +     .cpu_init       = acpi_parking_protocol_cpu_init,
> > > +     .cpu_prepare    = acpi_parking_protocol_cpu_prepare,
> > > +     .cpu_boot       = acpi_parking_protocol_cpu_boot,
> > > +     .cpu_postboot   = acpi_parking_protocol_cpu_postboot
> > > +};
> > > diff --git a/arch/arm64/kernel/cpu_ops.c b/arch/arm64/kernel/cpu_ops.c
> > > index 5ea337d..db31991 100644
> > > --- a/arch/arm64/kernel/cpu_ops.c
> > > +++ b/arch/arm64/kernel/cpu_ops.c
> > > @@ -25,11 +25,12 @@
> > >  #include <asm/smp_plat.h>
> > > 
> > >  extern const struct cpu_operations smp_spin_table_ops;
> > > +extern const struct cpu_operations acpi_parking_protocol_ops;
> > >  extern const struct cpu_operations cpu_psci_ops;
> > > 
> > >  const struct cpu_operations *cpu_ops[NR_CPUS];
> > > 
> > > -static const struct cpu_operations *supported_cpu_ops[] __initconst = {
> > > +static const struct cpu_operations *dt_supported_cpu_ops[] __initconst = {
> > >  #ifdef CONFIG_SMP
> > >       &smp_spin_table_ops,
> > >  #endif
> > > @@ -37,9 +38,19 @@ static const struct cpu_operations *supported_cpu_ops[] __initconst = {
> > >       NULL,
> > >  };
> > > 
> > > +static const struct cpu_operations *acpi_supported_cpu_ops[] __initconst = {
> > > +#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
> > > +     &acpi_parking_protocol_ops,
> > > +#endif
> > > +     &cpu_psci_ops,
> > > +     NULL,
> > > +};
> > > +
> > >  static const struct cpu_operations * __init cpu_get_ops(const char *name)
> > >  {
> > > -     const struct cpu_operations **ops = supported_cpu_ops;
> > > +     const struct cpu_operations **ops;
> > > +
> > > +     ops = acpi_disabled ? dt_supported_cpu_ops : acpi_supported_cpu_ops;
> > > 
> > >       while (*ops) {
> > >               if (!strcmp(name, (*ops)->name))
> > > @@ -77,8 +88,16 @@ static const char *__init cpu_read_enable_method(int cpu)
> > >               }
> > >       } else {
> > >               enable_method = acpi_get_enable_method(cpu);
> > > -             if (!enable_method)
> > > -                     pr_err("Unsupported ACPI enable-method\n");
> > > +             if (!enable_method) {
> > > +                     /*
> > > +                      * In ACPI systems the boot CPU does not require
> > > +                      * checking the enable method since for some
> > > +                      * boot protocol (ie parking protocol) it need not
> > > +                      * be initialized. Don't warn spuriously.
> > > +                      */
> > > +                     if (cpu != 0)
> > > +                             pr_err("Unsupported ACPI enable-method\n");
> > > +             }
> > >       }
> > > 
> > >       return enable_method;
> > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > > index 50fb469..1d98f2d 100644
> > > --- a/arch/arm64/kernel/smp.c
> > > +++ b/arch/arm64/kernel/smp.c
> > > @@ -69,6 +69,7 @@ enum ipi_msg_type {
> > >       IPI_CPU_STOP,
> > >       IPI_TIMER,
> > >       IPI_IRQ_WORK,
> > > +     IPI_WAKEUP
> > >  };
> > > 
> > >  /*
> > > @@ -428,6 +429,8 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
> > >       /* map the logical cpu id to cpu MPIDR */
> > >       cpu_logical_map(cpu_count) = hwid;
> > > 
> > > +     acpi_set_mailbox_entry(cpu_count, processor);
> > > +
> > >       cpu_count++;
> > >  }
> > > 
> > > @@ -610,6 +613,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
> > >       S(IPI_CPU_STOP, "CPU stop interrupts"),
> > >       S(IPI_TIMER, "Timer broadcast interrupts"),
> > >       S(IPI_IRQ_WORK, "IRQ work interrupts"),
> > > +     S(IPI_WAKEUP, "CPU wakeup interrupts"),
> > >  };
> > > 
> > >  static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
> > > @@ -653,6 +657,13 @@ void arch_send_call_function_single_ipi(int cpu)
> > >       smp_cross_call(cpumask_of(cpu), IPI_CALL_FUNC);
> > >  }
> > > 
> > > +#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
> > > +void arch_send_wakeup_ipi_mask(const struct cpumask *mask)
> > > +{
> > > +     smp_cross_call(mask, IPI_WAKEUP);
> > > +}
> > > +#endif
> > > +
> > >  #ifdef CONFIG_IRQ_WORK
> > >  void arch_irq_work_raise(void)
> > >  {
> > > @@ -729,6 +740,8 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
> > >               irq_exit();
> > >               break;
> > >  #endif
> > > +     case IPI_WAKEUP:
> > > +             break;
> > > 
> > >       default:
> > >               pr_crit("CPU%u: Unknown IPI message 0x%x\n", cpu, ipinr);
> >
Al Stone July 16, 2015, 6:05 p.m. UTC | #4
On 07/16/2015 11:12 AM, Lorenzo Pieralisi wrote:
> On Thu, Jul 16, 2015 at 05:17:11PM +0100, Mark Salter wrote:
>> On Wed, 2015-07-15 at 12:33 +0100, Lorenzo Pieralisi wrote:
>>
>>> The SBBR and ACPI specifications allow ACPI based systems that do not
>>> implement PSCI (eg systems with no EL3) to boot through the ACPI parking
>>> protocol specification[1].
>>>
>>> This patch implements the ACPI parking protocol CPU operations, and adds
>>> code that eases parsing the parking protocol data structures to the
>>> ARM64 SMP initializion carried out at the same time as cpus enumeration.
>>>
>>> To wake-up the CPUs from the parked state, this patch implements a
>>> wakeup IPI for ARM64 (ie arch_send_wakeup_ipi_mask()) that mirrors the
>>> ARM one, so that a specific IPI is sent for wake-up purpose in order
>>> to distinguish it from other IPI sources.
>>>
>>> Given the current ACPI MADT parsing API, the patch implements a glue
>>> layer that helps passing MADT GICC data structure from SMP initialization
>>
>> Somewhat off topic, but this reminds once again, that it might be
>> better to generalize the ACPI_MADT_TYPE_GENERIC_INTERRUPT so that it
>> could be done in one pass. Currently, the SMP code and the GIC code
>> need boot-time info from ACPI_MADT_TYPE_GENERIC_INTERRUPT tables. This
>> patch adds parking protocol, and this patch:
>>
>>  https://lkml.org/lkml/2015/5/1/203
>>
>> need to get the PMU irq from the same table. I've been thinking of
>> something like a single loop through the table in setup.c with
>> callouts to registered users of the various bits of data.
> 
> It is not off topic at all, it is bang on topic. I hate the code
> as it stands forcing parsing the MADT in multiple places at different
> times, that's why I added hooks to set the parking protocol entries
> from smp.c and I know that's ugly, I posted it like this on purpose
> to get feedback.
> 
>> Those users could register a handler function with something like an
>> ACPI_MADT_GIC_DECLARE() macro which would add a handler to a
>> special linker section.
>>
>> I could work up a separate patch if others think it a worthwhile
>> thing to do.
> 
> Something simpler ? Like stashing the GICC entries (I know we need
> permanent table mappings for that to work unless we create data
> structures out of the MADT entries with the fields we are interested in)
> for possible CPUS ?

Right -- it seems like it would be pretty straightforward to traverse
the MADT once, and capture all the GICC subtables in a list of pointers,
or even make a copy of the contents of all of them.  Each user of the
content would still have to traverse the list, though, unless data is
reduced to only those things that are needed and the rest tossed.  Even
if I only keep the info needed, I've still got a list of entries, one for
each CPU, I think, that I would still have to traverse.

If I have to register a handler to gather a specific bit of data needed,
I'm not understanding how that's any less complicated than what I need
to do today -- call acpi_parse_table_madt() with my GICC handler.  Wouldn't
both GICC subtable handlers be just about the same code?

I'm probably missing something obvious, but I'm not understanding what problem
is being solved....
Mark Salter July 16, 2015, 6:23 p.m. UTC | #5
On Thu, 2015-07-16 at 12:05 -0600, Al Stone wrote:

> On 07/16/2015 11:12 AM, Lorenzo Pieralisi wrote:

> > On Thu, Jul 16, 2015 at 05:17:11PM +0100, Mark Salter wrote:

> > > On Wed, 2015-07-15 at 12:33 +0100, Lorenzo Pieralisi wrote:
> > > 

> > > > The SBBR and ACPI specifications allow ACPI based systems that do not
> > > > implement PSCI (eg systems with no EL3) to boot through the ACPI parking
> > > > protocol specification[1].
> > > > 
> > > > This patch implements the ACPI parking protocol CPU operations, and adds
> > > > code that eases parsing the parking protocol data structures to the
> > > > ARM64 SMP initializion carried out at the same time as cpus enumeration.
> > > > 
> > > > To wake-up the CPUs from the parked state, this patch implements a
> > > > wakeup IPI for ARM64 (ie arch_send_wakeup_ipi_mask()) that mirrors the
> > > > ARM one, so that a specific IPI is sent for wake-up purpose in order
> > > > to distinguish it from other IPI sources.
> > > > 
> > > > Given the current ACPI MADT parsing API, the patch implements a glue
> > > > layer that helps passing MADT GICC data structure from SMP initialization
> > > 
> > > Somewhat off topic, but this reminds once again, that it might be
> > > better to generalize the ACPI_MADT_TYPE_GENERIC_INTERRUPT so that it
> > > could be done in one pass. Currently, the SMP code and the GIC code
> > > need boot-time info from ACPI_MADT_TYPE_GENERIC_INTERRUPT tables. This
> > > patch adds parking protocol, and this patch:
> > > 
> > >  https://lkml.org/lkml/2015/5/1/203
> > > 
> > > need to get the PMU irq from the same table. I've been thinking of
> > > something like a single loop through the table in setup.c with
> > > callouts to registered users of the various bits of data.
> > 
> > It is not off topic at all, it is bang on topic. I hate the code
> > as it stands forcing parsing the MADT in multiple places at different
> > times, that's why I added hooks to set the parking protocol entries
> > from smp.c and I know that's ugly, I posted it like this on purpose
> > to get feedback.
> > 

> > > Those users could register a handler function with something like an
> > > ACPI_MADT_GIC_DECLARE() macro which would add a handler to a
> > > special linker section.
> > > 
> > > I could work up a separate patch if others think it a worthwhile
> > > thing to do.
> > 
> > Something simpler ? Like stashing the GICC entries (I know we need
> > permanent table mappings for that to work unless we create data
> > structures out of the MADT entries with the fields we are interested in)
> > for possible CPUS ?
> 
> Right -- it seems like it would be pretty straightforward to traverse
> the MADT once, and capture all the GICC subtables in a list of pointers,
> or even make a copy of the contents of all of them.  Each user of the
> content would still have to traverse the list, though, unless data is
> reduced to only those things that are needed and the rest tossed.  Even
> if only keep the info needed, I've still got a list of entries, one for
> each CPU, I think, that I would still have to traverse.
> 
> If I have to register a handler to gather a specific bit of data needed,
> I'm not understanding how that's any less complicated than what I need
> to do today -- call acpi_parse_table_madt() with my GICC handler.  Wouldn't
> both GICC subtable handlers be just about the same code?
> 
> I'm probably missing something obvious, but I'm not understanding what problem
> is being solved....
> 

The difference is that GIC code and SMP code each loop
through the tables getting the info they need. If they
registered a handler, there is only one loop through
the tables regardless of how many handlers get registered.
Having each loop through as currently done isn't really
a performance issue (its boot-time only right now), but
there is duplicated code wrt validating the table entry
and calling the acpi API to do it. All of that could be
done in one place instead of duplicating it in different
places.
Al Stone July 16, 2015, 9:02 p.m. UTC | #6
On 07/16/2015 12:23 PM, Mark Salter wrote:
> On Thu, 2015-07-16 at 12:05 -0600, Al Stone wrote:
> 
>> On 07/16/2015 11:12 AM, Lorenzo Pieralisi wrote:
> 
>>> On Thu, Jul 16, 2015 at 05:17:11PM +0100, Mark Salter wrote:
> 
>>>> On Wed, 2015-07-15 at 12:33 +0100, Lorenzo Pieralisi wrote:
>>>>
> 
>>>>> The SBBR and ACPI specifications allow ACPI based systems that do not
>>>>> implement PSCI (eg systems with no EL3) to boot through the ACPI parking
>>>>> protocol specification[1].
>>>>>
>>>>> This patch implements the ACPI parking protocol CPU operations, and adds
>>>>> code that eases parsing the parking protocol data structures to the
>>>>> ARM64 SMP initializion carried out at the same time as cpus enumeration.
>>>>>
>>>>> To wake-up the CPUs from the parked state, this patch implements a
>>>>> wakeup IPI for ARM64 (ie arch_send_wakeup_ipi_mask()) that mirrors the
>>>>> ARM one, so that a specific IPI is sent for wake-up purpose in order
>>>>> to distinguish it from other IPI sources.
>>>>>
>>>>> Given the current ACPI MADT parsing API, the patch implements a glue
>>>>> layer that helps passing MADT GICC data structure from SMP initialization
>>>>
>>>> Somewhat off topic, but this reminds once again, that it might be
>>>> better to generalize the ACPI_MADT_TYPE_GENERIC_INTERRUPT so that it
>>>> could be done in one pass. Currently, the SMP code and the GIC code
>>>> need boot-time info from ACPI_MADT_TYPE_GENERIC_INTERRUPT tables. This
>>>> patch adds parking protocol, and this patch:
>>>>
>>>>  https://lkml.org/lkml/2015/5/1/203
>>>>
>>>> need to get the PMU irq from the same table. I've been thinking of
>>>> something like a single loop through the table in setup.c with
>>>> callouts to registered users of the various bits of data.
>>>
>>> It is not off topic at all, it is bang on topic. I hate the code
>>> as it stands forcing parsing the MADT in multiple places at different
>>> times, that's why I added hooks to set the parking protocol entries
>>> from smp.c and I know that's ugly, I posted it like this on purpose
>>> to get feedback.
>>>
> 
>>>> Those users could register a handler function with something like an
>>>> ACPI_MADT_GIC_DECLARE() macro which would add a handler to a
>>>> special linker section.
>>>>
>>>> I could work up a separate patch if others think it a worthwhile
>>>> thing to do.
>>>
>>> Something simpler ? Like stashing the GICC entries (I know we need
>>> permanent table mappings for that to work unless we create data
>>> structures out of the MADT entries with the fields we are interested in)
>>> for possible CPUS ?
>>
>> Right -- it seems like it would be pretty straightforward to traverse
>> the MADT once, and capture all the GICC subtables in a list of pointers,
>> or even make a copy of the contents of all of them.  Each user of the
>> content would still have to traverse the list, though, unless data is
>> reduced to only those things that are needed and the rest tossed.  Even
>> if only keep the info needed, I've still got a list of entries, one for
>> each CPU, I think, that I would still have to traverse.
>>
>> If I have to register a handler to gather a specific bit of data needed,
>> I'm not understanding how that's any less complicated than what I need
>> to do today -- call acpi_parse_table_madt() with my GICC handler.  Wouldn't
>> both GICC subtable handlers be just about the same code?
>>
>> I'm probably missing something obvious, but I'm not understanding what problem
>> is being solved....
>>
> 
> The difference is that GIC code and SMP code each loop
> through the tables getting the info they need. If they
> registered a handler, there is only one loop through
> the tables regardless of how many handlers get registered.
> Having each loop through as currently done isn't really
> a performance issue (its boot-time only right now), but
> there is duplicated code wrt validating the table entry
> and calling the acpi API to do it. All of that could be
> done in one place instead of duplicating it in different
> places.

Ah, okay.  Thanks, Mark.  I was being a bit myopic and wasn't
thinking through the complete path in the kernel.
Hanjun Guo July 17, 2015, 9:16 a.m. UTC | #7
On 07/17/2015 02:23 AM, Mark Salter wrote:
> On Thu, 2015-07-16 at 12:05 -0600, Al Stone wrote:
>
>> On 07/16/2015 11:12 AM, Lorenzo Pieralisi wrote:
>
>>> On Thu, Jul 16, 2015 at 05:17:11PM +0100, Mark Salter wrote:
>
>>>> On Wed, 2015-07-15 at 12:33 +0100, Lorenzo Pieralisi wrote:
>>>>
>
>>>>> The SBBR and ACPI specifications allow ACPI based systems that do not
>>>>> implement PSCI (eg systems with no EL3) to boot through the ACPI parking
>>>>> protocol specification[1].
>>>>>
>>>>> This patch implements the ACPI parking protocol CPU operations, and adds
>>>>> code that eases parsing the parking protocol data structures to the
>>>>> ARM64 SMP initializion carried out at the same time as cpus enumeration.
>>>>>
>>>>> To wake-up the CPUs from the parked state, this patch implements a
>>>>> wakeup IPI for ARM64 (ie arch_send_wakeup_ipi_mask()) that mirrors the
>>>>> ARM one, so that a specific IPI is sent for wake-up purpose in order
>>>>> to distinguish it from other IPI sources.
>>>>>
>>>>> Given the current ACPI MADT parsing API, the patch implements a glue
>>>>> layer that helps passing MADT GICC data structure from SMP initialization
>>>>
>>>> Somewhat off topic, but this reminds once again, that it might be
>>>> better to generalize the ACPI_MADT_TYPE_GENERIC_INTERRUPT so that it
>>>> could be done in one pass. Currently, the SMP code and the GIC code
>>>> need boot-time info from ACPI_MADT_TYPE_GENERIC_INTERRUPT tables. This
>>>> patch adds parking protocol, and this patch:
>>>>
>>>>   https://lkml.org/lkml/2015/5/1/203
>>>>
>>>> need to get the PMU irq from the same table. I've been thinking of
>>>> something like a single loop through the table in setup.c with
>>>> callouts to registered users of the various bits of data.
>>>
>>> It is not off topic at all, it is bang on topic. I hate the code
>>> as it stands forcing parsing the MADT in multiple places at different
>>> times, that's why I added hooks to set the parking protocol entries
>>> from smp.c and I know that's ugly, I posted it like this on purpose
>>> to get feedback.
>>>
>
>>>> Those users could register a handler function with something like an
>>>> ACPI_MADT_GIC_DECLARE() macro which would add a handler to a
>>>> special linker section.
>>>>
>>>> I could work up a separate patch if others think it a worthwhile
>>>> thing to do.

Yes, please :)

>>>
>>> Something simpler ? Like stashing the GICC entries (I know we need
>>> permanent table mappings for that to work unless we create data
>>> structures out of the MADT entries with the fields we are interested in)
>>> for possible CPUS ?
>>
>> Right -- it seems like it would be pretty straightforward to traverse
>> the MADT once, and capture all the GICC subtables in a list of pointers,
>> or even make a copy of the contents of all of them.  Each user of the
>> content would still have to traverse the list, though, unless data is
>> reduced to only those things that are needed and the rest tossed.  Even
>> if only keep the info needed, I've still got a list of entries, one for
>> each CPU, I think, that I would still have to traverse.
>>
>> If I have to register a handler to gather a specific bit of data needed,
>> I'm not understanding how that's any less complicated than what I need
>> to do today -- call acpi_parse_table_madt() with my GICC handler.  Wouldn't
>> both GICC subtable handlers be just about the same code?
>>
>> I'm probably missing something obvious, but I'm not understanding what problem
>> is being solved....
>>
>
> The difference is that GIC code and SMP code each loop
> through the tables getting the info they need. If they
> registered a handler, there is only one loop through
> the tables regardless of how many handlers get registered.
> Having each loop through as currently done isn't really
> a performance issue (its boot-time only right now), but
> there is duplicated code wrt validating the table entry
> and calling the acpi API to do it. All of that could be
> done in one place instead of duplicating it in different
> places.

The GICv3 code also needs to parse the GIC structures, and there is a
lot of duplicated code as far as I know; please send out the patch when
it is ready and I will review it.

Thanks
Hanjun
Lorenzo Pieralisi July 17, 2015, 10:35 a.m. UTC | #8
[CC'ing Ard in relation to the ioremap issue]

On Thu, Jul 16, 2015 at 06:40:49PM +0100, Mark Salter wrote:
> On Thu, 2015-07-16 at 18:12 +0100, Lorenzo Pieralisi wrote:
> 
> > On Thu, Jul 16, 2015 at 05:17:11PM +0100, Mark Salter wrote:
> 
> > > On Wed, 2015-07-15 at 12:33 +0100, Lorenzo Pieralisi wrote:
> > >
> 
> > > > The SBBR and ACPI specifications allow ACPI based systems that do not
> > > > implement PSCI (eg systems with no EL3) to boot through the ACPI parking
> > > > protocol specification[1].
> > > >
> > > > This patch implements the ACPI parking protocol CPU operations, and adds
> > > > code that eases parsing the parking protocol data structures to the
> > > > ARM64 SMP initializion carried out at the same time as cpus enumeration.
> > > >
> > > > To wake-up the CPUs from the parked state, this patch implements a
> > > > wakeup IPI for ARM64 (ie arch_send_wakeup_ipi_mask()) that mirrors the
> > > > ARM one, so that a specific IPI is sent for wake-up purpose in order
> > > > to distinguish it from other IPI sources.
> > > >
> > > > Given the current ACPI MADT parsing API, the patch implements a glue
> > > > layer that helps passing MADT GICC data structure from SMP initialization
> > >
> > > Somewhat off topic, but this reminds once again, that it might be
> > > better to generalize the ACPI_MADT_TYPE_GENERIC_INTERRUPT so that it
> > > could be done in one pass. Currently, the SMP code and the GIC code
> > > need boot-time info from ACPI_MADT_TYPE_GENERIC_INTERRUPT tables. This
> > > patch adds parking protocol, and this patch:
> > >
> > >  https://lkml.org/lkml/2015/5/1/203
> > >
> > > need to get the PMU irq from the same table. I've been thinking of
> > > something like a single loop through the table in setup.c with
> > > callouts to registered users of the various bits of data.
> >
> > It is not off topic at all, it is bang on topic. I hate the code
> > as it stands forcing parsing the MADT in multiple places at different
> > times, that's why I added hooks to set the parking protocol entries
> > from smp.c and I know that's ugly, I posted it like this on purpose
> > to get feedback.
> >
> 
> > > Those users could register a handler function with something like an
> > > ACPI_MADT_GIC_DECLARE() macro which would add a handler to a
> > > special linker section.
> > >
> > > I could work up a separate patch if others think it a worthwhile
> > > thing to do.
> >
> > Something simpler ? Like stashing the GICC entries (I know we need
> > permanent table mappings for that to work unless we create data
> > structures out of the MADT entries with the fields we are interested in)
> > for possible CPUS ?
> >
> 
> I thought about that too. Permanent mapping doesn't save you from
> having to parse more than once (validating the entries over and
> over). Parsing once and saving the info is better but if done
> generically, could end up wasting memory. I thought the handler
> callout would work best so that the user of the info could decide
> what needed saving and how to save it most efficiently. I'll put
> together a patch and we can go from there to improve it.

Ok, thanks, happy to review it and test it.

> > > > code to the parking protocol implementation somewhat overriding the CPU
> > > > operations interfaces. This to avoid creating a completely trasparent
> > >                                                              ^^^ transparent
> >
> > Ok.
> >
> > [...]
> >
> 
> 
> > > > +static int acpi_parking_protocol_cpu_boot(unsigned int cpu)
> > > > +{
> > > > +     struct cpu_mailbox_entry *cpu_entry = &cpu_mailbox_entries[cpu];
> > > > +     struct parking_protocol_mailbox __iomem *mailbox;
> > > > +     __le32 cpu_id;
> > > > +
> > > > +     /*
> > > > +      * Map mailbox memory with attribute device nGnRE (ie ioremap -
> > > > +      * this deviates from the parking protocol specifications since
> > > > +      * the mailboxes are required to be mapped nGnRnE; the attribute
> > > > +      * discrepancy is harmless insofar as the protocol specification
> > > > +      * is concerned).
> > > > +      * If the mailbox is mistakenly allocated in the linear mapping
> > > > +      * by FW ioremap will fail since the mapping will be prevented
> > > > +      * by the kernel (it clashes with the linear mapping attributes
> > > > +      * specifications).
> > > > +      */
> > > > +     mailbox = ioremap(cpu_entry->mailbox_addr, sizeof(*mailbox));
> > > > +     if (!mailbox)
> > > > +             return -EIO;
> > > > +
> > > > +     cpu_id = readl_relaxed(&mailbox->cpu_id);
> > > > +     /*
> > > > +      * Check if firmware has set-up the mailbox entry properly
> > > > +      * before kickstarting the respective cpu.
> > > > +      */
> > > > +     if (cpu_id != ~0U) {
> > > > +             iounmap(mailbox);
> > > > +             return -ENXIO;
> > > > +     }
> > > > +
> > > > +     /*
> > > > +      * We write the entry point and cpu id as LE regardless of the
> > > > +      * native endianness of the kernel. Therefore, any boot-loaders
> > > > +      * that read this address need to convert this address to the
> > > > +      * Boot-Loader's endianness before jumping.
> > > > +      */
> > > > +     writeq_relaxed(__pa(secondary_entry), &mailbox->entry_point);
> > > > +     writel_relaxed(cpu_entry->gic_cpu_id, &mailbox->cpu_id);
> > > > +
> > > > +     arch_send_wakeup_ipi_mask(cpumask_of(cpu));
> > > > +
> > > > +     iounmap(mailbox);
> > > > +
> > > > +     return 0;
> > > > +}
> > > > +
> > > > +static void acpi_parking_protocol_cpu_postboot(void)
> > > > +{
> > > > +     int cpu = smp_processor_id();
> > > > +     struct cpu_mailbox_entry *cpu_entry = &cpu_mailbox_entries[cpu];
> > > > +     struct parking_protocol_mailbox __iomem *mailbox;
> > > > +     __le64 entry_point;
> > > > +
> > > > +     /*
> > > > +      * Map mailbox memory with attribute device nGnRE (ie ioremap -
> > > > +      * this deviates from the parking protocol specifications since
> > > > +      * the mailboxes are required to be mapped nGnRnE; the attribute
> > >
> > > Where is the nGnRnE requirement? I couldn't find it in the protocol doc.
> > > Just curious.
> >
> > Page 11 (3.5 Mailbox Access Rules), in the Note
> > "...On ARM v8 Systems, the OS must map the memory as Device-nGnRnE".
> >
> 
> That explains it. The document you linked to only has 10 pages and no
> mention of specific attributes. Maybe you have one which is not public
> yet.

No, I do not have a special edition; the one I linked in the commit log
defines what I mentioned, here:

https://acpica.org/related-documents
"Multi-processor Startup for ARM Platforms"

in 3.5 attributes are explicitly mentioned.

> > > > +      * discrepancy is harmless insofar as the protocol specification
> > > > +      * is concerned).
> > > > +      * If the mailbox is mistakenly allocated in the linear mapping
> > > > +      * by FW ioremap will fail since the mapping will be prevented
> > > > +      * by the kernel (it clashes with the linear mapping attributes
> > > > +      * specifications).
> > >
> > > The kernel will only add cached memory regions to linear mapping and
> > > presumably, the FW will mark the mailboxes as uncached. Otherwise, it
> > > is a FW bug. But I suppose we could run into problems with kernels
> > > using 64K pagesize since firmware assumes 4k.
> >
> > Nope, ioremap takes care of that, everything should be fine.
> 
> The mailbox is 4K. If it is next to a cached UEFI region, the kernel may
> have to overlap the mailbox with a cached 64K mapping in order to include
> the adjoining UEFI region in the linear map. Then the ioremap would fail
> because the mailbox is included in the linear mapping.

Ok, I thought you were referring to the ioremap implementation and
the related 4K vs 64K alignment/offset issues.

I think this has been already debated here (from a different perspective
but that's the same problem):

http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/276586.html

That's what you are referring to, right ?

> > Did you give this patch a go ?
> 
> No. I have nothing to try it on. All firmware implementations I have
> access to mark the mailboxes as cached memory. And one requires an
> event rather than irq for wakeup because it tries to satisfy both
> spin-table and parking protocol at the same time. There's a sad amount
> of necessary hackery right now. I'm very keen on psci saving us from
> all that on future platforms. And non-compliant existing firmwares
> will need to get compliant whenever parking protocol gets pulled in
> upstream.

Excellent, it looks promising. Yes, PSCI will save us from all this
stuff, but that's not the reason why I posted this code; I posted
it to enable platforms that can't rely on PSCI at all.

I do not think we should change this patch though, except for possibly
handling the ioremap issue above, somehow.

Lorenzo
Lorenzo Pieralisi Aug. 24, 2015, 5:13 p.m. UTC | #9
Hi Mark,

On Thu, Jul 16, 2015 at 07:23:46PM +0100, Mark Salter wrote:
> On Thu, 2015-07-16 at 12:05 -0600, Al Stone wrote:
> 
> > On 07/16/2015 11:12 AM, Lorenzo Pieralisi wrote:
> 
> > > On Thu, Jul 16, 2015 at 05:17:11PM +0100, Mark Salter wrote:
> 
> > > > On Wed, 2015-07-15 at 12:33 +0100, Lorenzo Pieralisi wrote:
> > > > 
> 
> > > > > The SBBR and ACPI specifications allow ACPI based systems that do not
> > > > > implement PSCI (eg systems with no EL3) to boot through the ACPI parking
> > > > > protocol specification[1].
> > > > > 
> > > > > This patch implements the ACPI parking protocol CPU operations, and adds
> > > > > code that eases parsing the parking protocol data structures to the
> > > > > ARM64 SMP initialization carried out at the same time as CPU enumeration.
> > > > > 
> > > > > To wake-up the CPUs from the parked state, this patch implements a
> > > > > wakeup IPI for ARM64 (ie arch_send_wakeup_ipi_mask()) that mirrors the
> > > > > ARM one, so that a specific IPI is sent for wake-up purpose in order
> > > > > to distinguish it from other IPI sources.
> > > > > 
> > > > > Given the current ACPI MADT parsing API, the patch implements a glue
> > > > > layer that helps passing MADT GICC data structure from SMP initialization
> > > > 
> > > > Somewhat off topic, but this reminds once again, that it might be
> > > > better to generalize the ACPI_MADT_TYPE_GENERIC_INTERRUPT so that it
> > > > could be done in one pass. Currently, the SMP code and the GIC code
> > > > need boot-time info from ACPI_MADT_TYPE_GENERIC_INTERRUPT tables. This
> > > > patch adds parking protocol, and this patch:
> > > > 
> > > >  https://lkml.org/lkml/2015/5/1/203
> > > > 
> > > > need to get the PMU irq from the same table. I've been thinking of
> > > > something like a single loop through the table in setup.c with
> > > > callouts to registered users of the various bits of data.
> > > 
> > > It is not off topic at all, it is bang on topic. I hate the code
> > > as it stands forcing parsing the MADT in multiple places at different
> > > times, that's why I added hooks to set the parking protocol entries
> > > from smp.c and I know that's ugly, I posted it like this on purpose
> > > to get feedback.
> > > 
> 
> > > > Those users could register a handler function with something like an
> > > > ACPI_MADT_GIC_DECLARE() macro which would add a handler to a
> > > > special linker section.
> > > > 
> > > > I could work up a separate patch if others think it a worthwhile
> > > > thing to do.
> > > 
> > > Something simpler ? Like stashing the GICC entries (I know we need
> > > permanent table mappings for that to work unless we create data
> > > structures out of the MADT entries with the fields we are interested in)
> > > for possible CPUS ?
> > 
> > Right -- it seems like it would be pretty straightforward to traverse
> > the MADT once, and capture all the GICC subtables in a list of pointers,
> > or even make a copy of the contents of all of them.  Each user of the
> > content would still have to traverse the list, though, unless data is
> > reduced to only those things that are needed and the rest tossed.  Even
> > if I only keep the info needed, I've still got a list of entries, one for
> > each CPU, I think, that I would still have to traverse.
> > 
> > If I have to register a handler to gather a specific bit of data needed,
> > I'm not understanding how that's any less complicated than what I need
> > to do today -- call acpi_parse_table_madt() with my GICC handler.  Wouldn't
> > both GICC subtable handlers be just about the same code?
> > 
> > I'm probably missing something obvious, but I'm not understanding what problem
> > is being solved....
> > 
> 
> The difference is that GIC code and SMP code each loop
> through the tables getting the info they need. If they
> registered a handler, there is only one loop through
> the tables regardless of how many handlers get registered.
> Having each loop through as currently done isn't really
> a performance issue (it's boot-time only right now), but
> there is duplicated code wrt validating the table entry
> and calling the acpi API to do it. All of that could be
> done in one place instead of duplicating it in different
> places.

Did you have time to put the patch together ? I would like
to post a v2 for this set soon.

Thank you !
Lorenzo
Mark Salter Aug. 25, 2015, 2:01 p.m. UTC | #10
On Mon, 2015-08-24 at 18:13 +0100, Lorenzo Pieralisi wrote:
> Hi Mark,
> 
> On Thu, Jul 16, 2015 at 07:23:46PM +0100, Mark Salter wrote:
> 
> [...]
> 
> Did you have time to put the patch together ? I would like
> to post a v2 for this set soon.
> 
> Thank you !
> Lorenzo

No. The original idea I had wasn't going to work out because the
parsing had to happen at different times during the boot. I'm not
sure if there's a better way than what we're doing now.
Lorenzo Pieralisi Aug. 26, 2015, 4:07 p.m. UTC | #11
Mark,

On Thu, Jul 16, 2015 at 06:40:49PM +0100, Mark Salter wrote:

[...]

> > > The kernel will only add cached memory regions to linear mapping and
> > > presumably, the FW will mark the mailboxes as uncached. Otherwise, it
> > > is a FW bug. But I suppose we could run into problems with kernels
> > > using 64K pagesize since firmware assumes 4k.
> >
> > Nope, ioremap takes care of that, everything should be fine.
> 
> The mailbox is 4K. If it is next to a cached UEFI region, the kernel may
> have to overlap the mailbox with a cached 64K mapping in order to include
> the adjoining UEFI region in the linear map. Then the ioremap would fail
> because the mailbox is included in the linear mapping.

You have to acknowledge that what you describe is a bit of a corner
case (and a silly FW set-up), are you aware of any existing FW set-up
where we actually hit the corner case above ?

I think it is fine to leave code as-is, at least the mailbox
mappings, I will check to see if I can improve the MADT parsing,
somehow.

Lorenzo
Mark Salter Aug. 26, 2015, 7:13 p.m. UTC | #12
On Wed, 2015-08-26 at 17:07 +0100, Lorenzo Pieralisi wrote:
> Mark,
> 
> On Thu, Jul 16, 2015 at 06:40:49PM +0100, Mark Salter wrote:
> 
> [...]
> 
> You have to acknowledge that what you describe is a bit of a corner
> case (and a silly FW set-up), are you aware of any existing FW set-up
> where we actually hit the corner case above ?

Oh sure. It could definitely be a corner case. I'm not aware of any
existing firmware supporting the protocol implemented by your patch.
So time will tell how well firmware vendors which do implement it
avoid the corner case problem. I am aware of some existing firmwares
which put the mailbox area in RAM which is also in the kernel linear
mapping as cached memory. But those wouldn't work with your patch for
more reasons than that.

> 
> I think it is fine to leave code as-is, at least the mailbox
> mappings, I will check to see I can improve the MADT parsing,
> somehow.

I'm not strenuously opposed to leaving it as-is. But we need to
acknowledge that if the 64K pagesize issue isn't covered by the
protocol spec, we're relying on firmware vendors to be diligent
not to break 64K pagesize kernels.
Lorenzo Pieralisi Aug. 27, 2015, 9:50 a.m. UTC | #13
On Wed, Aug 26, 2015 at 08:13:22PM +0100, Mark Salter wrote:
> On Wed, 2015-08-26 at 17:07 +0100, Lorenzo Pieralisi wrote:
> > Mark,
> > 
> > On Thu, Jul 16, 2015 at 06:40:49PM +0100, Mark Salter wrote:
> > 
> > [...]
> > 
> > You have to acknowledge that what you describe is a bit of a corner
> > case (and a silly FW set-up), are you aware of any existing FW set-up
> > where we actually hit the corner case above ?
> 
> Oh sure. It could definitely be a corner case. I'm not aware of any
> existing firmware supporting the protocol implemented by your patch.

So basically there is no existing firmware implementing the ARM64 ACPI
parking protocol.

> So time will tell how well firmware vendors which do implement it
> avoid the corner case problem. I am aware of some existing firmwares
> which put the mailbox area in RAM which is also in the kernel linear
> mapping as cached memory. But those wouldn't work with your patch for
> more reasons than that.

They will have to fix them, that's the reason why I am pushing this
patch upstream, and not a version that is massaged to make their
non-compliant firmware compliant.

> > I think it is fine to leave code as-is, at least the mailbox
> > mappings, I will check to see I can improve the MADT parsing,
> > somehow.
> 
> I'm not strenuously opposed to leaving it as-is. But we need to
> acknowledge that if the 64K pagesize issue isn't covered by the
> protocol spec, we're relying on firmware vendors to be diligent
> not to break 64K pagesize kernels.

Should we try to amend the protocol spec ? I will raise the point,
I do not see how we can prevent this issue otherwise, other than
relying on vendors to be aware of the issue and prevent it.

Thanks,
Lorenzo
Mark Salter Aug. 27, 2015, 1:27 p.m. UTC | #14
On Thu, 2015-08-27 at 10:50 +0100, Lorenzo Pieralisi wrote:
> On Wed, Aug 26, 2015 at 08:13:22PM +0100, Mark Salter wrote:
> > On Wed, 2015-08-26 at 17:07 +0100, Lorenzo Pieralisi wrote:
> > > Mark,
> > > 
> > > On Thu, Jul 16, 2015 at 06:40:49PM +0100, Mark Salter wrote:
> > > 
> > > [...]
> > > 
> > > You have to acknowledge that what you describe is a bit of a corner
> > > case (and a silly FW set-up), are you aware of any existing FW set-up
> > > where we actually hit the corner case above ?
> > 
> > Oh sure. It could definitely be a corner case. I'm not aware of any
> > existing firmware supporting the protocol implemented by your patch.
> 
> So basically there is no existing firmware implementing the ARM64 ACPI
> parking protocol.

None that I am aware of.

> 
> > So time will tell how well firmware vendors which do implement it
> > avoid the corner case problem. I am aware of some existing firmwares
> > which put the mailbox area in RAM which is also in the kernel linear
> > mapping as cached memory. But those wouldn't work with your patch for
> > more reasons than that.
> 
> They will have to fix them, that's the reason why I am pushing this
> patch upstream, and not a version that is massaged to make their
> non-compliant firmware compliant.

Absolutely agree.

> 
> > > I think it is fine to leave code as-is, at least the mailbox
> > > mappings, I will check to see I can improve the MADT parsing,
> > > somehow.
> > 
> > I'm not strenuously opposed to leaving it as-is. But we need to
> > acknowledge that if the 64K pagesize issue isn't covered by the
> > protocol spec, we're relying on firmware vendors to be diligent
> > not to break 64K pagesize kernels.
> 
> Should we try to amend the protocol spec ? I will raise the point,
> I do not see how we can prevent this issue otherwise, other than
> relying on vendors to be aware of the issue and prevent it.

Yes, please do.

Lorenzo Pieralisi Aug. 28, 2015, 10:23 a.m. UTC | #15
Mark,

On Thu, Jul 16, 2015 at 06:40:49PM +0100, Mark Salter wrote:

[...]

> On Thu, 2015-07-16 at 18:12 +0100, Lorenzo Pieralisi wrote:
> > > The kernel will only add cached memory regions to linear mapping and
> > > presumably, the FW will mark the mailboxes as uncached. Otherwise, it
> > > is a FW bug. But I suppose we could run into problems with kernels
> > > using 64K pagesize since firmware assumes 4k.
> >
> > Nope, ioremap takes care of that, everything should be fine.
> 
> The mailbox is 4K. If it is next to a cached UEFI region, the kernel may
> have to overlap the mailbox with a cached 64K mapping in order to include
> the adjoining UEFI region in the linear map. Then the ioremap would fail
> because the mailbox is included in the linear mapping.

So that I understand: are you referring to memrange_efi_to_native()
in arch/arm64/kernel/efi.c ? Is it safe to round up (and add it to
the memblock layer) the memory region size to PAGE_SIZE without checking
attributes of overlapping (within PAGE_SIZE) UEFI regions ?

Thanks,
Lorenzo
Mark Salter Aug. 28, 2015, 2:29 p.m. UTC | #16
On Fri, 2015-08-28 at 11:23 +0100, Lorenzo Pieralisi wrote:
> Mark,
> 
> On Thu, Jul 16, 2015 at 06:40:49PM +0100, Mark Salter wrote:
> 
> [...]
> 
> So that I understand: are you referring to memrange_efi_to_native()
> in arch/arm64/kernel/efi.c ? Is it safe to round up (and add it to
> the memblock layer) the memory region size to PAGE_SIZE without checking
> attributes of overlapping (within PAGE_SIZE) UEFI regions ?

The problem is that nothing in the UEFI spec or parking protocol spec
prevents firmware from placing a 4K parking protocol mailbox area in
the same 64K page as normal cached memory which winds up mapped in
the kernel linear mapping. The kernel might be able to work around
that by not putting the 64K page in the linear map. There's nothing
the kernel could do if the mailbox is in same 64K page with UEFI runtime
memory which would use a cached mapping.
Lorenzo Pieralisi Aug. 28, 2015, 3:32 p.m. UTC | #17
[Cc'ing Leif and Ard]

On Fri, Aug 28, 2015 at 03:29:46PM +0100, Mark Salter wrote:
> On Fri, 2015-08-28 at 11:23 +0100, Lorenzo Pieralisi wrote:
> > Mark,
> > 
> > On Thu, Jul 16, 2015 at 06:40:49PM +0100, Mark Salter wrote:
> > 
> > [...]
> > 
> > So that I understand: are you referring to memrange_efi_to_native()
> > in arch/arm64/kernel/efi.c ? Is it safe to round up (and add it to
> > the memblock layer) the memory region size to PAGE_SIZE without checking
> > attributes of overlapping (within PAGE_SIZE) UEFI regions ?
> 
> The problem is that nothing in the UEFI spec or parking protocol spec
> prevents firmware from placing a 4K parking protocol mailbox area in
> the same 64K page as normal cached memory which winds up mapped in
> the kernel linear mapping. The kernel might be able to work around
> that by not putting the 64K page in the linear map. There's nothing
> the kernel could do if the mailbox is in same 64K page with UEFI runtime
> memory which would use a cached mapping.

Ok, I failed to explain myself. Let's imagine that we have a, say, 64K
aligned memory region (EFI_MEMORY_WB - size 16K) that lives in the same
64K memory frame as a device's register space.
The code I pointed at above creates a 64K memory frame out of the 16K
region and adds it to memblock, so that it ends up in the kernel linear
mapping, which ends up mapping the device registers too with cacheable
attributes, and that's not correct.

I am not sure what's the best way to solve that, probably we should
amend the UEFI specs to enforce uniform 64K memory region attributes,
comments welcome.

Thanks,
Lorenzo
Mark Salter Aug. 28, 2015, 3:56 p.m. UTC | #18
On Fri, 2015-08-28 at 16:32 +0100, Lorenzo Pieralisi wrote:
> [Cc'ing Leif and Ard]
> 
> On Fri, Aug 28, 2015 at 03:29:46PM +0100, Mark Salter wrote:
> > On Fri, 2015-08-28 at 11:23 +0100, Lorenzo Pieralisi wrote:
> > > Mark,
> > > 
> > > On Thu, Jul 16, 2015 at 06:40:49PM +0100, Mark Salter wrote:
> > > 
> > > [...]
> > > 
> > > So that I understand: are you referring to memrange_efi_to_native()
> > > in arch/arm64/kernel/efi.c ? Is it safe to round up (and add it to
> > > the memblock layer) the memory region size to PAGE_SIZE without 
> > > checking
> > > attributes of overlapping (within PAGE_SIZE) UEFI regions ?
> > 
> > The problem is that nothing in the UEFI spec or parking protocol spec
> > prevents firmware from placing a 4K parking protocol mailbox area in
> > the same 64K page as normal cached memory which winds up mapped in
> > the kernel linear mapping. The kernel might be able to work around
> > that by not putting the 64K page in the linear map. There's nothing
> > the kernel could do if the mailbox is in same 64K page with UEFI 
> > runtime
> > memory which would use a cached mapping.
> 
> Ok, I failed to explain myself. Let's imagine that we have a, say, 64k
> aligned memory region (EFI_MEMORY_WB - size 16k) that lives in the same
> 64K memory frame as a device's registers memory space.
> The code I pointed at above creates a 64K memory frame out of the 16K
> region and adds it to the memblock so that it ends up in the kernel 
> linear
> mapping which ends up mapping the device registers too with cacheable
> attributes and that's not correct.
> 
> I am not sure what's the best way to solve that, probably we should
> amend the UEFI specs to enforce uniform 64K memory region attributes,
> comments welcome.

OIC. Well it is a potential problem. Hardware designers do some whacky
things, but I'd like to think I/O regions and cached mem regions bordering
on such a small power of two boundary is not one of them.

Lorenzo Pieralisi Aug. 28, 2015, 4:10 p.m. UTC | #19
On Fri, Aug 28, 2015 at 04:56:36PM +0100, Mark Salter wrote:
> On Fri, 2015-08-28 at 16:32 +0100, Lorenzo Pieralisi wrote:
> > [Cc'ing Leif and Ard]
> > 
> > On Fri, Aug 28, 2015 at 03:29:46PM +0100, Mark Salter wrote:
> > > On Fri, 2015-08-28 at 11:23 +0100, Lorenzo Pieralisi wrote:
> > > > Mark,
> > > > 
> > > > On Thu, Jul 16, 2015 at 06:40:49PM +0100, Mark Salter wrote:
> > > > 
> > > > [...]
> > > > 
> > > > So that I understand: are you referring to memrange_efi_to_native()
> > > > in arch/arm64/kernel/efi.c ? Is it safe to round up (and add it to
> > > > the memblock layer) the memory region size to PAGE_SIZE without 
> > > > checking
> > > > attributes of overlapping (within PAGE_SIZE) UEFI regions ?
> > > 
> > > The problem is that nothing in the UEFI spec or parking protocol spec
> > > prevents firmware from placing a 4K parking protocol mailbox area in
> > > the same 64K page as normal cached memory which winds up mapped in
> > > the kernel linear mapping. The kernel might be able to work around
> > > that by not putting the 64K page in the linear map. There's nothing
> > > the kernel could do if the mailbox is in same 64K page with UEFI 
> > > runtime
> > > memory which would use a cached mapping.
> > 
> > Ok, I failed to explain myself. Let's imagine that we have a, say, 64k
> > aligned memory region (EFI_MEMORY_WB - size 16k) that lives in the same
> > 64K memory frame as a device's registers memory space.
> > The code I pointed at above creates a 64K memory frame out of the 16K
> > region and adds it to the memblock so that it ends up in the kernel 
> > linear
> > mapping which ends up mapping the device registers too with cacheable
> > attributes and that's not correct.
> > 
> > I am not sure what's the best way to solve that, probably we should
> > amend the UEFI specs to enforce uniform 64K memory region attributes,
> > comments welcome.
> 
> OIC. Well it is a potential problem. Hardware designers do some whacky
> things, but I'd like to think I/O regions and cached mem regions bordering
> on such a small power of two boundary is not one of them.

Yes, I agree with you as far as I/O is concerned, I just wanted
to understand if the issue you mentioned is a consequence of how
the code above performs the rounding (it could discard memory regions
having mixed attributes within PAGE_SIZE instead, if we enforce it in
the specs).

Unfortunately for the mailboxes, since it is RAM, it might well happen.
If we change the UEFI specs to enforce uniform attributes on 64K regions
the problem you raised is solved at UEFI spec level instead of changing
the ACPI parking protocol specs.

I will raise the point, apart from that I will send a v2 soon with
the issue you raised addressed in the specs.

Thanks,
Lorenzo
Leif Lindholm Aug. 28, 2015, 4:11 p.m. UTC | #20
On Fri, Aug 28, 2015 at 11:56:36AM -0400, Mark Salter wrote:
> > > > So that I understand: are you referring to memrange_efi_to_native()
> > > > in arch/arm64/kernel/efi.c ? Is it safe to round up (and add it to
> > > > the memblock layer) the memory region size to PAGE_SIZE without 
> > > > checking
> > > > attributes of overlapping (within PAGE_SIZE) UEFI regions ?
> > > 
> > > The problem is that nothing in the UEFI spec or parking protocol spec
> > > prevents firmware from placing a 4K parking protocol mailbox area in
> > > the same 64K page as normal cached memory which winds up mapped in
> > > the kernel linear mapping. The kernel might be able to work around
> > > that by not putting the 64K page in the linear map. There's nothing
> > > the kernel could do if the mailbox is in same 64K page with UEFI 
> > > runtime memory which would use a cached mapping.

Indeed.

> > Ok, I failed to explain myself. Let's imagine that we have a, say, 64k
> > aligned memory region (EFI_MEMORY_WB - size 16k) that lives in the same
> > 64K memory frame as a device's registers memory space.
> > The code I pointed at above creates a 64K memory frame out of the 16K
> > region and adds it to the memblock so that it ends up in the kernel 
> > linear
> > mapping which ends up mapping the device registers too with cacheable
> > attributes and that's not correct.
> > 
> > I am not sure what's the best way to solve that, probably we should
> > amend the UEFI specs to enforce uniform 64K memory region attributes,
> > comments welcome.

Agreed.

> OIC. Well it is a potential problem. Hardware designers do some whacky
> things, but I'd like to think I/O regions and cached mem regions bordering
> on such a small power of two boundary is not one of them.

Sure. But the same applies to memory regions _mapped_ as Device.
Which is (apparently) necessary for the spin-tables used by the
parking protocol.

So the spec should ban use of incompatible memory attributes between
any regions kept around at runtime, as opposed to specifically the
regions used for runtime services (the current wording in the spec).

/
    Leif

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 318175f..b01891e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -517,6 +517,10 @@  config HOTPLUG_CPU
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
 	  can be controlled through /sys/devices/system/cpu.
 
+config ARM64_ACPI_PARKING_PROTOCOL
+	def_bool y
+	depends on ACPI && SMP
+
 source kernel/Kconfig.preempt
 
 config UP_LATE_INIT
diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 406485e..7127db8 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -88,8 +88,25 @@  void __init acpi_init_cpus(void);
 static inline void acpi_init_cpus(void) { }
 #endif /* CONFIG_ACPI */
 
+#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
+bool acpi_parking_protocol_valid(int cpu);
+void __init
+acpi_set_mailbox_entry(int cpu, struct acpi_madt_generic_interrupt *processor);
+#else
+static inline bool acpi_parking_protocol_valid(int cpu) { return false; }
+static inline void
+acpi_set_mailbox_entry(int cpu, struct acpi_madt_generic_interrupt *processor)
+{}
+#endif
+
 static inline const char *acpi_get_enable_method(int cpu)
 {
-	return acpi_psci_present() ? "psci" : NULL;
+	if (acpi_psci_present())
+		return "psci";
+
+	if (acpi_parking_protocol_valid(cpu))
+		return "parking-protocol";
+
+	return NULL;
 }
 #endif /*_ASM_ACPI_H*/
diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
index 6aae421..e8a3268 100644
--- a/arch/arm64/include/asm/hardirq.h
+++ b/arch/arm64/include/asm/hardirq.h
@@ -20,7 +20,7 @@ 
 #include <linux/threads.h>
 #include <asm/irq.h>
 
-#define NR_IPI	5
+#define NR_IPI	6
 
 typedef struct {
 	unsigned int __softirq_pending;
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index db02be8..b73ca99 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -68,6 +68,15 @@  extern void secondary_entry(void);
 extern void arch_send_call_function_single_ipi(int cpu);
 extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
 
+#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
+extern void arch_send_wakeup_ipi_mask(const struct cpumask *mask);
+#else
+static inline void arch_send_wakeup_ipi_mask(const struct cpumask *mask)
+{
+	BUILD_BUG();
+}
+#endif
+
 extern int __cpu_disable(void);
 
 extern void __cpu_die(unsigned int cpu);
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 426d076..a766566 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -36,6 +36,7 @@  arm64-obj-$(CONFIG_EFI)			+= efi.o efi-stub.o efi-entry.o
 arm64-obj-$(CONFIG_PCI)			+= pci.o
 arm64-obj-$(CONFIG_ARMV8_DEPRECATED)	+= armv8_deprecated.o
 arm64-obj-$(CONFIG_ACPI)		+= acpi.o
+arm64-obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
 
 obj-y					+= $(arm64-obj-y) vdso/
 obj-m					+= $(arm64-obj-m)
diff --git a/arch/arm64/kernel/acpi_parking_protocol.c b/arch/arm64/kernel/acpi_parking_protocol.c
new file mode 100644
index 0000000..531c3ad
--- /dev/null
+++ b/arch/arm64/kernel/acpi_parking_protocol.c
@@ -0,0 +1,153 @@ 
+/*
+ * ARM64 ACPI Parking Protocol implementation
+ *
+ * Authors: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
+ *	    Mark Salter <msalter@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#include <linux/acpi.h>
+#include <linux/types.h>
+
+#include <asm/cpu_ops.h>
+
+struct cpu_mailbox_entry {
+	phys_addr_t mailbox_addr;
+	u8 version;
+	u8 gic_cpu_id;
+};
+
+static struct cpu_mailbox_entry cpu_mailbox_entries[NR_CPUS];
+
+void __init acpi_set_mailbox_entry(int cpu,
+				   struct acpi_madt_generic_interrupt *p)
+{
+	struct cpu_mailbox_entry *cpu_entry = &cpu_mailbox_entries[cpu];
+
+	cpu_entry->mailbox_addr = p->parked_address;
+	cpu_entry->version = p->parking_version;
+	cpu_entry->gic_cpu_id = p->cpu_interface_number;
+}
+
+bool __init acpi_parking_protocol_valid(int cpu)
+{
+	struct cpu_mailbox_entry *cpu_entry = &cpu_mailbox_entries[cpu];
+
+	return cpu_entry->mailbox_addr && cpu_entry->version;
+}
+
+static int acpi_parking_protocol_cpu_init(unsigned int cpu)
+{
+	pr_debug("%s: ACPI parked addr=%llx\n", __func__,
+		  cpu_mailbox_entries[cpu].mailbox_addr);
+
+	return 0;
+}
+
+static int acpi_parking_protocol_cpu_prepare(unsigned int cpu)
+{
+	return 0;
+}
+
+struct parking_protocol_mailbox {
+	__le32 cpu_id;
+	__le32 reserved;
+	__le64 entry_point;
+};
+
+static int acpi_parking_protocol_cpu_boot(unsigned int cpu)
+{
+	struct cpu_mailbox_entry *cpu_entry = &cpu_mailbox_entries[cpu];
+	struct parking_protocol_mailbox __iomem *mailbox;
+	__le32 cpu_id;
+
+	/*
+	 * Map mailbox memory with attribute device nGnRE (ie ioremap -
+	 * this deviates from the parking protocol specifications since
+	 * the mailboxes are required to be mapped nGnRnE; the attribute
+	 * discrepancy is harmless insofar as the protocol specification
+	 * is concerned).
+	 * If the mailbox is mistakenly allocated in the linear mapping
+	 * by FW, ioremap will fail since the mapping will be prevented
+	 * by the kernel (it clashes with the linear mapping attributes
+	 * specifications).
+	 */
+	mailbox = ioremap(cpu_entry->mailbox_addr, sizeof(*mailbox));
+	if (!mailbox)
+		return -EIO;
+
+	cpu_id = readl_relaxed(&mailbox->cpu_id);
+	/*
+	 * Check if firmware has set-up the mailbox entry properly
+	 * before kickstarting the respective cpu.
+	 */
+	if (cpu_id != ~0U) {
+		iounmap(mailbox);
+		return -ENXIO;
+	}
+
+	/*
+	 * We write the entry point and cpu id as LE regardless of the
+	 * native endianness of the kernel. Therefore, any boot-loaders
+	 * that read this address need to convert this address to the
+	 * Boot-Loader's endianness before jumping.
+	 */
+	writeq_relaxed(__pa(secondary_entry), &mailbox->entry_point);
+	writel_relaxed(cpu_entry->gic_cpu_id, &mailbox->cpu_id);
+
+	arch_send_wakeup_ipi_mask(cpumask_of(cpu));
+
+	iounmap(mailbox);
+
+	return 0;
+}
+
+static void acpi_parking_protocol_cpu_postboot(void)
+{
+	int cpu = smp_processor_id();
+	struct cpu_mailbox_entry *cpu_entry = &cpu_mailbox_entries[cpu];
+	struct parking_protocol_mailbox __iomem *mailbox;
+	__le64 entry_point;
+
+	/*
+	 * Map mailbox memory with attribute device nGnRE (ie ioremap -
+	 * this deviates from the parking protocol specifications since
+	 * the mailboxes are required to be mapped nGnRnE; the attribute
+	 * discrepancy is harmless insofar as the protocol specification
+	 * is concerned).
+	 * If the mailbox is mistakenly allocated in the linear mapping
+	 * by FW, ioremap will fail since the mapping will be prevented
+	 * by the kernel (it clashes with the linear mapping attributes
+	 * specifications).
+	 */
+	mailbox = ioremap(cpu_entry->mailbox_addr, sizeof(*mailbox));
+	if (!mailbox)
+		return;
+
+	entry_point = readq_relaxed(&mailbox->entry_point);
+	/*
+	 * Check if firmware has cleared the entry_point as expected
+	 * by the protocol specification.
+	 */
+	WARN_ON(entry_point);
+
+	iounmap(mailbox);
+}
+
+const struct cpu_operations acpi_parking_protocol_ops = {
+	.name		= "parking-protocol",
+	.cpu_init	= acpi_parking_protocol_cpu_init,
+	.cpu_prepare	= acpi_parking_protocol_cpu_prepare,
+	.cpu_boot	= acpi_parking_protocol_cpu_boot,
+	.cpu_postboot	= acpi_parking_protocol_cpu_postboot
+};
diff --git a/arch/arm64/kernel/cpu_ops.c b/arch/arm64/kernel/cpu_ops.c
index 5ea337d..db31991 100644
--- a/arch/arm64/kernel/cpu_ops.c
+++ b/arch/arm64/kernel/cpu_ops.c
@@ -25,11 +25,12 @@ 
 #include <asm/smp_plat.h>
 
 extern const struct cpu_operations smp_spin_table_ops;
+extern const struct cpu_operations acpi_parking_protocol_ops;
 extern const struct cpu_operations cpu_psci_ops;
 
 const struct cpu_operations *cpu_ops[NR_CPUS];
 
-static const struct cpu_operations *supported_cpu_ops[] __initconst = {
+static const struct cpu_operations *dt_supported_cpu_ops[] __initconst = {
 #ifdef CONFIG_SMP
 	&smp_spin_table_ops,
 #endif
@@ -37,9 +38,19 @@  static const struct cpu_operations *supported_cpu_ops[] __initconst = {
 	NULL,
 };
 
+static const struct cpu_operations *acpi_supported_cpu_ops[] __initconst = {
+#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
+	&acpi_parking_protocol_ops,
+#endif
+	&cpu_psci_ops,
+	NULL,
+};
+
 static const struct cpu_operations * __init cpu_get_ops(const char *name)
 {
-	const struct cpu_operations **ops = supported_cpu_ops;
+	const struct cpu_operations **ops;
+
+	ops = acpi_disabled ? dt_supported_cpu_ops : acpi_supported_cpu_ops;
 
 	while (*ops) {
 		if (!strcmp(name, (*ops)->name))
@@ -77,8 +88,16 @@  static const char *__init cpu_read_enable_method(int cpu)
 		}
 	} else {
 		enable_method = acpi_get_enable_method(cpu);
-		if (!enable_method)
-			pr_err("Unsupported ACPI enable-method\n");
+		if (!enable_method) {
+			/*
+			 * In ACPI systems the boot CPU does not require
+			 * checking the enable method since for some
+			 * boot protocols (eg the parking protocol) it need
+			 * not be initialized. Don't warn spuriously.
+			 */
+			if (cpu != 0)
+				pr_err("Unsupported ACPI enable-method\n");
+		}
 	}
 
 	return enable_method;
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 50fb469..1d98f2d 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -69,6 +69,7 @@  enum ipi_msg_type {
 	IPI_CPU_STOP,
 	IPI_TIMER,
 	IPI_IRQ_WORK,
+	IPI_WAKEUP,
 };
 
 /*
@@ -428,6 +429,8 @@  acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
 	/* map the logical cpu id to cpu MPIDR */
 	cpu_logical_map(cpu_count) = hwid;
 
+	acpi_set_mailbox_entry(cpu_count, processor);
+
 	cpu_count++;
 }
 
@@ -610,6 +613,7 @@  static const char *ipi_types[NR_IPI] __tracepoint_string = {
 	S(IPI_CPU_STOP, "CPU stop interrupts"),
 	S(IPI_TIMER, "Timer broadcast interrupts"),
 	S(IPI_IRQ_WORK, "IRQ work interrupts"),
+	S(IPI_WAKEUP, "CPU wakeup interrupts"),
 };
 
 static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
@@ -653,6 +657,13 @@  void arch_send_call_function_single_ipi(int cpu)
 	smp_cross_call(cpumask_of(cpu), IPI_CALL_FUNC);
 }
 
+#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
+void arch_send_wakeup_ipi_mask(const struct cpumask *mask)
+{
+	smp_cross_call(mask, IPI_WAKEUP);
+}
+#endif
+
 #ifdef CONFIG_IRQ_WORK
 void arch_irq_work_raise(void)
 {
@@ -729,6 +740,8 @@  void handle_IPI(int ipinr, struct pt_regs *regs)
 		irq_exit();
 		break;
 #endif
+	case IPI_WAKEUP:
+		break;
 
 	default:
 		pr_crit("CPU%u: Unknown IPI message 0x%x\n", cpu, ipinr);