From patchwork Wed Jan 16 17:57:52 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10766639 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8DCF913BF for ; Wed, 16 Jan 2019 18:00:44 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7EDFE2F0EC for ; Wed, 16 Jan 2019 18:00:44 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 72E1B2F103; Wed, 16 Jan 2019 18:00:44 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 256E72F0EC for ; Wed, 16 Jan 2019 18:00:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727967AbfAPR7j (ORCPT ); Wed, 16 Jan 2019 12:59:39 -0500 Received: from mga04.intel.com ([192.55.52.120]:8967 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727790AbfAPR7h (ORCPT ); Wed, 16 Jan 2019 12:59:37 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Jan 2019 09:59:36 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,487,1539673200"; d="scan'208";a="117227764" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by fmsmga008.fm.intel.com with ESMTP; 16 Jan 2019 09:59:35 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv4 01/13] acpi: Create subtable parsing infrastructure Date: Wed, 16 Jan 2019 10:57:52 -0700 Message-Id: <20190116175804.30196-2-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20190116175804.30196-1-keith.busch@intel.com> References: <20190116175804.30196-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Parsing entries in an ACPI table had assumed a generic header structure. There is no standard ACPI header, though, so less common layouts with different field sizes required custom parsers to go through their subtable entry list. Create the infrastructure for adding different table types so parsing the entries array may be more reused for all ACPI system tables and the common code doesn't need to be duplicated. Reviewed-by: Rafael J. Wysocki Cc: Dan Williams Signed-off-by: Keith Busch --- arch/arm64/kernel/acpi_numa.c | 2 +- arch/arm64/kernel/smp.c | 4 +- arch/ia64/kernel/acpi.c | 12 ++--- arch/x86/kernel/acpi/boot.c | 36 +++++++------- drivers/acpi/numa.c | 16 +++---- drivers/acpi/scan.c | 4 +- drivers/acpi/tables.c | 67 +++++++++++++++++++++++---- drivers/irqchip/irq-gic-v2m.c | 2 +- drivers/irqchip/irq-gic-v3-its-pci-msi.c | 2 +- drivers/irqchip/irq-gic-v3-its-platform-msi.c | 2 +- drivers/irqchip/irq-gic-v3-its.c | 6 +-- drivers/irqchip/irq-gic-v3.c | 10 ++-- drivers/irqchip/irq-gic.c | 4 +- drivers/mailbox/pcc.c | 2 +- include/linux/acpi.h | 5 +- 15 files changed, 112 insertions(+), 62 deletions(-) diff --git a/arch/arm64/kernel/acpi_numa.c b/arch/arm64/kernel/acpi_numa.c index eac1d0cc595c..7ff800045434 100644 --- a/arch/arm64/kernel/acpi_numa.c +++ b/arch/arm64/kernel/acpi_numa.c @@ -45,7 +45,7 @@ static inline int get_cpu_for_acpi_id(u32 uid) return -EINVAL; } -static int __init acpi_parse_gicc_pxm(struct acpi_subtable_header *header, +static int __init acpi_parse_gicc_pxm(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_srat_gicc_affinity *pa; diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 1598d6f7200a..e6a148604dcc 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -553,7 +553,7 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor) } static int __init -acpi_parse_gic_cpu_interface(struct acpi_subtable_header *header, +acpi_parse_gic_cpu_interface(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_madt_generic_interrupt *processor; @@ -562,7 +562,7 @@ acpi_parse_gic_cpu_interface(struct acpi_subtable_header *header, if (BAD_MADT_GICC_ENTRY(processor, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); acpi_map_gic_cpu_interface(processor); diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c index 41eb281709da..3973d2c2a9b0 100644 --- a/arch/ia64/kernel/acpi.c +++ b/arch/ia64/kernel/acpi.c @@ -177,7 +177,7 @@ struct acpi_table_madt *acpi_madt __initdata; static u8 has_8259; static int __init -acpi_parse_lapic_addr_ovr(struct acpi_subtable_header * header, +acpi_parse_lapic_addr_ovr(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_madt_local_apic_override *lapic; @@ -216,7 +216,7 @@ acpi_parse_lsapic(struct acpi_subtable_header * header, const unsigned long end) } static int __init -acpi_parse_lapic_nmi(struct acpi_subtable_header * header, const unsigned long end) +acpi_parse_lapic_nmi(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_madt_local_apic_nmi *lacpi_nmi; @@ -230,7 +230,7 @@ acpi_parse_lapic_nmi(struct acpi_subtable_header * header, const unsigned long e } static int __init -acpi_parse_iosapic(struct acpi_subtable_header * header, const unsigned long end) +acpi_parse_iosapic(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_madt_io_sapic *iosapic; @@ -245,7 +245,7 @@ acpi_parse_iosapic(struct acpi_subtable_header * header, const unsigned long end static unsigned int __initdata acpi_madt_rev; static int __init -acpi_parse_plat_int_src(struct acpi_subtable_header * header, +acpi_parse_plat_int_src(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_madt_interrupt_source *plintsrc; @@ -329,7 +329,7 @@ unsigned int get_cpei_target_cpu(void) } static int __init -acpi_parse_int_src_ovr(struct acpi_subtable_header * header, +acpi_parse_int_src_ovr(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_madt_interrupt_override *p; @@ -350,7 +350,7 @@ acpi_parse_int_src_ovr(struct acpi_subtable_header * header, } static int __init -acpi_parse_nmi_src(struct acpi_subtable_header * header, const unsigned long end) +acpi_parse_nmi_src(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_madt_nmi_source *nmi_src; diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c index 2624de16cd7a..b694a32f95d4 100644 --- a/arch/x86/kernel/acpi/boot.c +++ b/arch/x86/kernel/acpi/boot.c @@ -197,7 +197,7 @@ static int acpi_register_lapic(int id, u32 acpiid, u8 enabled) } static int __init -acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end) +acpi_parse_x2apic(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_madt_local_x2apic *processor = NULL; #ifdef CONFIG_X86_X2APIC @@ -210,7 +210,7 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end) if (BAD_MADT_ENTRY(processor, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); #ifdef CONFIG_X86_X2APIC apic_id = processor->local_apic_id; @@ -242,7 +242,7 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end) } static int __init -acpi_parse_lapic(struct acpi_subtable_header * header, const unsigned long end) +acpi_parse_lapic(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_madt_local_apic *processor = NULL; @@ -251,7 +251,7 @@ acpi_parse_lapic(struct acpi_subtable_header * header, const unsigned long end) if (BAD_MADT_ENTRY(processor, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); /* Ignore invalid ID */ if (processor->id == 0xff) @@ -272,7 +272,7 @@ acpi_parse_lapic(struct acpi_subtable_header * header, const unsigned long end) } static int __init -acpi_parse_sapic(struct acpi_subtable_header *header, const unsigned long end) +acpi_parse_sapic(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_madt_local_sapic *processor = NULL; @@ -281,7 +281,7 @@ acpi_parse_sapic(struct acpi_subtable_header *header, const unsigned long end) if (BAD_MADT_ENTRY(processor, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); acpi_register_lapic((processor->id << 8) | processor->eid,/* APIC ID */ processor->processor_id, /* ACPI ID */ @@ -291,7 +291,7 @@ acpi_parse_sapic(struct acpi_subtable_header *header, const unsigned long end) } static int __init -acpi_parse_lapic_addr_ovr(struct acpi_subtable_header * header, +acpi_parse_lapic_addr_ovr(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_madt_local_apic_override *lapic_addr_ovr = NULL; @@ -301,7 +301,7 @@ acpi_parse_lapic_addr_ovr(struct acpi_subtable_header * header, if (BAD_MADT_ENTRY(lapic_addr_ovr, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); acpi_lapic_addr = lapic_addr_ovr->address; @@ -309,7 +309,7 @@ acpi_parse_lapic_addr_ovr(struct acpi_subtable_header * header, } static int __init -acpi_parse_x2apic_nmi(struct acpi_subtable_header *header, +acpi_parse_x2apic_nmi(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_madt_local_x2apic_nmi *x2apic_nmi = NULL; @@ -319,7 +319,7 @@ acpi_parse_x2apic_nmi(struct acpi_subtable_header *header, if (BAD_MADT_ENTRY(x2apic_nmi, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); if (x2apic_nmi->lint != 1) printk(KERN_WARNING PREFIX "NMI not connected to LINT 1!\n"); @@ -328,7 +328,7 @@ acpi_parse_x2apic_nmi(struct acpi_subtable_header *header, } static int __init -acpi_parse_lapic_nmi(struct acpi_subtable_header * header, const unsigned long end) +acpi_parse_lapic_nmi(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_madt_local_apic_nmi *lapic_nmi = NULL; @@ -337,7 +337,7 @@ acpi_parse_lapic_nmi(struct acpi_subtable_header * header, const unsigned long e if (BAD_MADT_ENTRY(lapic_nmi, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); if (lapic_nmi->lint != 1) printk(KERN_WARNING PREFIX "NMI not connected to LINT 1!\n"); @@ -449,7 +449,7 @@ static int __init mp_register_ioapic_irq(u8 bus_irq, u8 polarity, } static int __init -acpi_parse_ioapic(struct acpi_subtable_header * header, const unsigned long end) +acpi_parse_ioapic(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_madt_io_apic *ioapic = NULL; struct ioapic_domain_cfg cfg = { @@ -462,7 +462,7 @@ acpi_parse_ioapic(struct acpi_subtable_header * header, const unsigned long end) if (BAD_MADT_ENTRY(ioapic, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); /* Statically assign IRQ numbers for IOAPICs hosting legacy IRQs */ if (ioapic->global_irq_base < nr_legacy_irqs()) @@ -508,7 +508,7 @@ static void __init acpi_sci_ioapic_setup(u8 bus_irq, u16 polarity, u16 trigger, } static int __init -acpi_parse_int_src_ovr(struct acpi_subtable_header * header, +acpi_parse_int_src_ovr(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_madt_interrupt_override *intsrc = NULL; @@ -518,7 +518,7 @@ acpi_parse_int_src_ovr(struct acpi_subtable_header * header, if (BAD_MADT_ENTRY(intsrc, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); if (intsrc->source_irq == acpi_gbl_FADT.sci_interrupt) { acpi_sci_ioapic_setup(intsrc->source_irq, @@ -550,7 +550,7 @@ acpi_parse_int_src_ovr(struct acpi_subtable_header * header, } static int __init -acpi_parse_nmi_src(struct acpi_subtable_header * header, const unsigned long end) +acpi_parse_nmi_src(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_madt_nmi_source *nmi_src = NULL; @@ -559,7 +559,7 @@ acpi_parse_nmi_src(struct acpi_subtable_header * header, const unsigned long end if (BAD_MADT_ENTRY(nmi_src, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); /* TBD: Support nimsrc entries? */ diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c index 7bbbf8256a41..d6433b14864c 100644 --- a/drivers/acpi/numa.c +++ b/drivers/acpi/numa.c @@ -338,7 +338,7 @@ acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa) } static int __init -acpi_parse_x2apic_affinity(struct acpi_subtable_header *header, +acpi_parse_x2apic_affinity(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_srat_x2apic_cpu_affinity *processor_affinity; @@ -347,7 +347,7 @@ acpi_parse_x2apic_affinity(struct acpi_subtable_header *header, if (!processor_affinity) return -EINVAL; - acpi_table_print_srat_entry(header); + acpi_table_print_srat_entry(&header->common); /* let architecture-dependent part to do it */ acpi_numa_x2apic_affinity_init(processor_affinity); @@ -356,7 +356,7 @@ acpi_parse_x2apic_affinity(struct acpi_subtable_header *header, } static int __init -acpi_parse_processor_affinity(struct acpi_subtable_header *header, +acpi_parse_processor_affinity(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_srat_cpu_affinity *processor_affinity; @@ -365,7 +365,7 @@ acpi_parse_processor_affinity(struct acpi_subtable_header *header, if (!processor_affinity) return -EINVAL; - acpi_table_print_srat_entry(header); + acpi_table_print_srat_entry(&header->common); /* let architecture-dependent part to do it */ acpi_numa_processor_affinity_init(processor_affinity); @@ -374,7 +374,7 @@ acpi_parse_processor_affinity(struct acpi_subtable_header *header, } static int __init -acpi_parse_gicc_affinity(struct acpi_subtable_header *header, +acpi_parse_gicc_affinity(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_srat_gicc_affinity *processor_affinity; @@ -383,7 +383,7 @@ acpi_parse_gicc_affinity(struct acpi_subtable_header *header, if (!processor_affinity) return -EINVAL; - acpi_table_print_srat_entry(header); + acpi_table_print_srat_entry(&header->common); /* let architecture-dependent part to do it */ acpi_numa_gicc_affinity_init(processor_affinity); @@ -394,7 +394,7 @@ acpi_parse_gicc_affinity(struct acpi_subtable_header *header, static int __initdata parsed_numa_memblks; static int __init -acpi_parse_memory_affinity(struct acpi_subtable_header * header, +acpi_parse_memory_affinity(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_srat_mem_affinity *memory_affinity; @@ -403,7 +403,7 @@ acpi_parse_memory_affinity(struct acpi_subtable_header * header, if (!memory_affinity) return -EINVAL; - acpi_table_print_srat_entry(header); + acpi_table_print_srat_entry(&header->common); /* let architecture-dependent part to do it */ if (!acpi_numa_memory_affinity_init(memory_affinity)) diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c index 5efd4219f112..a894ce3556f2 100644 --- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -2240,10 +2240,10 @@ static struct acpi_probe_entry *ape; static int acpi_probe_count; static DEFINE_MUTEX(acpi_probe_mutex); -static int __init acpi_match_madt(struct acpi_subtable_header *header, +static int __init acpi_match_madt(union acpi_subtable_headers *header, const unsigned long end) { - if (!ape->subtable_valid || ape->subtable_valid(header, ape)) + if (!ape->subtable_valid || ape->subtable_valid(&header->common, ape)) if (!ape->probe_subtbl(header, end)) acpi_probe_count++; diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c index 48eabb6c2d4f..967e1168becf 100644 --- a/drivers/acpi/tables.c +++ b/drivers/acpi/tables.c @@ -49,6 +49,15 @@ static struct acpi_table_desc initial_tables[ACPI_MAX_TABLES] __initdata; static int acpi_apic_instance __initdata; +enum acpi_subtable_type { + ACPI_SUBTABLE_COMMON, +}; + +struct acpi_subtable_entry { + union acpi_subtable_headers *hdr; + enum acpi_subtable_type type; +}; + /* * Disable table checksum verification for the early stage due to the size * limitation of the current x86 early mapping implementation. @@ -217,6 +226,42 @@ void acpi_table_print_madt_entry(struct acpi_subtable_header *header) } } +static unsigned long __init +acpi_get_entry_type(struct acpi_subtable_entry *entry) +{ + switch (entry->type) { + case ACPI_SUBTABLE_COMMON: + return entry->hdr->common.type; + } + return 0; +} + +static unsigned long __init +acpi_get_entry_length(struct acpi_subtable_entry *entry) +{ + switch (entry->type) { + case ACPI_SUBTABLE_COMMON: + return entry->hdr->common.length; + } + return 0; +} + +static unsigned long __init +acpi_get_subtable_header_length(struct acpi_subtable_entry *entry) +{ + switch (entry->type) { + case ACPI_SUBTABLE_COMMON: + return sizeof(entry->hdr->common); + } + return 0; +} + +static enum acpi_subtable_type __init +acpi_get_subtable_type(char *id) +{ + return ACPI_SUBTABLE_COMMON; +} + /** * acpi_parse_entries_array - for each proc_num find a suitable subtable * @@ -246,8 +291,8 @@ acpi_parse_entries_array(char *id, unsigned long table_size, struct acpi_subtable_proc *proc, int proc_num, unsigned int max_entries) { - struct acpi_subtable_header *entry; - unsigned long table_end; + struct acpi_subtable_entry entry; + unsigned long table_end, subtable_len, entry_len; int count = 0; int errs = 0; int i; @@ -270,19 +315,20 @@ acpi_parse_entries_array(char *id, unsigned long table_size, /* Parse all entries looking for a match. */ - entry = (struct acpi_subtable_header *) + entry.type = acpi_get_subtable_type(id); + entry.hdr = (union acpi_subtable_headers *) ((unsigned long)table_header + table_size); + subtable_len = acpi_get_subtable_header_length(&entry); - while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < - table_end) { + while (((unsigned long)entry.hdr) + subtable_len < table_end) { if (max_entries && count >= max_entries) break; for (i = 0; i < proc_num; i++) { - if (entry->type != proc[i].id) + if (acpi_get_entry_type(&entry) != proc[i].id) continue; if (!proc[i].handler || - (!errs && proc[i].handler(entry, table_end))) { + (!errs && proc[i].handler(entry.hdr, table_end))) { errs++; continue; } @@ -297,13 +343,14 @@ acpi_parse_entries_array(char *id, unsigned long table_size, * If entry->length is 0, break from this loop to avoid * infinite loop. */ - if (entry->length == 0) { + entry_len = acpi_get_entry_length(&entry); + if (entry_len == 0) { pr_err("[%4.4s:0x%02x] Invalid zero length\n", id, proc->id); return -EINVAL; } - entry = (struct acpi_subtable_header *) - ((unsigned long)entry + entry->length); + entry.hdr = (union acpi_subtable_headers *) + ((unsigned long)entry.hdr + entry_len); } if (max_entries && count > max_entries) { diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c index f5fe0100f9ff..de14e06fd9ec 100644 --- a/drivers/irqchip/irq-gic-v2m.c +++ b/drivers/irqchip/irq-gic-v2m.c @@ -446,7 +446,7 @@ static struct fwnode_handle *gicv2m_get_fwnode(struct device *dev) } static int __init -acpi_parse_madt_msi(struct acpi_subtable_header *header, +acpi_parse_madt_msi(union acpi_subtable_headers *header, const unsigned long end) { int ret; diff --git a/drivers/irqchip/irq-gic-v3-its-pci-msi.c b/drivers/irqchip/irq-gic-v3-its-pci-msi.c index 8d6d009d1d58..c81d5b81da56 100644 --- a/drivers/irqchip/irq-gic-v3-its-pci-msi.c +++ b/drivers/irqchip/irq-gic-v3-its-pci-msi.c @@ -159,7 +159,7 @@ static int __init its_pci_of_msi_init(void) #ifdef CONFIG_ACPI static int __init -its_pci_msi_parse_madt(struct acpi_subtable_header *header, +its_pci_msi_parse_madt(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_madt_generic_translator *its_entry; diff --git a/drivers/irqchip/irq-gic-v3-its-platform-msi.c b/drivers/irqchip/irq-gic-v3-its-platform-msi.c index 7b8e87b493fe..9cdcda5bb3bd 100644 --- a/drivers/irqchip/irq-gic-v3-its-platform-msi.c +++ b/drivers/irqchip/irq-gic-v3-its-platform-msi.c @@ -117,7 +117,7 @@ static int __init its_pmsi_init_one(struct fwnode_handle *fwnode, #ifdef CONFIG_ACPI static int __init -its_pmsi_parse_madt(struct acpi_subtable_header *header, +its_pmsi_parse_madt(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_madt_generic_translator *its_entry; diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index db20e992a40f..d6677075d68f 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -3764,13 +3764,13 @@ static int __init acpi_get_its_numa_node(u32 its_id) return NUMA_NO_NODE; } -static int __init gic_acpi_match_srat_its(struct acpi_subtable_header *header, +static int __init gic_acpi_match_srat_its(union acpi_subtable_headers *header, const unsigned long end) { return 0; } -static int __init gic_acpi_parse_srat_its(struct acpi_subtable_header *header, +static int __init gic_acpi_parse_srat_its(union acpi_subtable_headers *header, const unsigned long end) { int node; @@ -3837,7 +3837,7 @@ static int __init acpi_get_its_numa_node(u32 its_id) { return NUMA_NO_NODE; } static void __init acpi_its_srat_maps_free(void) { } #endif -static int __init gic_acpi_parse_madt_its(struct acpi_subtable_header *header, +static int __init gic_acpi_parse_madt_its(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_madt_generic_translator *its_entry; diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 0868a9d81c3c..44db6b809c52 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -1392,7 +1392,7 @@ gic_acpi_register_redist(phys_addr_t phys_base, void __iomem *redist_base) } static int __init -gic_acpi_parse_madt_redist(struct acpi_subtable_header *header, +gic_acpi_parse_madt_redist(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_madt_generic_redistributor *redist = @@ -1410,7 +1410,7 @@ gic_acpi_parse_madt_redist(struct acpi_subtable_header *header, } static int __init -gic_acpi_parse_madt_gicc(struct acpi_subtable_header *header, +gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_madt_generic_interrupt *gicc = @@ -1452,14 +1452,14 @@ static int __init gic_acpi_collect_gicr_base(void) return -ENODEV; } -static int __init gic_acpi_match_gicr(struct acpi_subtable_header *header, +static int __init gic_acpi_match_gicr(union acpi_subtable_headers *header, const unsigned long end) { /* Subtable presence means that redist exists, that's it */ return 0; } -static int __init gic_acpi_match_gicc(struct acpi_subtable_header *header, +static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_madt_generic_interrupt *gicc = @@ -1525,7 +1525,7 @@ static bool __init acpi_validate_gic_table(struct acpi_subtable_header *header, return true; } -static int __init gic_acpi_parse_virt_madt_gicc(struct acpi_subtable_header *header, +static int __init gic_acpi_parse_virt_madt_gicc(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_madt_generic_interrupt *gicc = diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index ba2a37a27a54..a749d73f8337 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -1508,7 +1508,7 @@ static struct } acpi_data __initdata; static int __init -gic_acpi_parse_madt_cpu(struct acpi_subtable_header *header, +gic_acpi_parse_madt_cpu(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_madt_generic_interrupt *processor; @@ -1540,7 +1540,7 @@ gic_acpi_parse_madt_cpu(struct acpi_subtable_header *header, } /* The things you have to do to just *count* something... */ -static int __init acpi_dummy_func(struct acpi_subtable_header *header, +static int __init acpi_dummy_func(union acpi_subtable_headers *header, const unsigned long end) { return 0; diff --git a/drivers/mailbox/pcc.c b/drivers/mailbox/pcc.c index 256f18b67e8a..08a0a3517138 100644 --- a/drivers/mailbox/pcc.c +++ b/drivers/mailbox/pcc.c @@ -382,7 +382,7 @@ static const struct mbox_chan_ops pcc_chan_ops = { * * This gets called for each entry in the PCC table. */ -static int parse_pcc_subspace(struct acpi_subtable_header *header, +static int parse_pcc_subspace(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_pcct_subspace *ss = (struct acpi_pcct_subspace *) header; diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 87715f20b69a..7c3c4ebaded6 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -141,10 +141,13 @@ enum acpi_address_range_id { /* Table Handlers */ +union acpi_subtable_headers { + struct acpi_subtable_header common; +}; typedef int (*acpi_tbl_table_handler)(struct acpi_table_header *table); -typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header, +typedef int (*acpi_tbl_entry_handler)(union acpi_subtable_headers *header, const unsigned long end); /* Debugger support */ From patchwork Wed Jan 16 17:57:53 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10766591 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1541E13B4 for ; Wed, 16 Jan 2019 17:59:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0525A2F0EC for ; Wed, 16 Jan 2019 17:59:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EDD972F0F2; Wed, 16 Jan 2019 17:59:39 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 92CBF2F0EF for ; Wed, 16 Jan 2019 17:59:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727958AbfAPR7i (ORCPT ); Wed, 16 Jan 2019 12:59:38 -0500 Received: from mga04.intel.com ([192.55.52.120]:8971 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727769AbfAPR7h (ORCPT ); Wed, 16 Jan 2019 12:59:37 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Jan 2019 09:59:37 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,487,1539673200"; d="scan'208";a="117227768" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by fmsmga008.fm.intel.com with ESMTP; 16 Jan 2019 09:59:36 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv4 02/13] acpi: Add HMAT to generic parsing tables Date: Wed, 16 Jan 2019 10:57:53 -0700 Message-Id: <20190116175804.30196-3-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20190116175804.30196-1-keith.busch@intel.com> References: <20190116175804.30196-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The Heterogeneous Memory Attribute Table (HMAT) header has different field lengths than the existing parsing uses. Add the HMAT type to the parsing rules so it may be generically parsed. Cc: Dan Williams Reviewed-by: Rafael J. Wysocki Signed-off-by: Keith Busch --- drivers/acpi/tables.c | 9 +++++++++ include/linux/acpi.h | 1 + 2 files changed, 10 insertions(+) diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c index 967e1168becf..d9911cd55edc 100644 --- a/drivers/acpi/tables.c +++ b/drivers/acpi/tables.c @@ -51,6 +51,7 @@ static int acpi_apic_instance __initdata; enum acpi_subtable_type { ACPI_SUBTABLE_COMMON, + ACPI_SUBTABLE_HMAT, }; struct acpi_subtable_entry { @@ -232,6 +233,8 @@ acpi_get_entry_type(struct acpi_subtable_entry *entry) switch (entry->type) { case ACPI_SUBTABLE_COMMON: return entry->hdr->common.type; + case ACPI_SUBTABLE_HMAT: + return entry->hdr->hmat.type; } return 0; } @@ -242,6 +245,8 @@ acpi_get_entry_length(struct acpi_subtable_entry *entry) switch (entry->type) { case ACPI_SUBTABLE_COMMON: return entry->hdr->common.length; + case ACPI_SUBTABLE_HMAT: + return entry->hdr->hmat.length; } return 0; } @@ -252,6 +257,8 @@ acpi_get_subtable_header_length(struct acpi_subtable_entry *entry) switch (entry->type) { case ACPI_SUBTABLE_COMMON: return sizeof(entry->hdr->common); + case ACPI_SUBTABLE_HMAT: + return sizeof(entry->hdr->hmat); } return 0; } @@ -259,6 +266,8 @@ acpi_get_subtable_header_length(struct acpi_subtable_entry *entry) static enum acpi_subtable_type __init acpi_get_subtable_type(char *id) { + if (strncmp(id, ACPI_SIG_HMAT, 4) == 0) + return ACPI_SUBTABLE_HMAT; return ACPI_SUBTABLE_COMMON; } diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 7c3c4ebaded6..53f93dff171c 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -143,6 +143,7 @@ enum acpi_address_range_id { /* Table Handlers */ union acpi_subtable_headers { struct acpi_subtable_header common; + struct acpi_hmat_structure hmat; }; typedef int (*acpi_tbl_table_handler)(struct acpi_table_header *table); From patchwork Wed Jan 16 17:57:54 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10766637 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 111EF1390 for ; Wed, 16 Jan 2019 18:00:44 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F35A22F0F2 for ; Wed, 16 Jan 2019 18:00:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E754A2F104; Wed, 16 Jan 2019 18:00:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6652B2F0F2 for ; Wed, 16 Jan 2019 18:00:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727965AbfAPR7j (ORCPT ); Wed, 16 Jan 2019 12:59:39 -0500 Received: from mga04.intel.com ([192.55.52.120]:8973 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727955AbfAPR7i (ORCPT ); Wed, 16 Jan 2019 12:59:38 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Jan 2019 09:59:37 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,487,1539673200"; d="scan'208";a="117227772" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by fmsmga008.fm.intel.com with ESMTP; 16 Jan 2019 09:59:37 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv4 03/13] acpi/hmat: Parse and report heterogeneous memory Date: Wed, 16 Jan 2019 10:57:54 -0700 Message-Id: <20190116175804.30196-4-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20190116175804.30196-1-keith.busch@intel.com> References: <20190116175804.30196-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Systems may provide different memory types and export this information in the ACPI Heterogeneous Memory Attribute Table (HMAT). Parse these tables provided by the platform and report the memory access and caching attributes. Signed-off-by: Keith Busch --- drivers/acpi/Kconfig | 1 + drivers/acpi/Makefile | 1 + drivers/acpi/hmat/Kconfig | 8 ++ drivers/acpi/hmat/Makefile | 1 + drivers/acpi/hmat/hmat.c | 180 +++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 191 insertions(+) create mode 100644 drivers/acpi/hmat/Kconfig create mode 100644 drivers/acpi/hmat/Makefile create mode 100644 drivers/acpi/hmat/hmat.c diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index 90ff0a47c12e..b377f970adfd 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -465,6 +465,7 @@ config ACPI_REDUCED_HARDWARE_ONLY If you are unsure what to do, do not enable this option. source "drivers/acpi/nfit/Kconfig" +source "drivers/acpi/hmat/Kconfig" source "drivers/acpi/apei/Kconfig" source "drivers/acpi/dptf/Kconfig" diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile index 7c6afc111d76..bff8fbe5a6ab 100644 --- a/drivers/acpi/Makefile +++ b/drivers/acpi/Makefile @@ -79,6 +79,7 @@ obj-$(CONFIG_ACPI_PROCESSOR) += processor.o obj-$(CONFIG_ACPI) += container.o obj-$(CONFIG_ACPI_THERMAL) += thermal.o obj-$(CONFIG_ACPI_NFIT) += nfit/ +obj-$(CONFIG_ACPI_HMAT) += hmat/ obj-$(CONFIG_ACPI) += acpi_memhotplug.o obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o obj-$(CONFIG_ACPI_BATTERY) += battery.o diff --git a/drivers/acpi/hmat/Kconfig b/drivers/acpi/hmat/Kconfig new file mode 100644 index 000000000000..a4034d37a311 --- /dev/null +++ b/drivers/acpi/hmat/Kconfig @@ -0,0 +1,8 @@ +# SPDX-License-Identifier: GPL-2.0 +config ACPI_HMAT + bool "ACPI Heterogeneous Memory Attribute Table Support" + depends on ACPI_NUMA + help + Parses representation of the ACPI Heterogeneous Memory Attributes + Table (HMAT) and set the memory node relationships and access + attributes. diff --git a/drivers/acpi/hmat/Makefile b/drivers/acpi/hmat/Makefile new file mode 100644 index 000000000000..e909051d3d00 --- /dev/null +++ b/drivers/acpi/hmat/Makefile @@ -0,0 +1 @@ +obj-$(CONFIG_ACPI_HMAT) := hmat.o diff --git a/drivers/acpi/hmat/hmat.c b/drivers/acpi/hmat/hmat.c new file mode 100644 index 000000000000..833a783868d5 --- /dev/null +++ b/drivers/acpi/hmat/hmat.c @@ -0,0 +1,180 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Heterogeneous Memory Attributes Table (HMAT) representation + * + * Copyright (c) 2018, Intel Corporation. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static __init const char *hmat_data_type(u8 type) +{ + switch (type) { + case ACPI_HMAT_ACCESS_LATENCY: + return "Access Latency"; + case ACPI_HMAT_READ_LATENCY: + return "Read Latency"; + case ACPI_HMAT_WRITE_LATENCY: + return "Write Latency"; + case ACPI_HMAT_ACCESS_BANDWIDTH: + return "Access Bandwidth"; + case ACPI_HMAT_READ_BANDWIDTH: + return "Read Bandwidth"; + case ACPI_HMAT_WRITE_BANDWIDTH: + return "Write Bandwidth"; + default: + return "Reserved"; + }; +} + +static __init const char *hmat_data_type_suffix(u8 type) +{ + switch (type) { + case ACPI_HMAT_ACCESS_LATENCY: + case ACPI_HMAT_READ_LATENCY: + case ACPI_HMAT_WRITE_LATENCY: + return " nsec"; + case ACPI_HMAT_ACCESS_BANDWIDTH: + case ACPI_HMAT_READ_BANDWIDTH: + case ACPI_HMAT_WRITE_BANDWIDTH: + return " MB/s"; + default: + return ""; + }; +} + +static __init int hmat_parse_locality(union acpi_subtable_headers *header, + const unsigned long end) +{ + struct acpi_hmat_locality *loc = (void *)header; + unsigned int init, targ, total_size, ipds, tpds; + u32 *inits, *targs, value; + u16 *entries; + u8 type; + + if (loc->header.length < sizeof(*loc)) { + pr_err("HMAT: Unexpected locality header length: %d\n", + loc->header.length); + return -EINVAL; + } + + type = loc->data_type; + ipds = loc->number_of_initiator_Pds; + tpds = loc->number_of_target_Pds; + total_size = sizeof(*loc) + sizeof(*entries) * ipds * tpds + + sizeof(*inits) * ipds + sizeof(*targs) * tpds; + if (loc->header.length < total_size) { + pr_err("HMAT: Unexpected locality header length:%d, minimum required:%d\n", + loc->header.length, total_size); + return -EINVAL; + } + + pr_info("HMAT: Locality: Flags:%02x Type:%s Initiator Domains:%d Target Domains:%d Base:%lld\n", + loc->flags, hmat_data_type(type), ipds, tpds, + loc->entry_base_unit); + + inits = (u32 *)(loc + 1); + targs = &inits[ipds]; + entries = (u16 *)(&targs[tpds]); + for (targ = 0; targ < tpds; targ++) { + for (init = 0; init < ipds; init++) { + value = entries[init * tpds + targ]; + value = (value * loc->entry_base_unit) / 10; + pr_info(" Initiator-Target[%d-%d]:%d%s\n", + inits[init], targs[targ], value, + hmat_data_type_suffix(type)); + } + } + return 0; +} + +static __init int hmat_parse_cache(union acpi_subtable_headers *header, + const unsigned long end) +{ + struct acpi_hmat_cache *cache = (void *)header; + u32 attrs; + + if (cache->header.length < sizeof(*cache)) { + pr_err("HMAT: Unexpected cache header length: %d\n", + cache->header.length); + return -EINVAL; + } + + attrs = cache->cache_attributes; + pr_info("HMAT: Cache: Domain:%d Size:%llu Attrs:%08x SMBIOS Handles:%d\n", + cache->memory_PD, cache->cache_size, attrs, + cache->number_of_SMBIOShandles); + + return 0; +} + +static int __init hmat_parse_address_range(union acpi_subtable_headers *header, + const unsigned long end) +{ + struct acpi_hmat_address_range *spa = (void *)header; + + if (spa->header.length != sizeof(*spa)) { + pr_err("HMAT: Unexpected address range header length: %d\n", + spa->header.length); + return -EINVAL; + } + pr_info("HMAT: Memory (%#llx length %#llx) Flags:%04x Processor Domain:%d Memory Domain:%d\n", + spa->physical_address_base, spa->physical_address_length, + spa->flags, spa->processor_PD, spa->memory_PD); + return 0; +} + +static int __init hmat_parse_subtable(union acpi_subtable_headers *header, + const unsigned long end) +{ + struct acpi_hmat_structure *hdr = (void *)header; + + if (!hdr) + return -EINVAL; + + switch (hdr->type) { + case ACPI_HMAT_TYPE_ADDRESS_RANGE: + return hmat_parse_address_range(header, end); + case ACPI_HMAT_TYPE_LOCALITY: + return hmat_parse_locality(header, end); + case ACPI_HMAT_TYPE_CACHE: + return hmat_parse_cache(header, end); + default: + return -EINVAL; + } +} + +static __init int hmat_init(void) +{ + struct acpi_table_header *tbl; + enum acpi_hmat_type i; + acpi_status status; + + if (srat_disabled()) + return 0; + + status = acpi_get_table(ACPI_SIG_HMAT, 0, &tbl); + if (ACPI_FAILURE(status)) + return 0; + + for (i = ACPI_HMAT_TYPE_ADDRESS_RANGE; i < ACPI_HMAT_TYPE_RESERVED; i++) { + if (acpi_table_parse_entries(ACPI_SIG_HMAT, + sizeof(struct acpi_table_hmat), i, + hmat_parse_subtable, 0) < 0) + goto out_put; + } +out_put: + acpi_put_table(tbl); + return 0; +} +subsys_initcall(hmat_init); From patchwork Wed Jan 16 17:57:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10766641 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BD54513BF for ; Wed, 16 Jan 2019 18:00:53 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AFB702F0F2 for ; Wed, 16 Jan 2019 18:00:53 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A45022F104; Wed, 16 Jan 2019 18:00:53 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 04DFF2F0F2 for ; Wed, 16 Jan 2019 18:00:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728250AbfAPSAn (ORCPT ); Wed, 16 Jan 2019 13:00:43 -0500 Received: from mga04.intel.com ([192.55.52.120]:8974 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727956AbfAPR7i (ORCPT ); Wed, 16 Jan 2019 12:59:38 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Jan 2019 09:59:38 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,487,1539673200"; d="scan'208";a="117227779" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by fmsmga008.fm.intel.com with ESMTP; 16 Jan 2019 09:59:37 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv4 04/13] node: Link memory nodes to their compute nodes Date: Wed, 16 Jan 2019 10:57:55 -0700 Message-Id: <20190116175804.30196-5-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20190116175804.30196-1-keith.busch@intel.com> References: <20190116175804.30196-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Systems may be constructed with various specialized nodes. Some nodes may provide memory, some provide compute devices that access and use that memory, and others may provide both. Nodes that provide memory are referred to as memory targets, and nodes that can initiate memory access are referred to as memory initiators. Memory targets will often have varying access characteristics from different initiators, and platforms may have ways to express those relationships. In preparation for these systems, provide interfaces for the kernel to export the memory relationship among different nodes memory targets and their initiators with symlinks to each other's nodes, and export node lists showing the same relationship. If a system provides access locality for each initiator-target pair, nodes may be grouped into ranked access classes relative to other nodes. The new interface allows a subsystem to register relationships of varying classes if available and desired to be exported. A lower class number indicates a higher performing tier, with 0 being the best performing class. A memory initiator may have multiple memory targets in the same access class. The initiator's memory targets in given class indicate the node's access characteristics perform better relative to other initiator nodes either unreported or in lower class numbers. The targets within an initiator's class, though, do not necessarily perform the same as each other. A memory target node may have multiple memory initiators. All linked initiators in a target's class have the same access characteristics to that target. The following example show the nodes' new sysfs hierarchy for a memory target node 'Y' with class 0 access from initiator node 'X': # symlinks -v /sys/devices/system/node/nodeX/class0/ relative: /sys/devices/system/node/nodeX/class0/targetY -> ../../nodeY # symlinks -v /sys/devices/system/node/nodeY/class0/ relative: /sys/devices/system/node/nodeY/class0/initiatorX -> ../../nodeX And the same information is reflected in the nodelist: # cat /sys/devices/system/node/nodeX/class0/target_nodelist Y # cat /sys/devices/system/node/nodeY/class0/initiator_nodelist X Signed-off-by: Keith Busch --- drivers/base/node.c | 127 ++++++++++++++++++++++++++++++++++++++++++++++++++- include/linux/node.h | 6 ++- 2 files changed, 131 insertions(+), 2 deletions(-) diff --git a/drivers/base/node.c b/drivers/base/node.c index 86d6cd92ce3d..1da5072116ab 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -59,6 +60,91 @@ static inline ssize_t node_read_cpulist(struct device *dev, static DEVICE_ATTR(cpumap, S_IRUGO, node_read_cpumask, NULL); static DEVICE_ATTR(cpulist, S_IRUGO, node_read_cpulist, NULL); +struct node_class_nodes { + struct device dev; + struct list_head list_node; + unsigned class; + nodemask_t initiator_nodes; + nodemask_t target_nodes; +}; +#define to_class_nodes(dev) container_of(dev, struct node_class_nodes, dev) + +static ssize_t initiator_nodelist_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct node_class_nodes *c = to_class_nodes(dev); + return scnprintf(buf, PAGE_SIZE - 1, "%*pbl\n", + nodemask_pr_args(&c->initiator_nodes)); +} +static DEVICE_ATTR_RO(initiator_nodelist); + +static ssize_t target_nodelist_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct node_class_nodes *c = to_class_nodes(dev); + return scnprintf(buf, PAGE_SIZE - 1, "%*pbl\n", + nodemask_pr_args(&c->target_nodes)); +} +static DEVICE_ATTR_RO(target_nodelist); + +static struct attribute *node_class_node_attrs[] = { + &dev_attr_initiator_nodelist.attr, + &dev_attr_target_nodelist.attr, + NULL, +}; +ATTRIBUTE_GROUPS(node_class_node); + +static void node_remove_classes(struct node *node) +{ + struct node_class_nodes *c, *cnext; + + list_for_each_entry_safe(c, cnext, &node->class_list, list_node) { + list_del(&c->list_node); + device_unregister(&c->dev); + } +} + +static void node_class_release(struct device *dev) +{ + kfree(to_class_nodes(dev)); +} + +static struct node_class_nodes *node_init_node_class(struct device *parent, + struct list_head *list, + unsigned class) +{ + struct node_class_nodes *class_node; + struct device *dev; + + list_for_each_entry(class_node, list, list_node) + if (class_node->class == class) + return class_node; + + class_node = kzalloc(sizeof(*class_node), GFP_KERNEL); + if (!class_node) + return NULL; + + class_node->class = class; + dev = &class_node->dev; + dev->parent = parent; + dev->release = node_class_release; + dev->groups = node_class_node_groups; + if (dev_set_name(dev, "class%u", class)) + goto free; + + if (device_register(dev)) + goto free_name; + + pm_runtime_no_callbacks(dev); + list_add_tail(&class_node->list_node, list); + return class_node; +free_name: + kfree_const(dev->kobj.name); +free: + kfree(class_node); + return NULL; +} + #define K(x) ((x) << (PAGE_SHIFT - 10)) static ssize_t node_read_meminfo(struct device *dev, struct device_attribute *attr, char *buf) @@ -340,7 +426,7 @@ static int register_node(struct node *node, int num) void unregister_node(struct node *node) { hugetlb_unregister_node(node); /* no-op, if memoryless node */ - + node_remove_classes(node); device_unregister(&node->dev); } @@ -372,6 +458,44 @@ int register_cpu_under_node(unsigned int cpu, unsigned int nid) kobject_name(&node_devices[nid]->dev.kobj)); } +int register_memory_node_under_compute_node(unsigned int m, unsigned int p, + unsigned class) +{ + struct node *init, *targ; + struct node_class_nodes *i, *t; + char initiator[20]; /* "initiator4294967295\0" */ + char target[17]; /* "target4294967295\0" */ + int ret; + + if (!node_online(p) || !node_online(m)) + return -ENODEV; + + init = node_devices[p]; + targ = node_devices[m]; + i = node_init_node_class(&init->dev, &init->class_list, class); + t = node_init_node_class(&targ->dev, &targ->class_list, class); + if (!i || !t) + return -ENOMEM; + + snprintf(initiator, sizeof(initiator), "initiator%u", p); + snprintf(target, sizeof(target), "target%u", m); + ret = sysfs_create_link(&i->dev.kobj, &targ->dev.kobj, target); + if (ret) + return ret; + + ret = sysfs_create_link(&t->dev.kobj, &init->dev.kobj, initiator); + if (ret) + goto err; + + node_set(m, i->target_nodes); + node_set(p, t->initiator_nodes); + return 0; + err: + sysfs_remove_link(&node_devices[p]->dev.kobj, + kobject_name(&node_devices[m]->dev.kobj)); + return ret; +} + int unregister_cpu_under_node(unsigned int cpu, unsigned int nid) { struct device *obj; @@ -580,6 +704,7 @@ int __register_one_node(int nid) register_cpu_under_node(cpu, nid); } + INIT_LIST_HEAD(&node_devices[nid]->class_list); /* initialize work queue for memory hot plug */ init_node_hugetlb_work(nid); diff --git a/include/linux/node.h b/include/linux/node.h index 257bb3d6d014..8e3666c12ef2 100644 --- a/include/linux/node.h +++ b/include/linux/node.h @@ -17,11 +17,12 @@ #include #include +#include #include struct node { struct device dev; - + struct list_head class_list; #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS) struct work_struct node_work; #endif @@ -75,6 +76,9 @@ extern int register_mem_sect_under_node(struct memory_block *mem_blk, extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk, unsigned long phys_index); +extern int register_memory_node_under_compute_node(unsigned int m, unsigned int p, + unsigned class); + #ifdef CONFIG_HUGETLBFS extern void register_hugetlbfs_with_node(node_registration_func_t doregister, node_registration_func_t unregister); From patchwork Wed Jan 16 17:57:56 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10766643 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D33C71580 for ; Wed, 16 Jan 2019 18:00:53 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C4CC12F0F2 for ; Wed, 16 Jan 2019 18:00:53 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B97932F141; Wed, 16 Jan 2019 18:00:53 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6200E2F102 for ; Wed, 16 Jan 2019 18:00:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727973AbfAPSAn (ORCPT ); Wed, 16 Jan 2019 13:00:43 -0500 Received: from mga04.intel.com ([192.55.52.120]:8973 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727963AbfAPR7j (ORCPT ); Wed, 16 Jan 2019 12:59:39 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Jan 2019 09:59:38 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,487,1539673200"; d="scan'208";a="117227782" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by fmsmga008.fm.intel.com with ESMTP; 16 Jan 2019 09:59:38 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv4 05/13] Documentation/ABI: Add new node sysfs attributes Date: Wed, 16 Jan 2019 10:57:56 -0700 Message-Id: <20190116175804.30196-6-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20190116175804.30196-1-keith.busch@intel.com> References: <20190116175804.30196-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add entries for memory initiator and target node class attributes. Signed-off-by: Keith Busch --- Documentation/ABI/stable/sysfs-devices-node | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node index 3e90e1f3bf0a..a9c47b4b0eee 100644 --- a/Documentation/ABI/stable/sysfs-devices-node +++ b/Documentation/ABI/stable/sysfs-devices-node @@ -90,4 +90,27 @@ Date: December 2009 Contact: Lee Schermerhorn Description: The node's huge page size control/query attributes. - See Documentation/admin-guide/mm/hugetlbpage.rst \ No newline at end of file + See Documentation/admin-guide/mm/hugetlbpage.rst + +What: /sys/devices/system/node/nodeX/classY/ +Date: December 2018 +Contact: Keith Busch +Description: + The node's relationship to other nodes for access class "Y". + +What: /sys/devices/system/node/nodeX/classY/initiator_nodelist +Date: December 2018 +Contact: Keith Busch +Description: + The node list of memory initiators that have class "Y" access + to this node's memory. CPUs and other memory initiators in + nodes not in the list accessing this node's memory may have + different performance. + +What: /sys/devices/system/node/nodeX/classY/target_nodelist +Date: December 2018 +Contact: Keith Busch +Description: + The node list of memory targets that this initiator node has + class "Y" access. Memory accesses from this node to nodes not + in this list may have differet performance. From patchwork Wed Jan 16 17:57:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10766631 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 008061580 for ; Wed, 16 Jan 2019 18:00:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E75AA2F0F2 for ; Wed, 16 Jan 2019 18:00:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DB6152F103; Wed, 16 Jan 2019 18:00:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5AB792F0F2 for ; Wed, 16 Jan 2019 18:00:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727976AbfAPR7k (ORCPT ); Wed, 16 Jan 2019 12:59:40 -0500 Received: from mga04.intel.com ([192.55.52.120]:8977 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727969AbfAPR7j (ORCPT ); Wed, 16 Jan 2019 12:59:39 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Jan 2019 09:59:39 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,487,1539673200"; d="scan'208";a="117227786" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by fmsmga008.fm.intel.com with ESMTP; 16 Jan 2019 09:59:38 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv4 06/13] acpi/hmat: Register processor domain to its memory Date: Wed, 16 Jan 2019 10:57:57 -0700 Message-Id: <20190116175804.30196-7-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20190116175804.30196-1-keith.busch@intel.com> References: <20190116175804.30196-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP If the HMAT Subsystem Address Range provides a valid processor proximity domain for a memory domain, or a processor domain with the highest performing access exists, register the memory target with that initiator so this relationship will be visible under the node's sysfs directory. Since HMAT requires valid address ranges have an equivalent SRAT entry, verify each memory target satisfies this requirement. Signed-off-by: Keith Busch --- drivers/acpi/hmat/hmat.c | 143 ++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 136 insertions(+), 7 deletions(-) diff --git a/drivers/acpi/hmat/hmat.c b/drivers/acpi/hmat/hmat.c index 833a783868d5..efb33c74d1a3 100644 --- a/drivers/acpi/hmat/hmat.c +++ b/drivers/acpi/hmat/hmat.c @@ -17,6 +17,43 @@ #include #include +static LIST_HEAD(targets); + +struct memory_target { + struct list_head node; + unsigned int memory_pxm; + unsigned long p_nodes[BITS_TO_LONGS(MAX_NUMNODES)]; +}; + +static __init struct memory_target *find_mem_target(unsigned int m) +{ + struct memory_target *t; + + list_for_each_entry(t, &targets, node) + if (t->memory_pxm == m) + return t; + return NULL; +} + +static __init void alloc_memory_target(unsigned int mem_pxm) +{ + struct memory_target *t; + + if (pxm_to_node(mem_pxm) == NUMA_NO_NODE) + return; + + t = find_mem_target(mem_pxm); + if (t) + return; + + t = kzalloc(sizeof(*t), GFP_KERNEL); + if (!t) + return; + + t->memory_pxm = mem_pxm; + list_add_tail(&t->node, &targets); +} + static __init const char *hmat_data_type(u8 type) { switch (type) { @@ -53,11 +90,30 @@ static __init const char *hmat_data_type_suffix(u8 type) }; } +static __init void hmat_update_access(u8 type, u32 value, u32 *best) +{ + switch (type) { + case ACPI_HMAT_ACCESS_LATENCY: + case ACPI_HMAT_READ_LATENCY: + case ACPI_HMAT_WRITE_LATENCY: + if (!*best || *best > value) + *best = value; + break; + case ACPI_HMAT_ACCESS_BANDWIDTH: + case ACPI_HMAT_READ_BANDWIDTH: + case ACPI_HMAT_WRITE_BANDWIDTH: + if (!*best || *best < value) + *best = value; + break; + } +} + static __init int hmat_parse_locality(union acpi_subtable_headers *header, const unsigned long end) { + struct memory_target *t; struct acpi_hmat_locality *loc = (void *)header; - unsigned int init, targ, total_size, ipds, tpds; + unsigned int init, targ, pass, p_node, total_size, ipds, tpds; u32 *inits, *targs, value; u16 *entries; u8 type; @@ -87,12 +143,28 @@ static __init int hmat_parse_locality(union acpi_subtable_headers *header, targs = &inits[ipds]; entries = (u16 *)(&targs[tpds]); for (targ = 0; targ < tpds; targ++) { - for (init = 0; init < ipds; init++) { - value = entries[init * tpds + targ]; - value = (value * loc->entry_base_unit) / 10; - pr_info(" Initiator-Target[%d-%d]:%d%s\n", - inits[init], targs[targ], value, - hmat_data_type_suffix(type)); + u32 best = 0; + + t = find_mem_target(targs[targ]); + for (pass = 0; pass < 2; pass++) { + for (init = 0; init < ipds; init++) { + value = entries[init * tpds + targ]; + value = (value * loc->entry_base_unit) / 10; + + if (!pass) { + hmat_update_access(type, value, &best); + pr_info(" Initiator-Target[%d-%d]:%d%s\n", + inits[init], targs[targ], value, + hmat_data_type_suffix(type)); + continue; + } + + if (!t) + continue; + p_node = pxm_to_node(inits[init]); + if (p_node != NUMA_NO_NODE && value == best) + set_bit(p_node, t->p_nodes); + } } } return 0; @@ -122,6 +194,7 @@ static int __init hmat_parse_address_range(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_hmat_address_range *spa = (void *)header; + struct memory_target *t = NULL; if (spa->header.length != sizeof(*spa)) { pr_err("HMAT: Unexpected address range header length: %d\n", @@ -131,6 +204,23 @@ static int __init hmat_parse_address_range(union acpi_subtable_headers *header, pr_info("HMAT: Memory (%#llx length %#llx) Flags:%04x Processor Domain:%d Memory Domain:%d\n", spa->physical_address_base, spa->physical_address_length, spa->flags, spa->processor_PD, spa->memory_PD); + + if (spa->flags & ACPI_HMAT_MEMORY_PD_VALID) { + t = find_mem_target(spa->memory_PD); + if (!t) { + pr_warn("HMAT: Memory Domain missing from SRAT\n"); + return -EINVAL; + } + } + if (t && spa->flags & ACPI_HMAT_PROCESSOR_PD_VALID) { + int p_node = pxm_to_node(spa->processor_PD); + + if (p_node == NUMA_NO_NODE) { + pr_warn("HMAT: Invalid Processor Domain\n"); + return -EINVAL; + } + set_bit(p_node, t->p_nodes); + } return 0; } @@ -154,6 +244,33 @@ static int __init hmat_parse_subtable(union acpi_subtable_headers *header, } } +static __init int srat_parse_mem_affinity(union acpi_subtable_headers *header, + const unsigned long end) +{ + struct acpi_srat_mem_affinity *ma = (void *)header; + + if (!ma) + return -EINVAL; + if (!(ma->flags & ACPI_SRAT_MEM_ENABLED)) + return 0; + alloc_memory_target(ma->proximity_domain); + return 0; +} + +static __init void hmat_register_targets(void) +{ + struct memory_target *t, *next; + unsigned m, p; + + list_for_each_entry_safe(t, next, &targets, node) { + list_del(&t->node); + m = pxm_to_node(t->memory_pxm); + for_each_set_bit(p, t->p_nodes, MAX_NUMNODES) + register_memory_node_under_compute_node(m, p, 0); + kfree(t); + } +} + static __init int hmat_init(void) { struct acpi_table_header *tbl; @@ -163,6 +280,17 @@ static __init int hmat_init(void) if (srat_disabled()) return 0; + status = acpi_get_table(ACPI_SIG_SRAT, 0, &tbl); + if (ACPI_FAILURE(status)) + return 0; + + if (acpi_table_parse_entries(ACPI_SIG_SRAT, + sizeof(struct acpi_table_srat), + ACPI_SRAT_TYPE_MEMORY_AFFINITY, + srat_parse_mem_affinity, 0) < 0) + goto out_put; + acpi_put_table(tbl); + status = acpi_get_table(ACPI_SIG_HMAT, 0, &tbl); if (ACPI_FAILURE(status)) return 0; @@ -173,6 +301,7 @@ static __init int hmat_init(void) hmat_parse_subtable, 0) < 0) goto out_put; } + hmat_register_targets(); out_put: acpi_put_table(tbl); return 0; From patchwork Wed Jan 16 17:57:58 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10766635 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B66871880 for ; Wed, 16 Jan 2019 18:00:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A74142F0EC for ; Wed, 16 Jan 2019 18:00:32 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9BA502F102; Wed, 16 Jan 2019 18:00:32 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 33F0A2F0EC for ; Wed, 16 Jan 2019 18:00:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728202AbfAPSAb (ORCPT ); Wed, 16 Jan 2019 13:00:31 -0500 Received: from mga04.intel.com ([192.55.52.120]:8977 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727973AbfAPR7k (ORCPT ); Wed, 16 Jan 2019 12:59:40 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Jan 2019 09:59:39 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,487,1539673200"; d="scan'208";a="117227791" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by fmsmga008.fm.intel.com with ESMTP; 16 Jan 2019 09:59:39 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv4 07/13] node: Add heterogenous memory access attributes Date: Wed, 16 Jan 2019 10:57:58 -0700 Message-Id: <20190116175804.30196-8-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20190116175804.30196-1-keith.busch@intel.com> References: <20190116175804.30196-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Heterogeneous memory systems provide memory nodes with different latency and bandwidth performance attributes. Provide a new kernel interface for subsystems to register the attributes under the memory target node's initiator access class. If the system provides this information, applications may query these attributes when deciding which node to request memory. The following example shows the new sysfs hierarchy for a node exporting performance attributes: # tree -P "read*|write*" /sys/devices/system/node/nodeY/classZ/ /sys/devices/system/node/nodeY/classZ/ |-- read_bandwidth |-- read_latency |-- write_bandwidth `-- write_latency The bandwidth is exported as MB/s and latency is reported in nanoseconds. Memory accesses from an initiator node that is not one of the memory's class "Z" initiator nodes may encounter different performance than reported here. When a subsystem makes use of this interface, initiators of a lower class number, "Z", have better performance relative to higher class numbers. When provided, class 0 is the highest performing access class. Signed-off-by: Keith Busch --- drivers/base/Kconfig | 8 ++++++++ drivers/base/node.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/node.h | 25 +++++++++++++++++++++++++ 3 files changed, 81 insertions(+) diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 3e63a900b330..6014980238e8 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -149,6 +149,14 @@ config DEBUG_TEST_DRIVER_REMOVE unusable. You should say N here unless you are explicitly looking to test this functionality. +config HMEM_REPORTING + bool + default y + depends on NUMA + help + Enable reporting for heterogenous memory access attributes under + their non-uniform memory nodes. + source "drivers/base/test/Kconfig" config SYS_HYPERVISOR diff --git a/drivers/base/node.c b/drivers/base/node.c index 1da5072116ab..1e909f61e8b1 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -66,6 +66,9 @@ struct node_class_nodes { unsigned class; nodemask_t initiator_nodes; nodemask_t target_nodes; +#ifdef CONFIG_HMEM_REPORTING + struct node_hmem_attrs hmem_attrs; +#endif }; #define to_class_nodes(dev) container_of(dev, struct node_class_nodes, dev) @@ -145,6 +148,51 @@ static struct node_class_nodes *node_init_node_class(struct device *parent, return NULL; } +#ifdef CONFIG_HMEM_REPORTING +#define ACCESS_ATTR(name) \ +static ssize_t name##_show(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + return sprintf(buf, "%u\n", to_class_nodes(dev)->hmem_attrs.name); \ +} \ +static DEVICE_ATTR_RO(name); + +ACCESS_ATTR(read_bandwidth) +ACCESS_ATTR(read_latency) +ACCESS_ATTR(write_bandwidth) +ACCESS_ATTR(write_latency) + +static struct attribute *access_attrs[] = { + &dev_attr_read_bandwidth.attr, + &dev_attr_read_latency.attr, + &dev_attr_write_bandwidth.attr, + &dev_attr_write_latency.attr, + NULL, +}; +ATTRIBUTE_GROUPS(access); + +void node_set_perf_attrs(unsigned int nid, struct node_hmem_attrs *hmem_attrs, + unsigned class) +{ + struct node_class_nodes *c; + struct node *node; + + if (WARN_ON_ONCE(!node_online(nid))) + return; + + node = node_devices[nid]; + c = node_init_node_class(&node->dev, &node->class_list, class); + if (!c) + return; + + c->hmem_attrs = *hmem_attrs; + if (sysfs_create_groups(&c->dev.kobj, access_groups)) + pr_info("failed to add performance attribute group to node %d\n", + nid); +} +#endif + #define K(x) ((x) << (PAGE_SHIFT - 10)) static ssize_t node_read_meminfo(struct device *dev, struct device_attribute *attr, char *buf) diff --git a/include/linux/node.h b/include/linux/node.h index 8e3666c12ef2..e22940a593c2 100644 --- a/include/linux/node.h +++ b/include/linux/node.h @@ -20,6 +20,31 @@ #include #include +#ifdef CONFIG_HMEM_REPORTING +/** + * struct node_hmem_attrs - heterogeneous memory performance attributes + * + * @read_bandwidth: Read bandwidth in MB/s + * @write_bandwidth: Write bandwidth in MB/s + * @read_latency: Read latency in nanoseconds + * @write_latency: Write latency in nanoseconds + */ +struct node_hmem_attrs { + unsigned int read_bandwidth; + unsigned int write_bandwidth; + unsigned int read_latency; + unsigned int write_latency; +}; +void node_set_perf_attrs(unsigned int nid, struct node_hmem_attrs *hmem_attrs, + unsigned class); +#else +static inline void node_set_perf_attrs(unsigned int nid, + struct node_hmem_attrs *hmem_attrs, + unsigned class) +{ +} +#endif + struct node { struct device dev; struct list_head class_list; From patchwork Wed Jan 16 17:57:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10766633 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 213EE13BF for ; Wed, 16 Jan 2019 18:00:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 12CEC2F0EC for ; Wed, 16 Jan 2019 18:00:32 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 06F722F102; Wed, 16 Jan 2019 18:00:32 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A9C092F0EC for ; Wed, 16 Jan 2019 18:00:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727951AbfAPSAb (ORCPT ); Wed, 16 Jan 2019 13:00:31 -0500 Received: from mga04.intel.com ([192.55.52.120]:8979 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727974AbfAPR7k (ORCPT ); Wed, 16 Jan 2019 12:59:40 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Jan 2019 09:59:40 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,487,1539673200"; d="scan'208";a="117227794" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by fmsmga008.fm.intel.com with ESMTP; 16 Jan 2019 09:59:39 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv4 08/13] Documentation/ABI: Add node performance attributes Date: Wed, 16 Jan 2019 10:57:59 -0700 Message-Id: <20190116175804.30196-9-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20190116175804.30196-1-keith.busch@intel.com> References: <20190116175804.30196-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add descriptions for memory class initiator performance access attributes. Signed-off-by: Keith Busch --- Documentation/ABI/stable/sysfs-devices-node | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node index a9c47b4b0eee..2217557f29d3 100644 --- a/Documentation/ABI/stable/sysfs-devices-node +++ b/Documentation/ABI/stable/sysfs-devices-node @@ -114,3 +114,31 @@ Description: The node list of memory targets that this initiator node has class "Y" access. Memory accesses from this node to nodes not in this list may have differet performance. + +What: /sys/devices/system/node/nodeX/classY/read_bandwidth +Date: December 2018 +Contact: Keith Busch +Description: + This node's read bandwidth in MB/s available to memory + initiators in nodes found in this class's initiators_nodelist. + +What: /sys/devices/system/node/nodeX/classY/read_latency +Date: December 2018 +Contact: Keith Busch +Description: + This node's read latency in nanoseconds available to memory + initiators in nodes found in this class's initiators_nodelist. + +What: /sys/devices/system/node/nodeX/classY/write_bandwidth +Date: December 2018 +Contact: Keith Busch +Description: + This node's write bandwidth in MB/s available to memory + initiators in nodes found in this class's initiators_nodelist. + +What: /sys/devices/system/node/nodeX/classY/write_latency +Date: December 2018 +Contact: Keith Busch +Description: + This node's write latency in nanoseconds available to memory + initiators in nodes found in this class's initiators_nodelist. From patchwork Wed Jan 16 17:58:00 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10766595 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 390531390 for ; Wed, 16 Jan 2019 17:59:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 280FD2F0EC for ; Wed, 16 Jan 2019 17:59:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1BF242F0F2; Wed, 16 Jan 2019 17:59:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B51FD2F0EC for ; Wed, 16 Jan 2019 17:59:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727994AbfAPR7l (ORCPT ); Wed, 16 Jan 2019 12:59:41 -0500 Received: from mga04.intel.com ([192.55.52.120]:8980 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727982AbfAPR7l (ORCPT ); Wed, 16 Jan 2019 12:59:41 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Jan 2019 09:59:41 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,487,1539673200"; d="scan'208";a="117227797" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by fmsmga008.fm.intel.com with ESMTP; 16 Jan 2019 09:59:40 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv4 09/13] acpi/hmat: Register performance attributes Date: Wed, 16 Jan 2019 10:58:00 -0700 Message-Id: <20190116175804.30196-10-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20190116175804.30196-1-keith.busch@intel.com> References: <20190116175804.30196-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Save the best performace access attributes and register these with the memory's node if HMAT provides the locality table. While HMAT does make it possible to know performance for all possible initiator-target pairings, we export only the best pairings at this time. Signed-off-by: Keith Busch --- drivers/acpi/hmat/Kconfig | 1 + drivers/acpi/hmat/hmat.c | 34 ++++++++++++++++++++++++++++++++++ 2 files changed, 35 insertions(+) diff --git a/drivers/acpi/hmat/Kconfig b/drivers/acpi/hmat/Kconfig index a4034d37a311..20a0e96ba58a 100644 --- a/drivers/acpi/hmat/Kconfig +++ b/drivers/acpi/hmat/Kconfig @@ -2,6 +2,7 @@ config ACPI_HMAT bool "ACPI Heterogeneous Memory Attribute Table Support" depends on ACPI_NUMA + select HMEM_REPORTING help Parses representation of the ACPI Heterogeneous Memory Attributes Table (HMAT) and set the memory node relationships and access diff --git a/drivers/acpi/hmat/hmat.c b/drivers/acpi/hmat/hmat.c index efb33c74d1a3..45e20dc677f9 100644 --- a/drivers/acpi/hmat/hmat.c +++ b/drivers/acpi/hmat/hmat.c @@ -23,6 +23,8 @@ struct memory_target { struct list_head node; unsigned int memory_pxm; unsigned long p_nodes[BITS_TO_LONGS(MAX_NUMNODES)]; + bool hmem_valid; + struct node_hmem_attrs hmem; }; static __init struct memory_target *find_mem_target(unsigned int m) @@ -108,6 +110,34 @@ static __init void hmat_update_access(u8 type, u32 value, u32 *best) } } +static __init void hmat_update_target(struct memory_target *t, u8 type, + u32 value) +{ + switch (type) { + case ACPI_HMAT_ACCESS_LATENCY: + t->hmem.read_latency = value; + t->hmem.write_latency = value; + break; + case ACPI_HMAT_READ_LATENCY: + t->hmem.read_latency = value; + break; + case ACPI_HMAT_WRITE_LATENCY: + t->hmem.write_latency = value; + break; + case ACPI_HMAT_ACCESS_BANDWIDTH: + t->hmem.read_bandwidth = value; + t->hmem.write_bandwidth = value; + break; + case ACPI_HMAT_READ_BANDWIDTH: + t->hmem.read_bandwidth = value; + break; + case ACPI_HMAT_WRITE_BANDWIDTH: + t->hmem.write_bandwidth = value; + break; + } + t->hmem_valid = true; +} + static __init int hmat_parse_locality(union acpi_subtable_headers *header, const unsigned long end) { @@ -166,6 +196,8 @@ static __init int hmat_parse_locality(union acpi_subtable_headers *header, set_bit(p_node, t->p_nodes); } } + if (t && best) + hmat_update_target(t, type, best); } return 0; } @@ -267,6 +299,8 @@ static __init void hmat_register_targets(void) m = pxm_to_node(t->memory_pxm); for_each_set_bit(p, t->p_nodes, MAX_NUMNODES) register_memory_node_under_compute_node(m, p, 0); + if (t->hmem_valid) + node_set_perf_attrs(m, &t->hmem, 0); kfree(t); } } From patchwork Wed Jan 16 17:58:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10766627 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 263AA1390 for ; Wed, 16 Jan 2019 18:00:28 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 173302F0EF for ; Wed, 16 Jan 2019 18:00:28 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0B1BF2F102; Wed, 16 Jan 2019 18:00:28 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5EC992F0EF for ; Wed, 16 Jan 2019 18:00:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728014AbfAPSAP (ORCPT ); Wed, 16 Jan 2019 13:00:15 -0500 Received: from mga04.intel.com ([192.55.52.120]:8980 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727996AbfAPR7m (ORCPT ); Wed, 16 Jan 2019 12:59:42 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Jan 2019 09:59:41 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,487,1539673200"; d="scan'208";a="117227800" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by fmsmga008.fm.intel.com with ESMTP; 16 Jan 2019 09:59:41 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv4 10/13] node: Add memory caching attributes Date: Wed, 16 Jan 2019 10:58:01 -0700 Message-Id: <20190116175804.30196-11-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20190116175804.30196-1-keith.busch@intel.com> References: <20190116175804.30196-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP System memory may have side caches to help improve access speed to frequently requested address ranges. While the system provided cache is transparent to the software accessing these memory ranges, applications can optimize their own access based on cache attributes. Provide a new API for the kernel to register these memory side caches under the memory node that provides it. The new sysfs representation is modeled from the existing cpu cacheinfo attributes, as seen from /sys/devices/system/cpu/cpuX/side_cache/. Unlike CPU cacheinfo, though, the node cache level is reported from the view of the memory. A higher number is nearer to the CPU, while lower levels are closer to the backing memory. Also unlike CPU cache, it is assumed the system will handle flushing any dirty cached memory to the last level on a power failure if the range is persistent memory. The attributes we export are the cache size, the line size, associativity, and write back policy. Signed-off-by: Keith Busch --- drivers/base/node.c | 142 +++++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/node.h | 39 ++++++++++++++ 2 files changed, 181 insertions(+) diff --git a/drivers/base/node.c b/drivers/base/node.c index 1e909f61e8b1..7ff3ed566d7d 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -191,6 +191,146 @@ void node_set_perf_attrs(unsigned int nid, struct node_hmem_attrs *hmem_attrs, pr_info("failed to add performance attribute group to node %d\n", nid); } + +struct node_cache_info { + struct device dev; + struct list_head node; + struct node_cache_attrs cache_attrs; +}; +#define to_cache_info(device) container_of(device, struct node_cache_info, dev) + +#define CACHE_ATTR(name, fmt) \ +static ssize_t name##_show(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + return sprintf(buf, fmt "\n", to_cache_info(dev)->cache_attrs.name);\ +} \ +DEVICE_ATTR_RO(name); + +CACHE_ATTR(size, "%llu") +CACHE_ATTR(level, "%u") +CACHE_ATTR(line_size, "%u") +CACHE_ATTR(associativity, "%u") +CACHE_ATTR(write_policy, "%u") + +static struct attribute *cache_attrs[] = { + &dev_attr_level.attr, + &dev_attr_associativity.attr, + &dev_attr_size.attr, + &dev_attr_line_size.attr, + &dev_attr_write_policy.attr, + NULL, +}; +ATTRIBUTE_GROUPS(cache); + +static void node_cache_release(struct device *dev) +{ + kfree(dev); +} + +static void node_cacheinfo_release(struct device *dev) +{ + struct node_cache_info *info = to_cache_info(dev); + kfree(info); +} + +static void node_init_cache_dev(struct node *node) +{ + struct device *dev; + + dev = kzalloc(sizeof(*dev), GFP_KERNEL); + if (!dev) + return; + + dev->parent = &node->dev; + dev->release = node_cache_release; + if (dev_set_name(dev, "side_cache")) + goto free_dev; + + if (device_register(dev)) + goto free_name; + + pm_runtime_no_callbacks(dev); + node->cache_dev = dev; + return; +free_name: + kfree_const(dev->kobj.name); +free_dev: + kfree(dev); +} + +void node_add_cache(unsigned int nid, struct node_cache_attrs *cache_attrs) +{ + struct node_cache_info *info; + struct device *dev; + struct node *node; + + if (!node_online(nid) || !node_devices[nid]) + return; + + node = node_devices[nid]; + list_for_each_entry(info, &node->cache_attrs, node) { + if (info->cache_attrs.level == cache_attrs->level) { + dev_warn(&node->dev, + "attempt to add duplicate cache level:%d\n", + cache_attrs->level); + return; + } + } + + if (!node->cache_dev) + node_init_cache_dev(node); + if (!node->cache_dev) + return; + + info = kzalloc(sizeof(*info), GFP_KERNEL); + if (!info) + return; + + dev = &info->dev; + dev->parent = node->cache_dev; + dev->release = node_cacheinfo_release; + dev->groups = cache_groups; + if (dev_set_name(dev, "index%d", cache_attrs->level)) + goto free_cache; + + info->cache_attrs = *cache_attrs; + if (device_register(dev)) { + dev_warn(&node->dev, "failed to add cache level:%d\n", + cache_attrs->level); + goto free_name; + } + pm_runtime_no_callbacks(dev); + list_add_tail(&info->node, &node->cache_attrs); + return; +free_name: + kfree_const(dev->kobj.name); +free_cache: + kfree(info); +} + +static void node_remove_caches(struct node *node) +{ + struct node_cache_info *info, *next; + + if (!node->cache_dev) + return; + + list_for_each_entry_safe(info, next, &node->cache_attrs, node) { + list_del(&info->node); + device_unregister(&info->dev); + } + device_unregister(node->cache_dev); +} + +static void node_init_caches(unsigned int nid) +{ + INIT_LIST_HEAD(&node_devices[nid]->cache_attrs); +} +#else +static void node_init_caches(unsigned int nid) { } +static void node_remove_caches(struct node *node) { } #endif #define K(x) ((x) << (PAGE_SHIFT - 10)) @@ -475,6 +615,7 @@ void unregister_node(struct node *node) { hugetlb_unregister_node(node); /* no-op, if memoryless node */ node_remove_classes(node); + node_remove_caches(node); device_unregister(&node->dev); } @@ -755,6 +896,7 @@ int __register_one_node(int nid) INIT_LIST_HEAD(&node_devices[nid]->class_list); /* initialize work queue for memory hot plug */ init_node_hugetlb_work(nid); + node_init_caches(nid); return error; } diff --git a/include/linux/node.h b/include/linux/node.h index e22940a593c2..8cdf2b2808e4 100644 --- a/include/linux/node.h +++ b/include/linux/node.h @@ -37,12 +37,47 @@ struct node_hmem_attrs { }; void node_set_perf_attrs(unsigned int nid, struct node_hmem_attrs *hmem_attrs, unsigned class); + +enum cache_associativity { + NODE_CACHE_DIRECT_MAP, + NODE_CACHE_INDEXED, + NODE_CACHE_OTHER, +}; + +enum cache_write_policy { + NODE_CACHE_WRITE_BACK, + NODE_CACHE_WRITE_THROUGH, + NODE_CACHE_WRITE_OTHER, +}; + +/** + * struct node_cache_attrs - system memory caching attributes + * + * @associativity: The ways memory blocks may be placed in cache + * @write_policy: Write back or write through policy + * @size: Total size of cache in bytes + * @line_size: Number of bytes fetched on a cache miss + * @level: Represents the cache hierarchy level + */ +struct node_cache_attrs { + enum cache_associativity associativity; + enum cache_write_policy write_policy; + u64 size; + u16 line_size; + u8 level; +}; +void node_add_cache(unsigned int nid, struct node_cache_attrs *cache_attrs); #else static inline void node_set_perf_attrs(unsigned int nid, struct node_hmem_attrs *hmem_attrs, unsigned class) { } + +static inline void node_add_cache(unsigned int nid, + struct node_cache_attrs *cache_attrs) +{ +} #endif struct node { @@ -51,6 +86,10 @@ struct node { #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS) struct work_struct node_work; #endif +#ifdef CONFIG_HMEM_REPORTING + struct list_head cache_attrs; + struct device *cache_dev; +#endif }; struct memory_block; From patchwork Wed Jan 16 17:58:02 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10766629 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A34E21390 for ; Wed, 16 Jan 2019 18:00:31 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 952EB2F0EC for ; Wed, 16 Jan 2019 18:00:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 89B8F2F103; Wed, 16 Jan 2019 18:00:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B5B752F0EC for ; Wed, 16 Jan 2019 18:00:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728158AbfAPSAP (ORCPT ); Wed, 16 Jan 2019 13:00:15 -0500 Received: from mga04.intel.com ([192.55.52.120]:8981 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728002AbfAPR7m (ORCPT ); Wed, 16 Jan 2019 12:59:42 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Jan 2019 09:59:42 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,487,1539673200"; d="scan'208";a="117227804" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by fmsmga008.fm.intel.com with ESMTP; 16 Jan 2019 09:59:41 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv4 11/13] Documentation/ABI: Add node cache attributes Date: Wed, 16 Jan 2019 10:58:02 -0700 Message-Id: <20190116175804.30196-12-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20190116175804.30196-1-keith.busch@intel.com> References: <20190116175804.30196-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add the attributes for the system memory side caches. Signed-off-by: Keith Busch --- Documentation/ABI/stable/sysfs-devices-node | 34 +++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node index 2217557f29d3..613d51fb52a3 100644 --- a/Documentation/ABI/stable/sysfs-devices-node +++ b/Documentation/ABI/stable/sysfs-devices-node @@ -142,3 +142,37 @@ Contact: Keith Busch Description: This node's write latency in nanoseconds available to memory initiators in nodes found in this class's initiators_nodelist. + +What: /sys/devices/system/node/nodeX/side_cache/indexY/associativity +Date: December 2018 +Contact: Keith Busch +Description: + The caches associativity: 0 for direct mapped, non-zero if + indexed. + +What: /sys/devices/system/node/nodeX/side_cache/indexY/level +Date: December 2018 +Contact: Keith Busch +Description: + This cache's level in the memory hierarchy. Matches 'Y' in the + directory name. + +What: /sys/devices/system/node/nodeX/side_cache/indexY/line_size +Date: December 2018 +Contact: Keith Busch +Description: + The number of bytes accessed from the next cache level on a + cache miss. + +What: /sys/devices/system/node/nodeX/side_cache/indexY/size +Date: December 2018 +Contact: Keith Busch +Description: + The size of this memory side cache in bytes. + +What: /sys/devices/system/node/nodeX/side_cache/indexY/write_policy +Date: December 2018 +Contact: Keith Busch +Description: + The cache write policy: 0 for write-back, 1 for write-through, + 2 for other or unknown. From patchwork Wed Jan 16 17:58:03 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10766621 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5B2DD1390 for ; Wed, 16 Jan 2019 18:00:15 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4BF0D2F0EC for ; Wed, 16 Jan 2019 18:00:15 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4018B2F0F2; Wed, 16 Jan 2019 18:00:15 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EC59E2F0EC for ; Wed, 16 Jan 2019 18:00:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728126AbfAPSAI (ORCPT ); Wed, 16 Jan 2019 13:00:08 -0500 Received: from mga04.intel.com ([192.55.52.120]:8983 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728014AbfAPR7n (ORCPT ); Wed, 16 Jan 2019 12:59:43 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Jan 2019 09:59:42 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,487,1539673200"; d="scan'208";a="117227809" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by fmsmga008.fm.intel.com with ESMTP; 16 Jan 2019 09:59:42 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv4 12/13] acpi/hmat: Register memory side cache attributes Date: Wed, 16 Jan 2019 10:58:03 -0700 Message-Id: <20190116175804.30196-13-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20190116175804.30196-1-keith.busch@intel.com> References: <20190116175804.30196-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Register memory side cache attributes with the memory's node if HMAT provides the side cache iniformation table. Signed-off-by: Keith Busch --- drivers/acpi/hmat/hmat.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/drivers/acpi/hmat/hmat.c b/drivers/acpi/hmat/hmat.c index 45e20dc677f9..9efdd0a63a79 100644 --- a/drivers/acpi/hmat/hmat.c +++ b/drivers/acpi/hmat/hmat.c @@ -206,6 +206,7 @@ static __init int hmat_parse_cache(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_hmat_cache *cache = (void *)header; + struct node_cache_attrs cache_attrs; u32 attrs; if (cache->header.length < sizeof(*cache)) { @@ -219,6 +220,37 @@ static __init int hmat_parse_cache(union acpi_subtable_headers *header, cache->memory_PD, cache->cache_size, attrs, cache->number_of_SMBIOShandles); + cache_attrs.size = cache->cache_size; + cache_attrs.level = (attrs & ACPI_HMAT_CACHE_LEVEL) >> 4; + cache_attrs.line_size = (attrs & ACPI_HMAT_CACHE_LINE_SIZE) >> 16; + + switch ((attrs & ACPI_HMAT_CACHE_ASSOCIATIVITY) >> 8) { + case ACPI_HMAT_CA_DIRECT_MAPPED: + cache_attrs.associativity = NODE_CACHE_DIRECT_MAP; + break; + case ACPI_HMAT_CA_COMPLEX_CACHE_INDEXING: + cache_attrs.associativity = NODE_CACHE_INDEXED; + break; + case ACPI_HMAT_CA_NONE: + default: + cache_attrs.associativity = NODE_CACHE_OTHER; + break; + } + + switch ((attrs & ACPI_HMAT_WRITE_POLICY) >> 12) { + case ACPI_HMAT_CP_WB: + cache_attrs.write_policy = NODE_CACHE_WRITE_BACK; + break; + case ACPI_HMAT_CP_WT: + cache_attrs.write_policy = NODE_CACHE_WRITE_THROUGH; + break; + case ACPI_HMAT_CP_NONE: + default: + cache_attrs.write_policy = NODE_CACHE_WRITE_OTHER; + break; + } + + node_add_cache(pxm_to_node(cache->memory_PD), &cache_attrs); return 0; } From patchwork Wed Jan 16 17:58:04 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10766615 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DCD761390 for ; Wed, 16 Jan 2019 18:00:08 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C7FE32F0EC for ; Wed, 16 Jan 2019 18:00:08 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BB68B2F102; Wed, 16 Jan 2019 18:00:08 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0788C2F0EC for ; Wed, 16 Jan 2019 18:00:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728032AbfAPR7o (ORCPT ); Wed, 16 Jan 2019 12:59:44 -0500 Received: from mga04.intel.com ([192.55.52.120]:8980 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728015AbfAPR7o (ORCPT ); Wed, 16 Jan 2019 12:59:44 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Jan 2019 09:59:43 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,487,1539673200"; d="scan'208";a="117227813" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by fmsmga008.fm.intel.com with ESMTP; 16 Jan 2019 09:59:42 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv4 13/13] doc/mm: New documentation for memory performance Date: Wed, 16 Jan 2019 10:58:04 -0700 Message-Id: <20190116175804.30196-14-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20190116175804.30196-1-keith.busch@intel.com> References: <20190116175804.30196-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Platforms may provide system memory where some physical address ranges perform differently than others, or is side cached by the system. Add documentation describing a high level overview of such systems and the perforamnce and caching attributes the kernel provides for applications wishing to query this information. Reviewed-by: Mike Rapoport Signed-off-by: Keith Busch --- Documentation/admin-guide/mm/numaperf.rst | 184 ++++++++++++++++++++++++++++++ 1 file changed, 184 insertions(+) create mode 100644 Documentation/admin-guide/mm/numaperf.rst diff --git a/Documentation/admin-guide/mm/numaperf.rst b/Documentation/admin-guide/mm/numaperf.rst new file mode 100644 index 000000000000..963fbd3004d3 --- /dev/null +++ b/Documentation/admin-guide/mm/numaperf.rst @@ -0,0 +1,184 @@ +.. _numaperf: + +============= +NUMA Locality +============= + +Some platforms may have multiple types of memory attached to a single +CPU. These disparate memory ranges share some characteristics, such as +CPU cache coherence, but may have different performance. For example, +different media types and buses affect bandwidth and latency. + +A system supporting such heterogeneous memory by grouping each memory +type under different "nodes" based on similar CPU locality and performance +characteristics. Some memory may share the same node as a CPU, and others +are provided as memory only nodes. While memory only nodes do not provide +CPUs, they may still be directly accessible, or local, to one or more +compute nodes. The following diagram shows one such example of two compute +nodes with local memory and a memory only node for each of compute node: + + +------------------+ +------------------+ + | Compute Node 0 +-----+ Compute Node 1 | + | Local Node0 Mem | | Local Node1 Mem | + +--------+---------+ +--------+---------+ + | | + +--------+---------+ +--------+---------+ + | Slower Node2 Mem | | Slower Node3 Mem | + +------------------+ +--------+---------+ + +A "memory initiator" is a node containing one or more devices such as +CPUs or separate memory I/O devices that can initiate memory requests. +A "memory target" is a node containing one or more physical address +ranges accessible from one or more memory initiators. + +When multiple memory initiators exist, they may not all have the same +performance when accessing a given memory target. Each initiator-target +pair may be organized into different ranked access classes to represent +this relationship. The highest performing initiator to a given target +is considered to be one of that target's local initiators, and given +the highest access class, 0. Any given target may have one or more +local initiators, and any given initiator may have multiple local +memory targets. + +To aid applications matching memory targets with their initiators, the +kernel provides symlinks to each other. The following example lists the +relationship for the class "0" memory initiators and targets, which is +the of nodes with the highest performing access relationship:: + + # symlinks -v /sys/devices/system/node/nodeX/class0/ + relative: /sys/devices/system/node/nodeX/class0/targetY -> ../../nodeY + + # symlinks -v /sys/devices/system/node/nodeY/class0/ + relative: /sys/devices/system/node/nodeY/class0/initiatorX -> ../../nodeX + +The linked nodes will also have their node numbers set in the class's +target and initiator nodelist entries. Following the same example as +above may look like the following:: + + # cat /sys/devices/system/node/nodeX/class0/target_nodelist + Y + + # cat /sys/devices/system/node/nodeY/class0/initiator_nodelist + X + +An example showing how this may be used to run a particular task on CPUs +and memory using best class nodes for a particular PCI device can be done +using existing 'numactl' as follows:: + + # NODE=$(cat /sys/devices/pci:0000:00/.../numa_node) + # numactl --membind=$(cat /sys/devices/node/node${NODE}/class0/target_nodelist) \ + --cpunodebind=$(cat /sys/devices/node/node${NODE}/class0/initiator_nodelist) \ + -- + +================ +NUMA Performance +================ + +Applications may wish to consider which node they want their memory to +be allocated from based on the node's performance characteristics. If +the system provides these attributes, the kernel exports them under the +node sysfs hierarchy by appending the attributes directory under the +memory node's class 0 initiators as follows:: + + /sys/devices/system/node/nodeY/class0/ + +These attributes apply only to the memory initiator nodes that have the +same class access and are symlink under the class, and are set in the +initiators' nodelist. + +The performance characteristics the kernel provides for the local initiators +are exported are as follows:: + + # tree -P "read*|write*" /sys/devices/system/node/nodeY/class0/ + /sys/devices/system/node/nodeY/class0/ + |-- read_bandwidth + |-- read_latency + |-- write_bandwidth + `-- write_latency + +The bandwidth attributes are provided in MiB/second. + +The latency attributes are provided in nanoseconds. + +========== +NUMA Cache +========== + +System memory may be constructed in a hierarchy of elements with various +performance characteristics in order to provide large address space of +slower performing memory side-cached by a smaller higher performing +memory. The system physical addresses that initiators are aware of +are provided by the last memory level in the hierarchy. The system +meanwhile uses higher performing memory to transparently cache access +to progressively slower levels. + +The term "far memory" is used to denote the last level memory in the +hierarchy. Each increasing cache level provides higher performing +initiator access, and the term "near memory" represents the fastest +cache provided by the system. + +This numbering is different than CPU caches where the cache level (ex: +L1, L2, L3) uses a CPU centric view with each increased level is lower +performing. In contrast, the memory cache level is centric to the last +level memory, so the higher numbered cache level denotes memory nearer +to the CPU, and further from far memory. + +The memory side caches are not directly addressable by software. When +software accesses a system address, the system will return it from the +near memory cache if it is present. If it is not present, the system +accesses the next level of memory until there is either a hit in that +cache level, or it reaches far memory. + +An application does not need to know about caching attributes in order +to use the system. Software may optionally query the memory cache +attributes in order to maximize the performance out of such a setup. +If the system provides a way for the kernel to discover this information, +for example with ACPI HMAT (Heterogeneous Memory Attribute Table), +the kernel will append these attributes to the NUMA node memory target. + +When the kernel first registers a memory cache with a node, the kernel +will create the following directory:: + + /sys/devices/system/node/nodeX/side_cache/ + +If that directory is not present, the system either does not not provide +a memory side cache, or that information is not accessible to the kernel. + +The attributes for each level of cache is provided under its cache +level index:: + + /sys/devices/system/node/nodeX/side_cache/indexA/ + /sys/devices/system/node/nodeX/side_cache/indexB/ + /sys/devices/system/node/nodeX/side_cache/indexC/ + +Each cache level's directory provides its attributes. For example, the +following shows a single cache level and the attributes available for +software to query:: + + # tree sys/devices/system/node/node0/side_cache/ + /sys/devices/system/node/node0/side_cache/ + |-- index1 + | |-- associativity + | |-- level + | |-- line_size + | |-- size + | `-- write_policy + +The "associativity" will be 0 if it is a direct-mapped cache, and non-zero +for any other indexed based, multi-way associativity. + +The "level" is the distance from the far memory, and matches the number +appended to its "index" directory. + +The "line_size" is the number of bytes accessed on a cache miss. + +The "size" is the number of bytes provided by this cache level. + +The "write_policy" will be 0 for write-back, and non-zero for +write-through caching. + +======== +See Also +======== +.. [1] https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf + Section 5.2.27