From patchwork Tue Dec 11 01:02:59 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10722951 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 32B5591E for ; Tue, 11 Dec 2018 01:06:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 226D42A0E6 for ; Tue, 11 Dec 2018 01:06:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 152862A4ED; Tue, 11 Dec 2018 01:06:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 286F72A0E6 for ; Tue, 11 Dec 2018 01:06:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728509AbeLKBFp (ORCPT ); Mon, 10 Dec 2018 20:05:45 -0500 Received: from mga02.intel.com ([134.134.136.20]:24609 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729289AbeLKBFp (ORCPT ); Mon, 10 Dec 2018 20:05:45 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Dec 2018 17:05:44 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,340,1539673200"; d="scan'208";a="117705167" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by orsmga001.jf.intel.com with ESMTP; 10 Dec 2018 17:05:43 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv2 01/12] acpi: Create subtable parsing infrastructure Date: Mon, 10 Dec 2018 18:02:59 -0700 Message-Id: <20181211010310.8551-2-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20181211010310.8551-1-keith.busch@intel.com> References: <20181211010310.8551-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Parsing entries in an ACPI table had assumed a generic header structure that is most common. There is no standard ACPI header, though, so less common types would need custom parsers if they want go through their sub-table entry list. Create the infrastructure for adding different table types so parsing the entries array may be more reused for all ACPI system tables. Signed-off-by: Keith Busch Reviewed-by: Rafael J. Wysocki --- arch/x86/kernel/acpi/boot.c | 36 ++++++++++++------------ drivers/acpi/numa.c | 16 +++++------ drivers/acpi/scan.c | 4 +-- drivers/acpi/tables.c | 67 +++++++++++++++++++++++++++++++++++++------- drivers/irqchip/irq-gic-v3.c | 2 +- drivers/mailbox/pcc.c | 2 +- include/linux/acpi.h | 5 +++- 7 files changed, 91 insertions(+), 41 deletions(-) diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c index 06635fbca81c..58561b4df09d 100644 --- a/arch/x86/kernel/acpi/boot.c +++ b/arch/x86/kernel/acpi/boot.c @@ -197,7 +197,7 @@ static int acpi_register_lapic(int id, u32 acpiid, u8 enabled) } static int __init -acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end) +acpi_parse_x2apic(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_madt_local_x2apic *processor = NULL; #ifdef CONFIG_X86_X2APIC @@ -210,7 +210,7 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end) if (BAD_MADT_ENTRY(processor, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); #ifdef CONFIG_X86_X2APIC apic_id = processor->local_apic_id; @@ -242,7 +242,7 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end) } static int __init -acpi_parse_lapic(struct acpi_subtable_header * header, const unsigned long end) +acpi_parse_lapic(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_madt_local_apic *processor = NULL; @@ -251,7 +251,7 @@ acpi_parse_lapic(struct acpi_subtable_header * header, const unsigned long end) if (BAD_MADT_ENTRY(processor, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); /* Ignore invalid ID */ if (processor->id == 0xff) @@ -272,7 +272,7 @@ acpi_parse_lapic(struct acpi_subtable_header * header, const unsigned long end) } static int __init -acpi_parse_sapic(struct acpi_subtable_header *header, const unsigned long end) +acpi_parse_sapic(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_madt_local_sapic *processor = NULL; @@ -281,7 +281,7 @@ acpi_parse_sapic(struct acpi_subtable_header *header, const unsigned long end) if (BAD_MADT_ENTRY(processor, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); acpi_register_lapic((processor->id << 8) | processor->eid,/* APIC ID */ processor->processor_id, /* ACPI ID */ @@ -291,7 +291,7 @@ acpi_parse_sapic(struct acpi_subtable_header *header, const unsigned long end) } static int __init -acpi_parse_lapic_addr_ovr(struct acpi_subtable_header * header, +acpi_parse_lapic_addr_ovr(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_madt_local_apic_override *lapic_addr_ovr = NULL; @@ -301,7 +301,7 @@ acpi_parse_lapic_addr_ovr(struct acpi_subtable_header * header, if (BAD_MADT_ENTRY(lapic_addr_ovr, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); acpi_lapic_addr = lapic_addr_ovr->address; @@ -309,7 +309,7 @@ acpi_parse_lapic_addr_ovr(struct acpi_subtable_header * header, } static int __init -acpi_parse_x2apic_nmi(struct acpi_subtable_header *header, +acpi_parse_x2apic_nmi(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_madt_local_x2apic_nmi *x2apic_nmi = NULL; @@ -319,7 +319,7 @@ acpi_parse_x2apic_nmi(struct acpi_subtable_header *header, if (BAD_MADT_ENTRY(x2apic_nmi, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); if (x2apic_nmi->lint != 1) printk(KERN_WARNING PREFIX "NMI not connected to LINT 1!\n"); @@ -328,7 +328,7 @@ acpi_parse_x2apic_nmi(struct acpi_subtable_header *header, } static int __init -acpi_parse_lapic_nmi(struct acpi_subtable_header * header, const unsigned long end) +acpi_parse_lapic_nmi(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_madt_local_apic_nmi *lapic_nmi = NULL; @@ -337,7 +337,7 @@ acpi_parse_lapic_nmi(struct acpi_subtable_header * header, const unsigned long e if (BAD_MADT_ENTRY(lapic_nmi, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); if (lapic_nmi->lint != 1) printk(KERN_WARNING PREFIX "NMI not connected to LINT 1!\n"); @@ -449,7 +449,7 @@ static int __init mp_register_ioapic_irq(u8 bus_irq, u8 polarity, } static int __init -acpi_parse_ioapic(struct acpi_subtable_header * header, const unsigned long end) +acpi_parse_ioapic(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_madt_io_apic *ioapic = NULL; struct ioapic_domain_cfg cfg = { @@ -462,7 +462,7 @@ acpi_parse_ioapic(struct acpi_subtable_header * header, const unsigned long end) if (BAD_MADT_ENTRY(ioapic, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); /* Statically assign IRQ numbers for IOAPICs hosting legacy IRQs */ if (ioapic->global_irq_base < nr_legacy_irqs()) @@ -508,7 +508,7 @@ static void __init acpi_sci_ioapic_setup(u8 bus_irq, u16 polarity, u16 trigger, } static int __init -acpi_parse_int_src_ovr(struct acpi_subtable_header * header, +acpi_parse_int_src_ovr(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_madt_interrupt_override *intsrc = NULL; @@ -518,7 +518,7 @@ acpi_parse_int_src_ovr(struct acpi_subtable_header * header, if (BAD_MADT_ENTRY(intsrc, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); if (intsrc->source_irq == acpi_gbl_FADT.sci_interrupt) { acpi_sci_ioapic_setup(intsrc->source_irq, @@ -550,7 +550,7 @@ acpi_parse_int_src_ovr(struct acpi_subtable_header * header, } static int __init -acpi_parse_nmi_src(struct acpi_subtable_header * header, const unsigned long end) +acpi_parse_nmi_src(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_madt_nmi_source *nmi_src = NULL; @@ -559,7 +559,7 @@ acpi_parse_nmi_src(struct acpi_subtable_header * header, const unsigned long end if (BAD_MADT_ENTRY(nmi_src, end)) return -EINVAL; - acpi_table_print_madt_entry(header); + acpi_table_print_madt_entry(&header->common); /* TBD: Support nimsrc entries? */ diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c index 274699463b4f..f5e09c39ff22 100644 --- a/drivers/acpi/numa.c +++ b/drivers/acpi/numa.c @@ -338,7 +338,7 @@ acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa) } static int __init -acpi_parse_x2apic_affinity(struct acpi_subtable_header *header, +acpi_parse_x2apic_affinity(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_srat_x2apic_cpu_affinity *processor_affinity; @@ -347,7 +347,7 @@ acpi_parse_x2apic_affinity(struct acpi_subtable_header *header, if (!processor_affinity) return -EINVAL; - acpi_table_print_srat_entry(header); + acpi_table_print_srat_entry(&header->common); /* let architecture-dependent part to do it */ acpi_numa_x2apic_affinity_init(processor_affinity); @@ -356,7 +356,7 @@ acpi_parse_x2apic_affinity(struct acpi_subtable_header *header, } static int __init -acpi_parse_processor_affinity(struct acpi_subtable_header *header, +acpi_parse_processor_affinity(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_srat_cpu_affinity *processor_affinity; @@ -365,7 +365,7 @@ acpi_parse_processor_affinity(struct acpi_subtable_header *header, if (!processor_affinity) return -EINVAL; - acpi_table_print_srat_entry(header); + acpi_table_print_srat_entry(&header->common); /* let architecture-dependent part to do it */ acpi_numa_processor_affinity_init(processor_affinity); @@ -374,7 +374,7 @@ acpi_parse_processor_affinity(struct acpi_subtable_header *header, } static int __init -acpi_parse_gicc_affinity(struct acpi_subtable_header *header, +acpi_parse_gicc_affinity(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_srat_gicc_affinity *processor_affinity; @@ -383,7 +383,7 @@ acpi_parse_gicc_affinity(struct acpi_subtable_header *header, if (!processor_affinity) return -EINVAL; - acpi_table_print_srat_entry(header); + acpi_table_print_srat_entry(&header->common); /* let architecture-dependent part to do it */ acpi_numa_gicc_affinity_init(processor_affinity); @@ -394,7 +394,7 @@ acpi_parse_gicc_affinity(struct acpi_subtable_header *header, static int __initdata parsed_numa_memblks; static int __init -acpi_parse_memory_affinity(struct acpi_subtable_header * header, +acpi_parse_memory_affinity(union acpi_subtable_headers * header, const unsigned long end) { struct acpi_srat_mem_affinity *memory_affinity; @@ -403,7 +403,7 @@ acpi_parse_memory_affinity(struct acpi_subtable_header * header, if (!memory_affinity) return -EINVAL; - acpi_table_print_srat_entry(header); + acpi_table_print_srat_entry(&header->common); /* let architecture-dependent part to do it */ if (!acpi_numa_memory_affinity_init(memory_affinity)) diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c index bd1c59fb0e17..d98d5da6a279 100644 --- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -2234,10 +2234,10 @@ static struct acpi_probe_entry *ape; static int acpi_probe_count; static DEFINE_MUTEX(acpi_probe_mutex); -static int __init acpi_match_madt(struct acpi_subtable_header *header, +static int __init acpi_match_madt(union acpi_subtable_headers *header, const unsigned long end) { - if (!ape->subtable_valid || ape->subtable_valid(header, ape)) + if (!ape->subtable_valid || ape->subtable_valid(&header->common, ape)) if (!ape->probe_subtbl(header, end)) acpi_probe_count++; diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c index 61203eebf3a1..e9643b4267c7 100644 --- a/drivers/acpi/tables.c +++ b/drivers/acpi/tables.c @@ -49,6 +49,15 @@ static struct acpi_table_desc initial_tables[ACPI_MAX_TABLES] __initdata; static int acpi_apic_instance __initdata; +enum acpi_subtable_type { + ACPI_SUBTABLE_COMMON, +}; + +struct acpi_subtable_entry { + union acpi_subtable_headers *hdr; + enum acpi_subtable_type type; +}; + /* * Disable table checksum verification for the early stage due to the size * limitation of the current x86 early mapping implementation. @@ -217,6 +226,42 @@ void acpi_table_print_madt_entry(struct acpi_subtable_header *header) } } +static unsigned long __init +acpi_get_entry_type(struct acpi_subtable_entry *entry) +{ + switch (entry->type) { + case ACPI_SUBTABLE_COMMON: + return entry->hdr->common.type; + } + return 0; +} + +static unsigned long __init +acpi_get_entry_length(struct acpi_subtable_entry *entry) +{ + switch (entry->type) { + case ACPI_SUBTABLE_COMMON: + return entry->hdr->common.length; + } + return 0; +} + +static unsigned long __init +acpi_get_subtable_header_length(struct acpi_subtable_entry *entry) +{ + switch (entry->type) { + case ACPI_SUBTABLE_COMMON: + return sizeof(entry->hdr->common); + } + return 0; +} + +static enum acpi_subtable_type __init +acpi_get_subtable_type(char *id) +{ + return ACPI_SUBTABLE_COMMON; +} + /** * acpi_parse_entries_array - for each proc_num find a suitable subtable * @@ -246,8 +291,8 @@ acpi_parse_entries_array(char *id, unsigned long table_size, struct acpi_subtable_proc *proc, int proc_num, unsigned int max_entries) { - struct acpi_subtable_header *entry; - unsigned long table_end; + struct acpi_subtable_entry entry; + unsigned long table_end, subtable_len, entry_len; int count = 0; int errs = 0; int i; @@ -270,19 +315,20 @@ acpi_parse_entries_array(char *id, unsigned long table_size, /* Parse all entries looking for a match. */ - entry = (struct acpi_subtable_header *) + entry.type = acpi_get_subtable_type(id); + entry.hdr = (union acpi_subtable_headers *) ((unsigned long)table_header + table_size); + subtable_len = acpi_get_subtable_header_length(&entry); - while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) < - table_end) { + while (((unsigned long)entry.hdr) + subtable_len < table_end) { if (max_entries && count >= max_entries) break; for (i = 0; i < proc_num; i++) { - if (entry->type != proc[i].id) + if (acpi_get_entry_type(&entry) != proc[i].id) continue; if (!proc[i].handler || - (!errs && proc[i].handler(entry, table_end))) { + (!errs && proc[i].handler(entry.hdr, table_end))) { errs++; continue; } @@ -297,13 +343,14 @@ acpi_parse_entries_array(char *id, unsigned long table_size, * If entry->length is 0, break from this loop to avoid * infinite loop. */ - if (entry->length == 0) { + entry_len = acpi_get_entry_length(&entry); + if (entry_len == 0) { pr_err("[%4.4s:0x%02x] Invalid zero length\n", id, proc->id); return -EINVAL; } - entry = (struct acpi_subtable_header *) - ((unsigned long)entry + entry->length); + entry.hdr = (union acpi_subtable_headers *) + ((unsigned long)entry.hdr + entry_len); } if (max_entries && count > max_entries) { diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 8f87f40c9460..ef3f72196ad6 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -1383,7 +1383,7 @@ gic_acpi_parse_madt_redist(struct acpi_subtable_header *header, } static int __init -gic_acpi_parse_madt_gicc(struct acpi_subtable_header *header, +gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_madt_generic_interrupt *gicc = diff --git a/drivers/mailbox/pcc.c b/drivers/mailbox/pcc.c index 256f18b67e8a..08a0a3517138 100644 --- a/drivers/mailbox/pcc.c +++ b/drivers/mailbox/pcc.c @@ -382,7 +382,7 @@ static const struct mbox_chan_ops pcc_chan_ops = { * * This gets called for each entry in the PCC table. */ -static int parse_pcc_subspace(struct acpi_subtable_header *header, +static int parse_pcc_subspace(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_pcct_subspace *ss = (struct acpi_pcct_subspace *) header; diff --git a/include/linux/acpi.h b/include/linux/acpi.h index ed80f147bd50..18805a967c70 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -141,10 +141,13 @@ enum acpi_address_range_id { /* Table Handlers */ +union acpi_subtable_headers { + struct acpi_subtable_header common; +}; typedef int (*acpi_tbl_table_handler)(struct acpi_table_header *table); -typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header, +typedef int (*acpi_tbl_entry_handler)(union acpi_subtable_headers *header, const unsigned long end); /* Debugger support */ From patchwork Tue Dec 11 01:03:00 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10722947 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D27DC112E for ; Tue, 11 Dec 2018 01:06:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C269C2A0E6 for ; Tue, 11 Dec 2018 01:06:54 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B3D212A4ED; Tue, 11 Dec 2018 01:06:54 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F3C162A0E6 for ; Tue, 11 Dec 2018 01:06:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729883AbeLKBGo (ORCPT ); Mon, 10 Dec 2018 20:06:44 -0500 Received: from mga02.intel.com ([134.134.136.20]:24609 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729389AbeLKBFq (ORCPT ); Mon, 10 Dec 2018 20:05:46 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Dec 2018 17:05:45 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,340,1539673200"; d="scan'208";a="117705172" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by orsmga001.jf.intel.com with ESMTP; 10 Dec 2018 17:05:44 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv2 02/12] acpi/hmat: Parse and report heterogeneous memory Date: Mon, 10 Dec 2018 18:03:00 -0700 Message-Id: <20181211010310.8551-3-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20181211010310.8551-1-keith.busch@intel.com> References: <20181211010310.8551-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Systems may provide different memory types and export this information in the ACPI Heterogeneous Memory Attribute Table (HMAT). Parse these tables provided by the platform and report the memory access and caching attributes. Signed-off-by: Keith Busch --- drivers/acpi/Kconfig | 8 +++ drivers/acpi/Makefile | 1 + drivers/acpi/hmat.c | 192 ++++++++++++++++++++++++++++++++++++++++++++++++++ drivers/acpi/tables.c | 9 +++ include/linux/acpi.h | 1 + 5 files changed, 211 insertions(+) create mode 100644 drivers/acpi/hmat.c diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index 7cea769c37df..9a05af3a18cf 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -327,6 +327,14 @@ config ACPI_NUMA depends on (X86 || IA64 || ARM64) default y if IA64_GENERIC || IA64_SGI_SN2 || ARM64 +config ACPI_HMAT + bool "ACPI Heterogeneous Memory Attribute Table Support" + depends on ACPI_NUMA + help + Parses representation of the ACPI Heterogeneous Memory Attributes + Table (HMAT) and set the memory node relationships and access + attributes. + config ACPI_CUSTOM_DSDT_FILE string "Custom DSDT Table file to include" default "" diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile index edc039313cd6..b5e13499f88b 100644 --- a/drivers/acpi/Makefile +++ b/drivers/acpi/Makefile @@ -55,6 +55,7 @@ acpi-$(CONFIG_X86) += x86/apple.o acpi-$(CONFIG_X86) += x86/utils.o acpi-$(CONFIG_DEBUG_FS) += debugfs.o acpi-$(CONFIG_ACPI_NUMA) += numa.o +acpi-$(CONFIG_ACPI_HMAT) += hmat.o acpi-$(CONFIG_ACPI_PROCFS_POWER) += cm_sbs.o acpi-y += acpi_lpat.o acpi-$(CONFIG_ACPI_LPIT) += acpi_lpit.o diff --git a/drivers/acpi/hmat.c b/drivers/acpi/hmat.c new file mode 100644 index 000000000000..ef3881f0f370 --- /dev/null +++ b/drivers/acpi/hmat.c @@ -0,0 +1,192 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Heterogeneous Memory Attributes Table (HMAT) representation + * + * Copyright (c) 2018, Intel Corporation. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static __init const char *hmat_data_type(u8 type) +{ + switch (type) { + case ACPI_HMAT_ACCESS_LATENCY: + return "Access Latency"; + case ACPI_HMAT_READ_LATENCY: + return "Read Latency"; + case ACPI_HMAT_WRITE_LATENCY: + return "Write Latency"; + case ACPI_HMAT_ACCESS_BANDWIDTH: + return "Access Bandwidth"; + case ACPI_HMAT_READ_BANDWIDTH: + return "Read Bandwidth"; + case ACPI_HMAT_WRITE_BANDWIDTH: + return "Write Bandwidth"; + default: + return "Reserved"; + }; +} + +static __init const char *hmat_data_type_suffix(u8 type) +{ + switch (type) { + case ACPI_HMAT_ACCESS_LATENCY: + case ACPI_HMAT_READ_LATENCY: + case ACPI_HMAT_WRITE_LATENCY: + return " nsec"; + case ACPI_HMAT_ACCESS_BANDWIDTH: + case ACPI_HMAT_READ_BANDWIDTH: + case ACPI_HMAT_WRITE_BANDWIDTH: + return " MB/s"; + default: + return ""; + }; +} + +static __init int hmat_parse_locality(union acpi_subtable_headers *header, + const unsigned long end) +{ + struct acpi_hmat_locality *loc = (void *)header; + unsigned int init, targ, total_size, ipds, tpds; + u32 *inits, *targs, value; + u16 *entries; + u8 type; + + if (loc->header.length < sizeof(*loc)) { + pr_err("HMAT: Unexpected locality header length: %d\n", + loc->header.length); + return -EINVAL; + } + + type = loc->data_type; + ipds = loc->number_of_initiator_Pds; + tpds = loc->number_of_target_Pds; + total_size = sizeof(*loc) + sizeof(*entries) * ipds * tpds + + sizeof(*inits) * ipds + sizeof(*targs) * tpds; + if (loc->header.length < total_size) { + pr_err("HMAT: Unexpected locality header length:%d, minimum required:%d\n", + loc->header.length, total_size); + return -EINVAL; + } + + pr_info("HMAT: Locality: Flags:%02x Type:%s Initiator Domains:%d Target Domains:%d Base:%lld\n", + loc->flags, hmat_data_type(type), ipds, tpds, + loc->entry_base_unit); + + inits = (u32 *)(loc + 1); + targs = &inits[ipds]; + entries = (u16 *)(&targs[tpds]); + for (targ = 0; targ < tpds; targ++) { + for (init = 0; init < ipds; init++) { + value = entries[init * tpds + targ]; + value = (value * loc->entry_base_unit) / 10; + pr_info(" Initiator-Target[%d-%d]:%d%s\n", + inits[init], targs[targ], value, + hmat_data_type_suffix(type)); + } + } + return 0; +} + +static __init int hmat_parse_cache(union acpi_subtable_headers *header, + const unsigned long end) +{ + struct acpi_hmat_cache *cache = (void *)header; + u32 attrs; + + if (cache->header.length < sizeof(*cache)) { + pr_err("HMAT: Unexpected cache header length: %d\n", + cache->header.length); + return -EINVAL; + } + + attrs = cache->cache_attributes; + pr_info("HMAT: Cache: Domain:%d Size:%llu Attrs:%08x SMBIOS Handles:%d\n", + cache->memory_PD, cache->cache_size, attrs, + cache->number_of_SMBIOShandles); + + return 0; +} + +static int __init hmat_parse_address_range(union acpi_subtable_headers *header, + const unsigned long end) +{ + struct acpi_hmat_address_range *spa = (void *)header; + + if (spa->header.length != sizeof(*spa)) { + pr_err("HMAT: Unexpected address range header length: %d\n", + spa->header.length); + return -EINVAL; + } + pr_info("HMAT: Memory (%#llx length %#llx) Flags:%04x Processor Domain:%d Memory Domain:%d\n", + spa->physical_address_base, spa->physical_address_length, + spa->flags, spa->processor_PD, spa->memory_PD); + return 0; +} + +static int __init hmat_parse_subtable(union acpi_subtable_headers *header, + const unsigned long end) +{ + struct acpi_hmat_structure *hdr = (void *)header; + + if (!hdr) + return -EINVAL; + + switch (hdr->type) { + case ACPI_HMAT_TYPE_ADDRESS_RANGE: + return hmat_parse_address_range(header, end); + case ACPI_HMAT_TYPE_LOCALITY: + return hmat_parse_locality(header, end); + case ACPI_HMAT_TYPE_CACHE: + return hmat_parse_cache(header, end); + default: + return -EINVAL; + } +} + +static __init int parse_noop(struct acpi_table_header *table) +{ + return 0; +} + +static __init int hmat_init(void) +{ + struct acpi_subtable_proc subtable_proc; + struct acpi_table_header *tbl; + enum acpi_hmat_type i; + acpi_status status; + + if (srat_disabled()) + return 0; + + status = acpi_get_table(ACPI_SIG_HMAT, 0, &tbl); + if (ACPI_FAILURE(status)) + return 0; + + if (acpi_table_parse(ACPI_SIG_HMAT, parse_noop)) + goto out_put; + + memset(&subtable_proc, 0, sizeof(subtable_proc)); + subtable_proc.handler = hmat_parse_subtable; + for (i = ACPI_HMAT_TYPE_ADDRESS_RANGE; i < ACPI_HMAT_TYPE_RESERVED; i++) { + subtable_proc.id = i; + if (acpi_table_parse_entries_array(ACPI_SIG_HMAT, + sizeof(struct acpi_table_hmat), + &subtable_proc, 1, 0) < 0) + goto out_put; + } + out_put: + acpi_put_table(tbl); + return 0; +} +subsys_initcall(hmat_init); diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c index e9643b4267c7..bc1addf715dc 100644 --- a/drivers/acpi/tables.c +++ b/drivers/acpi/tables.c @@ -51,6 +51,7 @@ static int acpi_apic_instance __initdata; enum acpi_subtable_type { ACPI_SUBTABLE_COMMON, + ACPI_SUBTABLE_HMAT, }; struct acpi_subtable_entry { @@ -232,6 +233,8 @@ acpi_get_entry_type(struct acpi_subtable_entry *entry) switch (entry->type) { case ACPI_SUBTABLE_COMMON: return entry->hdr->common.type; + case ACPI_SUBTABLE_HMAT: + return entry->hdr->hmat.type; } return 0; } @@ -242,6 +245,8 @@ acpi_get_entry_length(struct acpi_subtable_entry *entry) switch (entry->type) { case ACPI_SUBTABLE_COMMON: return entry->hdr->common.length; + case ACPI_SUBTABLE_HMAT: + return entry->hdr->hmat.length; } return 0; } @@ -252,6 +257,8 @@ acpi_get_subtable_header_length(struct acpi_subtable_entry *entry) switch (entry->type) { case ACPI_SUBTABLE_COMMON: return sizeof(entry->hdr->common); + case ACPI_SUBTABLE_HMAT: + return sizeof(entry->hdr->hmat); } return 0; } @@ -259,6 +266,8 @@ acpi_get_subtable_header_length(struct acpi_subtable_entry *entry) static enum acpi_subtable_type __init acpi_get_subtable_type(char *id) { + if (strncmp(id, ACPI_SIG_HMAT, 4) == 0) + return ACPI_SUBTABLE_HMAT; return ACPI_SUBTABLE_COMMON; } diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 18805a967c70..4373f5ba0f95 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -143,6 +143,7 @@ enum acpi_address_range_id { /* Table Handlers */ union acpi_subtable_headers { struct acpi_subtable_header common; + struct acpi_hmat_structure hmat; }; typedef int (*acpi_tbl_table_handler)(struct acpi_table_header *table); From patchwork Tue Dec 11 01:03:01 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10722949 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9F6B8112E for ; Tue, 11 Dec 2018 01:06:56 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8CA442A0E6 for ; Tue, 11 Dec 2018 01:06:56 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7E00F2A4ED; Tue, 11 Dec 2018 01:06:56 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D4ED92A0E6 for ; Tue, 11 Dec 2018 01:06:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729864AbeLKBGn (ORCPT ); Mon, 10 Dec 2018 20:06:43 -0500 Received: from mga02.intel.com ([134.134.136.20]:24625 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729427AbeLKBFr (ORCPT ); Mon, 10 Dec 2018 20:05:47 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Dec 2018 17:05:46 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,340,1539673200"; d="scan'208";a="117705176" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by orsmga001.jf.intel.com with ESMTP; 10 Dec 2018 17:05:45 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv2 03/12] node: Link memory nodes to their compute nodes Date: Mon, 10 Dec 2018 18:03:01 -0700 Message-Id: <20181211010310.8551-4-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20181211010310.8551-1-keith.busch@intel.com> References: <20181211010310.8551-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Memory-only nodes will often have affinity to a compute node that can initiate memory access. Platforms may have various ways to express that locality. The nodes that initiate memory requests closest to a memory target node may be considered to have 'primary' access. In preparation for these systems, provide a way for the kernel to link the target memory node to its primary initiator compute nodes with symlinks to each other. Also add memory target and initiator node masks showing the same relationship. The following example show the node's new sysfs hierarchy setup for memory node 'Y' with primary access to commpute node 'X': # ls -l /sys/devices/system/node/nodeX/primary_target* /sys/devices/system/node/nodeX/primary_targetY -> ../nodeY # ls -l /sys/devices/system/node/nodeY/primary_initiator* /sys/devices/system/node/nodeY/primary_initiatorX -> ../nodeX A single initiator may have primary access to multiple memory targets, and the targets may also have primary access from multiple memory initiators. The following demonstrates how this may look for a theoretical system with 8 memory nodes and 2 compute nodes. # cat /sys/devices/system/node/node0/primary_mem_nodelist 0,2,4,6 # cat /sys/devices/system/node/node1/primary_mem_nodelist 1,3,5,7 And then going the other way to identify the cpu lists of a node's primary targets: # cat /sys/devices/system/node/node0/primary_target*/primary_cpu_nodelist | tr "\n" "," 0,0,0,0 # cat /sys/devices/system/node/node1/primary_target*/primary_cpu_nodelist 1,1,1,1 As an example of what you may be able to do with this, let's say we have a PCIe storage device, /dev/nvme0n1, attached to a particular node, and we want to run IO to it using only CPUs and Memory that share primary access. The following shell script is such an example to achieve that goal: #!/bin/bash DEV_NODE=/sys/devices/system/node/node$(cat /sys/block/nvme0n1/device/device/numa_node) numactl --membind=$(cat ${DEV_NODE}/primary_mem_nodelist) \ --cpunodebind=$(cat ${DEV_NODE}/primary_cpu_nodelist) \ -- fio --filename=/dev/nvme0n1 --bs=4k --name=access-test Signed-off-by: Keith Busch --- drivers/base/node.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/node.h | 4 +++ 2 files changed, 89 insertions(+) diff --git a/drivers/base/node.c b/drivers/base/node.c index 86d6cd92ce3d..50412ce3fd7d 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -56,6 +56,46 @@ static inline ssize_t node_read_cpulist(struct device *dev, return node_read_cpumap(dev, true, buf); } +static ssize_t node_read_nodemap(nodemask_t *mask, bool list, char *buf) +{ + return list ? scnprintf(buf, PAGE_SIZE - 1, "%*pbl\n", + nodemask_pr_args(mask)) : + scnprintf(buf, PAGE_SIZE - 1, "%*pb\n", + nodemask_pr_args(mask)); +} + +static ssize_t primary_mem_nodelist_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct node *n = to_node(dev); + return node_read_nodemap(&n->primary_mem_nodes, true, buf); +} + +static ssize_t primary_mem_nodemask_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct node *n = to_node(dev); + return node_read_nodemap(&n->primary_mem_nodes, false, buf); +} + +static ssize_t primary_cpu_nodelist_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct node *n = to_node(dev); + return node_read_nodemap(&n->primary_cpu_nodes, true, buf); +} + +static ssize_t primary_cpu_nodemask_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct node *n = to_node(dev); + return node_read_nodemap(&n->primary_cpu_nodes, false, buf); +} + +static DEVICE_ATTR_RO(primary_mem_nodelist); +static DEVICE_ATTR_RO(primary_mem_nodemask); +static DEVICE_ATTR_RO(primary_cpu_nodemask); +static DEVICE_ATTR_RO(primary_cpu_nodelist); static DEVICE_ATTR(cpumap, S_IRUGO, node_read_cpumask, NULL); static DEVICE_ATTR(cpulist, S_IRUGO, node_read_cpulist, NULL); @@ -240,6 +280,10 @@ static struct attribute *node_dev_attrs[] = { &dev_attr_numastat.attr, &dev_attr_distance.attr, &dev_attr_vmstat.attr, + &dev_attr_primary_mem_nodelist.attr, + &dev_attr_primary_mem_nodemask.attr, + &dev_attr_primary_cpu_nodemask.attr, + &dev_attr_primary_cpu_nodelist.attr, NULL }; ATTRIBUTE_GROUPS(node_dev); @@ -372,6 +416,42 @@ int register_cpu_under_node(unsigned int cpu, unsigned int nid) kobject_name(&node_devices[nid]->dev.kobj)); } +int register_memory_node_under_compute_node(unsigned int m, unsigned int p) +{ + struct node *init, *targ; + char initiator[28]; /* "primary_initiator4294967295\0" */ + char target[25]; /* "primary_target4294967295\0" */ + int ret; + + if (!node_online(p) || !node_online(m)) + return -ENODEV; + if (m == p) + return 0; + + snprintf(initiator, sizeof(initiator), "primary_initiator%d", p); + snprintf(target, sizeof(target), "primary_target%d", m); + + init = node_devices[p]; + targ = node_devices[m]; + + ret = sysfs_create_link(&init->dev.kobj, &targ->dev.kobj, target); + if (ret) + return ret; + + ret = sysfs_create_link(&targ->dev.kobj, &init->dev.kobj, initiator); + if (ret) + goto err; + + node_set(m, init->primary_mem_nodes); + node_set(p, targ->primary_cpu_nodes); + + return 0; + err: + sysfs_remove_link(&node_devices[p]->dev.kobj, + kobject_name(&node_devices[m]->dev.kobj)); + return ret; +} + int unregister_cpu_under_node(unsigned int cpu, unsigned int nid) { struct device *obj; @@ -580,6 +660,11 @@ int __register_one_node(int nid) register_cpu_under_node(cpu, nid); } + if (node_state(nid, N_MEMORY)) + node_set(nid, node_devices[nid]->primary_mem_nodes); + if (node_state(nid, N_CPU)) + node_set(nid, node_devices[nid]->primary_cpu_nodes); + /* initialize work queue for memory hot plug */ init_node_hugetlb_work(nid); diff --git a/include/linux/node.h b/include/linux/node.h index 257bb3d6d014..3d06de045cbf 100644 --- a/include/linux/node.h +++ b/include/linux/node.h @@ -21,6 +21,8 @@ struct node { struct device dev; + nodemask_t primary_mem_nodes; + nodemask_t primary_cpu_nodes; #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS) struct work_struct node_work; @@ -75,6 +77,8 @@ extern int register_mem_sect_under_node(struct memory_block *mem_blk, extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk, unsigned long phys_index); +extern int register_memory_node_under_compute_node(unsigned int m, unsigned int p); + #ifdef CONFIG_HUGETLBFS extern void register_hugetlbfs_with_node(node_registration_func_t doregister, node_registration_func_t unregister); From patchwork Tue Dec 11 01:03:02 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10722937 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B9F7891E for ; Tue, 11 Dec 2018 01:06:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AB9012A0E6 for ; Tue, 11 Dec 2018 01:06:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9F8212A50D; Tue, 11 Dec 2018 01:06:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5320D2A0E6 for ; Tue, 11 Dec 2018 01:06:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729517AbeLKBFw (ORCPT ); Mon, 10 Dec 2018 20:05:52 -0500 Received: from mga02.intel.com ([134.134.136.20]:24637 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729434AbeLKBFr (ORCPT ); Mon, 10 Dec 2018 20:05:47 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Dec 2018 17:05:47 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,340,1539673200"; d="scan'208";a="117705179" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by orsmga001.jf.intel.com with ESMTP; 10 Dec 2018 17:05:46 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv2 04/12] Documentation/ABI: Add new node sysfs attributes Date: Mon, 10 Dec 2018 18:03:02 -0700 Message-Id: <20181211010310.8551-5-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20181211010310.8551-1-keith.busch@intel.com> References: <20181211010310.8551-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add the entries for primary cpu and memory node attributes. Signed-off-by: Keith Busch --- Documentation/ABI/stable/sysfs-devices-node | 34 ++++++++++++++++++++++++++++- 1 file changed, 33 insertions(+), 1 deletion(-) diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node index 3e90e1f3bf0a..8430d5b261f6 100644 --- a/Documentation/ABI/stable/sysfs-devices-node +++ b/Documentation/ABI/stable/sysfs-devices-node @@ -90,4 +90,36 @@ Date: December 2009 Contact: Lee Schermerhorn Description: The node's huge page size control/query attributes. - See Documentation/admin-guide/mm/hugetlbpage.rst \ No newline at end of file + See Documentation/admin-guide/mm/hugetlbpage.rst + +What: /sys/devices/system/node/nodeX/primary_cpu_nodelist +Date: December 2018 +Contact: Keith Busch +Description: + The node list of CPUs that have primary access to this node's + memory. CPUs not in the list accessing this node's memory may + encounter a performance penalty. + +What: /sys/devices/system/node/nodeX/primary_cpu_nodemask +Date: December 2018 +Contact: Keith Busch +Description: + The node map for CPUs that have primary access to this node's + memory. CPUs not in the list accessing this node's memory may + encounter a performance penalty. + +What: /sys/devices/system/node/nodeX/primary_mem_nodelist +Date: December 2018 +Contact: Keith Busch +Description: + The list of memory nodes that this node has primary access. + Memory accesses from this node to nodes not in this list may + encounter a performance penalty. + +What: /sys/devices/system/node/nodeX/primary_mem_nodemask +Date: December 2018 +Contact: Keith Busch +Description: + The map of memory nodes that this node has primary access. + Memory accesses from this node to nodes not in this map may + encounter a performance penalty. From patchwork Tue Dec 11 01:03:03 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10722941 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0D2D4112E for ; Tue, 11 Dec 2018 01:06:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F174B2A0E6 for ; Tue, 11 Dec 2018 01:06:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E5B222A50D; Tue, 11 Dec 2018 01:06:42 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 650F52A0E6 for ; Tue, 11 Dec 2018 01:06:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729499AbeLKBFw (ORCPT ); Mon, 10 Dec 2018 20:05:52 -0500 Received: from mga02.intel.com ([134.134.136.20]:24609 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729462AbeLKBFu (ORCPT ); Mon, 10 Dec 2018 20:05:50 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Dec 2018 17:05:48 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,340,1539673200"; d="scan'208";a="117705183" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by orsmga001.jf.intel.com with ESMTP; 10 Dec 2018 17:05:47 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv2 05/12] acpi/hmat: Register processor domain to its memory Date: Mon, 10 Dec 2018 18:03:03 -0700 Message-Id: <20181211010310.8551-6-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20181211010310.8551-1-keith.busch@intel.com> References: <20181211010310.8551-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP If the HMAT Subsystem Address Range provides a valid processor proximity domain for a memory domain, or a processor domain with the highest performing access, register the node as one of the initiator's primary memory targets so this relationship will be visible under the node's sysfs directory. Since HMAT requires valid address ranges have an equivalent SRAT entry, verify each memory target satisfies this requirement. Signed-off-by: Keith Busch --- drivers/acpi/hmat.c | 149 +++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 142 insertions(+), 7 deletions(-) diff --git a/drivers/acpi/hmat.c b/drivers/acpi/hmat.c index ef3881f0f370..5d8747ad025f 100644 --- a/drivers/acpi/hmat.c +++ b/drivers/acpi/hmat.c @@ -17,6 +17,43 @@ #include #include +static LIST_HEAD(targets); + +struct memory_target { + struct list_head node; + unsigned int memory_pxm; + unsigned long p_nodes[BITS_TO_LONGS(MAX_NUMNODES)]; +}; + +static __init struct memory_target *find_mem_target(unsigned int m) +{ + struct memory_target *t; + + list_for_each_entry(t, &targets, node) + if (t->memory_pxm == m) + return t; + return NULL; +} + +static __init void alloc_memory_target(unsigned int mem_pxm) +{ + struct memory_target *t; + + if (pxm_to_node(mem_pxm) == NUMA_NO_NODE) + return; + + t = find_mem_target(mem_pxm); + if (t) + return; + + t = kzalloc(sizeof(*t), GFP_KERNEL); + if (!t) + return; + + t->memory_pxm = mem_pxm; + list_add_tail(&t->node, &targets); +} + static __init const char *hmat_data_type(u8 type) { switch (type) { @@ -53,11 +90,30 @@ static __init const char *hmat_data_type_suffix(u8 type) }; } +static __init void hmat_update_access(u8 type, u32 value, u32 *best) +{ + switch (type) { + case ACPI_HMAT_ACCESS_LATENCY: + case ACPI_HMAT_READ_LATENCY: + case ACPI_HMAT_WRITE_LATENCY: + if (!*best || *best > value) + *best = value; + break; + case ACPI_HMAT_ACCESS_BANDWIDTH: + case ACPI_HMAT_READ_BANDWIDTH: + case ACPI_HMAT_WRITE_BANDWIDTH: + if (!*best || *best < value) + *best = value; + break; + } +} + static __init int hmat_parse_locality(union acpi_subtable_headers *header, const unsigned long end) { + struct memory_target *t; struct acpi_hmat_locality *loc = (void *)header; - unsigned int init, targ, total_size, ipds, tpds; + unsigned int init, targ, pass, p_node, total_size, ipds, tpds; u32 *inits, *targs, value; u16 *entries; u8 type; @@ -87,12 +143,28 @@ static __init int hmat_parse_locality(union acpi_subtable_headers *header, targs = &inits[ipds]; entries = (u16 *)(&targs[tpds]); for (targ = 0; targ < tpds; targ++) { - for (init = 0; init < ipds; init++) { - value = entries[init * tpds + targ]; - value = (value * loc->entry_base_unit) / 10; - pr_info(" Initiator-Target[%d-%d]:%d%s\n", - inits[init], targs[targ], value, - hmat_data_type_suffix(type)); + u32 best = 0; + + t = find_mem_target(targs[targ]); + for (pass = 0; pass < 2; pass++) { + for (init = 0; init < ipds; init++) { + value = entries[init * tpds + targ]; + value = (value * loc->entry_base_unit) / 10; + + if (!pass) { + hmat_update_access(type, value, &best); + pr_info(" Initiator-Target[%d-%d]:%d%s\n", + inits[init], targs[targ], value, + hmat_data_type_suffix(type)); + continue; + } + + if (!t) + continue; + p_node = pxm_to_node(inits[init]); + if (p_node != NUMA_NO_NODE && value == best) + set_bit(p_node, t->p_nodes); + } } } return 0; @@ -122,6 +194,7 @@ static int __init hmat_parse_address_range(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_hmat_address_range *spa = (void *)header; + struct memory_target *t = NULL; if (spa->header.length != sizeof(*spa)) { pr_err("HMAT: Unexpected address range header length: %d\n", @@ -131,6 +204,23 @@ static int __init hmat_parse_address_range(union acpi_subtable_headers *header, pr_info("HMAT: Memory (%#llx length %#llx) Flags:%04x Processor Domain:%d Memory Domain:%d\n", spa->physical_address_base, spa->physical_address_length, spa->flags, spa->processor_PD, spa->memory_PD); + + if (spa->flags & ACPI_HMAT_MEMORY_PD_VALID) { + t = find_mem_target(spa->memory_PD); + if (!t) { + pr_warn("HMAT: Memory Domain missing from SRAT\n"); + return -EINVAL; + } + } + if (t && spa->flags & ACPI_HMAT_PROCESSOR_PD_VALID) { + int p_node = pxm_to_node(spa->processor_PD); + + if (p_node == NUMA_NO_NODE) { + pr_warn("HMAT: Invalid Processor Domain\n"); + return -EINVAL; + } + set_bit(p_node, t->p_nodes); + } return 0; } @@ -154,6 +244,33 @@ static int __init hmat_parse_subtable(union acpi_subtable_headers *header, } } +static __init int srat_parse_mem_affinity(union acpi_subtable_headers *header, + const unsigned long end) +{ + struct acpi_srat_mem_affinity *ma = (void *)header; + + if (!ma) + return -EINVAL; + if (!(ma->flags & ACPI_SRAT_MEM_ENABLED)) + return 0; + alloc_memory_target(ma->proximity_domain); + return 0; +} + +static __init void hmat_register_targets(void) +{ + struct memory_target *t, *next; + unsigned m, p; + + list_for_each_entry_safe(t, next, &targets, node) { + list_del(&t->node); + m = pxm_to_node(t->memory_pxm); + for_each_set_bit(p, t->p_nodes, MAX_NUMNODES) + register_memory_node_under_compute_node(m, p); + kfree(t); + } +} + static __init int parse_noop(struct acpi_table_header *table) { return 0; @@ -169,6 +286,23 @@ static __init int hmat_init(void) if (srat_disabled()) return 0; + status = acpi_get_table(ACPI_SIG_SRAT, 0, &tbl); + if (ACPI_FAILURE(status)) + return 0; + + if (acpi_table_parse(ACPI_SIG_SRAT, parse_noop)) + goto out_put; + + memset(&subtable_proc, 0, sizeof(subtable_proc)); + subtable_proc.id = ACPI_SRAT_TYPE_MEMORY_AFFINITY; + subtable_proc.handler = srat_parse_mem_affinity; + + if (acpi_table_parse_entries_array(ACPI_SIG_SRAT, + sizeof(struct acpi_table_srat), + &subtable_proc, 1, 0) < 0) + goto out_put; + acpi_put_table(tbl); + status = acpi_get_table(ACPI_SIG_HMAT, 0, &tbl); if (ACPI_FAILURE(status)) return 0; @@ -185,6 +319,7 @@ static __init int hmat_init(void) &subtable_proc, 1, 0) < 0) goto out_put; } + hmat_register_targets(); out_put: acpi_put_table(tbl); return 0; From patchwork Tue Dec 11 01:03:04 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10722945 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0E3A891E for ; Tue, 11 Dec 2018 01:06:44 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F2AD92A0E6 for ; Tue, 11 Dec 2018 01:06:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E6A6A2A4ED; Tue, 11 Dec 2018 01:06:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7AAB22A0E6 for ; Tue, 11 Dec 2018 01:06:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729474AbeLKBFw (ORCPT ); Mon, 10 Dec 2018 20:05:52 -0500 Received: from mga02.intel.com ([134.134.136.20]:24625 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729289AbeLKBFv (ORCPT ); Mon, 10 Dec 2018 20:05:51 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Dec 2018 17:05:49 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,340,1539673200"; d="scan'208";a="117705189" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by orsmga001.jf.intel.com with ESMTP; 10 Dec 2018 17:05:48 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv2 06/12] node: Add heterogenous memory performance Date: Mon, 10 Dec 2018 18:03:04 -0700 Message-Id: <20181211010310.8551-7-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20181211010310.8551-1-keith.busch@intel.com> References: <20181211010310.8551-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Heterogeneous memory systems provide memory nodes with different latency and bandwidth performance attributes. Create an interface for the kernel to register the attributes for the primary memory initiators under the memory node. If the system provides this information, applications can then query the node attributes when deciding which node to request memory. The following example shows the new sysfs hierarchy for a node exporting performance attributes: # tree /sys/devices/system/node/nodeY/primary_initiator_access /sys/devices/system/node/nodeY/primary_initiator_access |-- read_bandwidth |-- read_latency |-- write_bandwidth `-- write_latency The bandwidth is exported as MB/s and latency is reported in nanoseconds. Memory accesses from an initiator node that is not one of the memory's primary compute nodes may encounter a performance penalty that does not match the performance reported for primary memory initiators. As an example of what you may be able to do with this, let's say we have a PCIe storage device, /dev/nvme0n1, attached to a particular node, and we want to run IO to it using the fastest memory with primary access from the same node as that PCIe device. The following shell script is such an example to achieve that goal: #!/bin/bash DEV_NODE=/sys/devices/system/node/node$(cat /sys/block/nvme0n1/device/device/numa_node) BEST_WRITE_BW=0 BEST_MEM_NODE=0 for i in $(ls -d ${DEV_NODE}/primary_target*); do tmp=$(cat ${i}/primary_initiator_access/write_bandwidth); if ((${tmp} > ${BEST_WRITE_BW})); then BEST_WRITE_BW=${tmp} BEST_MEM_NODE=$(echo ${i} | sed s/^.*primary_target//g) fi done numactl --membind=${BEST_MEM_NODE} \ --cpunodebind=$(cat ${DEV_NODE}/primary_cpu_nodelist) \ -- fio --filename=/dev/nvme0n1 --bs=4k --name=access-test Signed-off-by: Keith Busch --- drivers/base/Kconfig | 8 ++++++++ drivers/base/node.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ include/linux/node.h | 22 ++++++++++++++++++++++ 3 files changed, 74 insertions(+) diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 3e63a900b330..6014980238e8 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -149,6 +149,14 @@ config DEBUG_TEST_DRIVER_REMOVE unusable. You should say N here unless you are explicitly looking to test this functionality. +config HMEM_REPORTING + bool + default y + depends on NUMA + help + Enable reporting for heterogenous memory access attributes under + their non-uniform memory nodes. + source "drivers/base/test/Kconfig" config SYS_HYPERVISOR diff --git a/drivers/base/node.c b/drivers/base/node.c index 50412ce3fd7d..768612c06c56 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -99,6 +99,50 @@ static DEVICE_ATTR_RO(primary_cpu_nodelist); static DEVICE_ATTR(cpumap, S_IRUGO, node_read_cpumask, NULL); static DEVICE_ATTR(cpulist, S_IRUGO, node_read_cpulist, NULL); +#ifdef CONFIG_HMEM_REPORTING +const struct attribute_group node_access_attrs_group; + +#define ACCESS_ATTR(name) \ +static ssize_t name##_show(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + return sprintf(buf, "%d\n", to_node(dev)->hmem_attrs.name); \ +} \ +static DEVICE_ATTR_RO(name); + +ACCESS_ATTR(read_bandwidth) +ACCESS_ATTR(read_latency) +ACCESS_ATTR(write_bandwidth) +ACCESS_ATTR(write_latency) + +static struct attribute *access_attrs[] = { + &dev_attr_read_bandwidth.attr, + &dev_attr_read_latency.attr, + &dev_attr_write_bandwidth.attr, + &dev_attr_write_latency.attr, + NULL, +}; + +const struct attribute_group node_access_attrs_group = { + .name = "primary_initiator_access", + .attrs = access_attrs, +}; + +void node_set_perf_attrs(unsigned int nid, struct node_hmem_attrs *hmem_attrs) +{ + struct node *node; + + if (WARN_ON_ONCE(!node_online(nid))) + return; + node = node_devices[nid]; + node->hmem_attrs = *hmem_attrs; + if (sysfs_create_group(&node->dev.kobj, &node_access_attrs_group)) + pr_info("failed to add performance attribute group to node %d\n", + nid); +} +#endif + #define K(x) ((x) << (PAGE_SHIFT - 10)) static ssize_t node_read_meminfo(struct device *dev, struct device_attribute *attr, char *buf) diff --git a/include/linux/node.h b/include/linux/node.h index 3d06de045cbf..71abaf0d4f4b 100644 --- a/include/linux/node.h +++ b/include/linux/node.h @@ -17,8 +17,27 @@ #include #include +#include #include +#ifdef CONFIG_HMEM_REPORTING +/** + * struct node_hmem_attrs - heterogeneous memory performance attributes + * + * read_bandwidth: Read bandwidth in MB/s + * write_bandwidth: Write bandwidth in MB/s + * read_latency: Read latency in nanoseconds + * write_latency: Write latency in nanoseconds + */ +struct node_hmem_attrs { + unsigned int read_bandwidth; + unsigned int write_bandwidth; + unsigned int read_latency; + unsigned int write_latency; +}; +void node_set_perf_attrs(unsigned int nid, struct node_hmem_attrs *hmem_attrs); +#endif + struct node { struct device dev; nodemask_t primary_mem_nodes; @@ -27,6 +46,9 @@ struct node { #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS) struct work_struct node_work; #endif +#ifdef CONFIG_HMEM_REPORTING + struct node_hmem_attrs hmem_attrs; +#endif }; struct memory_block; From patchwork Tue Dec 11 01:03:05 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10722939 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EC1C1112E for ; Tue, 11 Dec 2018 01:06:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DCF8A2A0E6 for ; Tue, 11 Dec 2018 01:06:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D13602A4ED; Tue, 11 Dec 2018 01:06:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7F60A2A170 for ; Tue, 11 Dec 2018 01:06:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729511AbeLKBFw (ORCPT ); Mon, 10 Dec 2018 20:05:52 -0500 Received: from mga02.intel.com ([134.134.136.20]:24637 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729463AbeLKBFu (ORCPT ); Mon, 10 Dec 2018 20:05:50 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Dec 2018 17:05:50 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,340,1539673200"; d="scan'208";a="117705195" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by orsmga001.jf.intel.com with ESMTP; 10 Dec 2018 17:05:49 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv2 07/12] Documentation/ABI: Add node performance attributes Date: Mon, 10 Dec 2018 18:03:05 -0700 Message-Id: <20181211010310.8551-8-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20181211010310.8551-1-keith.busch@intel.com> References: <20181211010310.8551-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add descriptions for primary memory initiator performance access attributes. Signed-off-by: Keith Busch --- Documentation/ABI/stable/sysfs-devices-node | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node index 8430d5b261f6..6cdb0643f9fd 100644 --- a/Documentation/ABI/stable/sysfs-devices-node +++ b/Documentation/ABI/stable/sysfs-devices-node @@ -123,3 +123,31 @@ Description: The map of memory nodes that this node has primary access. Memory accesses from this node to nodes not in this map may encounter a performance penalty. + +What: /sys/devices/system/node/nodeX/primary_initiator_access/read_bandwidth +Date: December 2018 +Contact: Keith Busch +Description: + Read bandwidth in MB/s available to memory initiators found in + primary_cpu_nodemask. + +What: /sys/devices/system/node/nodeX/primary_initiator_access/read_latency +Date: December 2018 +Contact: Keith Busch +Description: + Read latency in nanosecondss available to memory initiators + found in primary_cpu_nodemask. + +What: /sys/devices/system/node/nodeX/primary_initiator_access/write_bandwidth +Date: December 2018 +Contact: Keith Busch +Description: + Write bandwidth in MB/s available to memory initiators found in + primary_cpu_nodemask. + +What: /sys/devices/system/node/nodeX/primary_initiator_access/write_latency +Date: December 2018 +Contact: Keith Busch +Description: + Write latency in nanosecondss available to memory initiators + found in primary_cpu_nodemask. From patchwork Tue Dec 11 01:03:06 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10722943 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 541E117FE for ; Tue, 11 Dec 2018 01:06:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 44D612A0E6 for ; Tue, 11 Dec 2018 01:06:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 38E272A4ED; Tue, 11 Dec 2018 01:06:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DD5D02A170 for ; Tue, 11 Dec 2018 01:06:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729486AbeLKBFw (ORCPT ); Mon, 10 Dec 2018 20:05:52 -0500 Received: from mga02.intel.com ([134.134.136.20]:24625 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729465AbeLKBFv (ORCPT ); Mon, 10 Dec 2018 20:05:51 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Dec 2018 17:05:51 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,340,1539673200"; d="scan'208";a="117705205" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by orsmga001.jf.intel.com with ESMTP; 10 Dec 2018 17:05:50 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv2 08/12] acpi/hmat: Register performance attributes Date: Mon, 10 Dec 2018 18:03:06 -0700 Message-Id: <20181211010310.8551-9-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20181211010310.8551-1-keith.busch@intel.com> References: <20181211010310.8551-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Save the best performance access attributes and register these with the memory's node if HMAT provides the locality table. Signed-off-by: Keith Busch --- drivers/acpi/Kconfig | 1 + drivers/acpi/hmat.c | 34 ++++++++++++++++++++++++++++++++++ 2 files changed, 35 insertions(+) diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index 9a05af3a18cf..6b5f6ca690af 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -330,6 +330,7 @@ config ACPI_NUMA config ACPI_HMAT bool "ACPI Heterogeneous Memory Attribute Table Support" depends on ACPI_NUMA + select HMEM_REPORTING help Parses representation of the ACPI Heterogeneous Memory Attributes Table (HMAT) and set the memory node relationships and access diff --git a/drivers/acpi/hmat.c b/drivers/acpi/hmat.c index 5d8747ad025f..40bc83f4b593 100644 --- a/drivers/acpi/hmat.c +++ b/drivers/acpi/hmat.c @@ -23,6 +23,8 @@ struct memory_target { struct list_head node; unsigned int memory_pxm; unsigned long p_nodes[BITS_TO_LONGS(MAX_NUMNODES)]; + bool hmem_valid; + struct node_hmem_attrs hmem; }; static __init struct memory_target *find_mem_target(unsigned int m) @@ -108,6 +110,34 @@ static __init void hmat_update_access(u8 type, u32 value, u32 *best) } } +static __init void hmat_update_target(struct memory_target *t, u8 type, + u32 value) +{ + switch (type) { + case ACPI_HMAT_ACCESS_LATENCY: + t->hmem.read_latency = value; + t->hmem.write_latency = value; + break; + case ACPI_HMAT_READ_LATENCY: + t->hmem.read_latency = value; + break; + case ACPI_HMAT_WRITE_LATENCY: + t->hmem.write_latency = value; + break; + case ACPI_HMAT_ACCESS_BANDWIDTH: + t->hmem.read_bandwidth = value; + t->hmem.write_bandwidth = value; + break; + case ACPI_HMAT_READ_BANDWIDTH: + t->hmem.read_bandwidth = value; + break; + case ACPI_HMAT_WRITE_BANDWIDTH: + t->hmem.write_bandwidth = value; + break; + } + t->hmem_valid = true; +} + static __init int hmat_parse_locality(union acpi_subtable_headers *header, const unsigned long end) { @@ -166,6 +196,8 @@ static __init int hmat_parse_locality(union acpi_subtable_headers *header, set_bit(p_node, t->p_nodes); } } + if (t && best) + hmat_update_target(t, type, best); } return 0; } @@ -267,6 +299,8 @@ static __init void hmat_register_targets(void) m = pxm_to_node(t->memory_pxm); for_each_set_bit(p, t->p_nodes, MAX_NUMNODES) register_memory_node_under_compute_node(m, p); + if (t->hmem_valid) + node_set_perf_attrs(m, &t->hmem); kfree(t); } } From patchwork Tue Dec 11 01:03:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10722935 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E150891E for ; Tue, 11 Dec 2018 01:06:38 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D08392A0E6 for ; Tue, 11 Dec 2018 01:06:38 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C47C32A4ED; Tue, 11 Dec 2018 01:06:38 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 44CE32A0E6 for ; Tue, 11 Dec 2018 01:06:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729605AbeLKBGb (ORCPT ); Mon, 10 Dec 2018 20:06:31 -0500 Received: from mga02.intel.com ([134.134.136.20]:24704 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729546AbeLKBFw (ORCPT ); Mon, 10 Dec 2018 20:05:52 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Dec 2018 17:05:52 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,340,1539673200"; d="scan'208";a="117705208" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by orsmga001.jf.intel.com with ESMTP; 10 Dec 2018 17:05:51 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv2 09/12] node: Add memory caching attributes Date: Mon, 10 Dec 2018 18:03:07 -0700 Message-Id: <20181211010310.8551-10-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20181211010310.8551-1-keith.busch@intel.com> References: <20181211010310.8551-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP System memory may have side caches to help improve access speed. While the system provided cache is transparent to the software accessing these memory ranges, applications can optimize their own access based on cache attributes. Provide a new API for the kernel to register these memory side caches under the memory node that provides it. The kernel's sysfs representation is modeled from the cpu cacheinfo attributes, as seen from /sys/devices/system/cpu/cpuX/side_cache/. Unlike CPU cacheinfo, though, the node cache level is reported from the view of the memory. A higher number is nearer to the CPU, while lower levels are closer to the backing memory. Also unlike CPU cache, it is assumed the system will handle flushing any dirty cached memory to the last level the memory on a power failure if the range is persistent memory. The attributes we export are the cache size, the line size, associativity, and write back policy. Signed-off-by: Keith Busch --- drivers/base/node.c | 140 +++++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/node.h | 23 +++++++++ 2 files changed, 163 insertions(+) diff --git a/drivers/base/node.c b/drivers/base/node.c index 768612c06c56..54184424ca7f 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -141,6 +142,143 @@ void node_set_perf_attrs(unsigned int nid, struct node_hmem_attrs *hmem_attrs) pr_info("failed to add performance attribute group to node %d\n", nid); } + +struct node_cache_info { + struct device dev; + struct list_head node; + struct node_cache_attrs cache_attrs; +}; +#define to_cache_info(device) container_of(device, struct node_cache_info, dev) + +#define CACHE_ATTR(name, fmt) \ +static ssize_t name##_show(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + return sprintf(buf, fmt "\n", to_cache_info(dev)->cache_attrs.name); \ +} \ +DEVICE_ATTR_RO(name); + +CACHE_ATTR(size, "%lld") +CACHE_ATTR(level, "%d") +CACHE_ATTR(line_size, "%d") +CACHE_ATTR(associativity, "%d") +CACHE_ATTR(write_policy, "%d") + +static struct attribute *cache_attrs[] = { + &dev_attr_level.attr, + &dev_attr_associativity.attr, + &dev_attr_size.attr, + &dev_attr_line_size.attr, + &dev_attr_write_policy.attr, + NULL, +}; + +const struct attribute_group node_cache_attrs_group = { + .attrs = cache_attrs, +}; + +const struct attribute_group *node_cache_attrs_groups[] = { + &node_cache_attrs_group, + NULL, +}; + +static void node_release(struct device *dev) +{ + kfree(dev); +} + +static void node_cache_release(struct device *dev) +{ + struct node_cache_info *info = to_cache_info(dev); + kfree(info); +} + +static void node_init_cache_dev(struct node *node) +{ + struct device *dev; + + dev = kzalloc(sizeof(*dev), GFP_KERNEL); + if (!dev) + return; + + dev->parent = &node->dev; + dev->release = node_release; + dev_set_name(dev, "side_cache"); + + if (device_register(dev)) { + kfree(dev); + return; + } + pm_runtime_no_callbacks(dev); + node->cache_dev = dev; +} + +void node_add_cache(unsigned int nid, struct node_cache_attrs *cache_attrs) +{ + struct node_cache_info *info; + struct device *dev; + struct node *node; + + if (!node_online(nid) || !node_devices[nid]) + return; + + node = node_devices[nid]; + list_for_each_entry(info, &node->cache_attrs, node) { + if (info->cache_attrs.level == cache_attrs->level) { + dev_warn(&node->dev, + "attempt to add duplicate cache level:%d\n", + cache_attrs->level); + return; + } + } + + if (!node->cache_dev) + node_init_cache_dev(node); + if (!node->cache_dev) + return; + + info = kzalloc(sizeof(*info), GFP_KERNEL); + if (!info) + return; + + dev = &info->dev; + dev->parent = node->cache_dev; + dev->release = node_cache_release; + dev->groups = node_cache_attrs_groups; + dev_set_name(dev, "index%d", cache_attrs->level); + info->cache_attrs = *cache_attrs; + if (device_register(dev)) { + dev_warn(&node->dev, "failed to add cache level:%d\n", + cache_attrs->level); + kfree(info); + return; + } + pm_runtime_no_callbacks(dev); + list_add_tail(&info->node, &node->cache_attrs); +} + +static void node_remove_caches(struct node *node) +{ + struct node_cache_info *info, *next; + + if (!node->cache_dev) + return; + + list_for_each_entry_safe(info, next, &node->cache_attrs, node) { + list_del(&info->node); + device_unregister(&info->dev); + } + device_unregister(node->cache_dev); +} + +static void node_init_caches(unsigned int nid) +{ + INIT_LIST_HEAD(&node_devices[nid]->cache_attrs); +} +#else +static void node_init_caches(unsigned int nid) { } +static void node_remove_caches(struct node *node) { } #endif #define K(x) ((x) << (PAGE_SHIFT - 10)) @@ -389,6 +527,7 @@ static void node_device_release(struct device *dev) */ flush_work(&node->node_work); #endif + node_remove_caches(node); kfree(node); } @@ -711,6 +850,7 @@ int __register_one_node(int nid) /* initialize work queue for memory hot plug */ init_node_hugetlb_work(nid); + node_init_caches(nid); return error; } diff --git a/include/linux/node.h b/include/linux/node.h index 71abaf0d4f4b..897e04e99e80 100644 --- a/include/linux/node.h +++ b/include/linux/node.h @@ -36,6 +36,27 @@ struct node_hmem_attrs { unsigned int write_latency; }; void node_set_perf_attrs(unsigned int nid, struct node_hmem_attrs *hmem_attrs); + +enum cache_associativity { + NODE_CACHE_DIRECT_MAP, + NODE_CACHE_INDEXED, + NODE_CACHE_OTHER, +}; + +enum cache_write_policy { + NODE_CACHE_WRITE_BACK, + NODE_CACHE_WRITE_THROUGH, + NODE_CACHE_WRITE_OTHER, +}; + +struct node_cache_attrs { + enum cache_associativity associativity; + enum cache_write_policy write_policy; + u64 size; + u16 line_size; + u8 level; +}; +void node_add_cache(unsigned int nid, struct node_cache_attrs *cache_attrs); #endif struct node { @@ -48,6 +69,8 @@ struct node { #endif #ifdef CONFIG_HMEM_REPORTING struct node_hmem_attrs hmem_attrs; + struct list_head cache_attrs; + struct device *cache_dev; #endif }; From patchwork Tue Dec 11 01:03:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10722933 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E73B5112E for ; Tue, 11 Dec 2018 01:06:26 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D4EA52A0E6 for ; Tue, 11 Dec 2018 01:06:26 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C90892A4ED; Tue, 11 Dec 2018 01:06:26 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 754DB2A0E6 for ; Tue, 11 Dec 2018 01:06:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729621AbeLKBFy (ORCPT ); Mon, 10 Dec 2018 20:05:54 -0500 Received: from mga02.intel.com ([134.134.136.20]:24704 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729591AbeLKBFy (ORCPT ); Mon, 10 Dec 2018 20:05:54 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Dec 2018 17:05:53 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,340,1539673200"; d="scan'208";a="117705213" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by orsmga001.jf.intel.com with ESMTP; 10 Dec 2018 17:05:52 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv2 10/12] Documentation/ABI: Add node cache attributes Date: Mon, 10 Dec 2018 18:03:08 -0700 Message-Id: <20181211010310.8551-11-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20181211010310.8551-1-keith.busch@intel.com> References: <20181211010310.8551-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add the attributes for the system memory side caches. Signed-off-by: Keith Busch --- Documentation/ABI/stable/sysfs-devices-node | 34 +++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node index 6cdb0643f9fd..63a757b8a29e 100644 --- a/Documentation/ABI/stable/sysfs-devices-node +++ b/Documentation/ABI/stable/sysfs-devices-node @@ -151,3 +151,37 @@ Contact: Keith Busch Description: Write latency in nanosecondss available to memory initiators found in primary_cpu_nodemask. + +What: /sys/devices/system/node/nodeX/side_cache/indexY/associativity +Date: December 2018 +Contact: Keith Busch +Description: + The caches associativity: 0 for direct mapped, non-zero if + indexed. + +What: /sys/devices/system/node/nodeX/side_cache/indexY/level +Date: December 2018 +Contact: Keith Busch +Description: + This cache's level in the memory hierarchy. Matches 'Y' in the + directory name. + +What: /sys/devices/system/node/nodeX/side_cache/indexY/line_size +Date: December 2018 +Contact: Keith Busch +Description: + The number of bytes accessed from the next cache level on a + cache miss. + +What: /sys/devices/system/node/nodeX/side_cache/indexY/size +Date: December 2018 +Contact: Keith Busch +Description: + The size of this memory side cache in bytes. + +What: /sys/devices/system/node/nodeX/side_cache/indexY/write_policy +Date: December 2018 +Contact: Keith Busch +Description: + The cache write policy: 0 for write-back, 1 for write-through, + 2 for other or unknown. From patchwork Tue Dec 11 01:03:09 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10722931 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 58BD491E for ; Tue, 11 Dec 2018 01:06:26 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 42A812A0E6 for ; Tue, 11 Dec 2018 01:06:26 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 357FC2A4ED; Tue, 11 Dec 2018 01:06:26 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E05162A0E6 for ; Tue, 11 Dec 2018 01:06:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729633AbeLKBGT (ORCPT ); Mon, 10 Dec 2018 20:06:19 -0500 Received: from mga02.intel.com ([134.134.136.20]:24704 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729605AbeLKBFz (ORCPT ); Mon, 10 Dec 2018 20:05:55 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Dec 2018 17:05:54 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,340,1539673200"; d="scan'208";a="117705216" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by orsmga001.jf.intel.com with ESMTP; 10 Dec 2018 17:05:53 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv2 11/12] acpi/hmat: Register memory side cache attributes Date: Mon, 10 Dec 2018 18:03:09 -0700 Message-Id: <20181211010310.8551-12-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20181211010310.8551-1-keith.busch@intel.com> References: <20181211010310.8551-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Register memory side cache attributes with the memory's node if HMAT provides the side cache information table. Signed-off-by: Keith Busch --- drivers/acpi/hmat.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/drivers/acpi/hmat.c b/drivers/acpi/hmat.c index 40bc83f4b593..48d53ceb4778 100644 --- a/drivers/acpi/hmat.c +++ b/drivers/acpi/hmat.c @@ -206,6 +206,7 @@ static __init int hmat_parse_cache(union acpi_subtable_headers *header, const unsigned long end) { struct acpi_hmat_cache *cache = (void *)header; + struct node_cache_attrs cache_attrs; u32 attrs; if (cache->header.length < sizeof(*cache)) { @@ -219,6 +220,37 @@ static __init int hmat_parse_cache(union acpi_subtable_headers *header, cache->memory_PD, cache->cache_size, attrs, cache->number_of_SMBIOShandles); + cache_attrs.size = cache->cache_size; + cache_attrs.level = (attrs & ACPI_HMAT_CACHE_LEVEL) >> 4; + cache_attrs.line_size = (attrs & ACPI_HMAT_CACHE_LINE_SIZE) >> 16; + + switch ((attrs & ACPI_HMAT_CACHE_ASSOCIATIVITY) >> 8) { + case ACPI_HMAT_CA_DIRECT_MAPPED: + cache_attrs.associativity = NODE_CACHE_DIRECT_MAP; + break; + case ACPI_HMAT_CA_COMPLEX_CACHE_INDEXING: + cache_attrs.associativity = NODE_CACHE_INDEXED; + break; + case ACPI_HMAT_CA_NONE: + default: + cache_attrs.associativity = NODE_CACHE_OTHER; + break; + } + + switch ((attrs & ACPI_HMAT_WRITE_POLICY) >> 12) { + case ACPI_HMAT_CP_WB: + cache_attrs.write_policy = NODE_CACHE_WRITE_BACK; + break; + case ACPI_HMAT_CP_WT: + cache_attrs.write_policy = NODE_CACHE_WRITE_THROUGH; + break; + case ACPI_HMAT_CP_NONE: + default: + cache_attrs.write_policy = NODE_CACHE_WRITE_OTHER; + break; + } + + node_add_cache(pxm_to_node(cache->memory_PD), &cache_attrs); return 0; } From patchwork Tue Dec 11 01:03:10 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 10722929 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 30401112E for ; Tue, 11 Dec 2018 01:06:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 200302A0E6 for ; Tue, 11 Dec 2018 01:06:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 140572A4ED; Tue, 11 Dec 2018 01:06:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 635842A170 for ; Tue, 11 Dec 2018 01:06:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729579AbeLKBGM (ORCPT ); Mon, 10 Dec 2018 20:06:12 -0500 Received: from mga02.intel.com ([134.134.136.20]:24704 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729633AbeLKBF4 (ORCPT ); Mon, 10 Dec 2018 20:05:56 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 Dec 2018 17:05:55 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,340,1539673200"; d="scan'208";a="117705221" Received: from unknown (HELO localhost.lm.intel.com) ([10.232.112.69]) by orsmga001.jf.intel.com with ESMTP; 10 Dec 2018 17:05:54 -0800 From: Keith Busch To: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-mm@kvack.org Cc: Greg Kroah-Hartman , Rafael Wysocki , Dave Hansen , Dan Williams , Keith Busch Subject: [PATCHv2 12/12] doc/mm: New documentation for memory performance Date: Mon, 10 Dec 2018 18:03:10 -0700 Message-Id: <20181211010310.8551-13-keith.busch@intel.com> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20181211010310.8551-1-keith.busch@intel.com> References: <20181211010310.8551-1-keith.busch@intel.com> Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Platforms may provide system memory where some physical address ranges perform differently than others, or is side cached by the system. Add documentation describing a high level overview of such systems and the performance and caching attributes the kernel provides for applications wishing to query this information. Signed-off-by: Keith Busch --- Documentation/admin-guide/mm/numaperf.rst | 171 ++++++++++++++++++++++++++++++ 1 file changed, 171 insertions(+) create mode 100644 Documentation/admin-guide/mm/numaperf.rst diff --git a/Documentation/admin-guide/mm/numaperf.rst b/Documentation/admin-guide/mm/numaperf.rst new file mode 100644 index 000000000000..846b3f991e7f --- /dev/null +++ b/Documentation/admin-guide/mm/numaperf.rst @@ -0,0 +1,171 @@ +.. _numaperf: + +============= +NUMA Locality +============= + +Some platforms may have multiple types of memory attached to a single +CPU. These disparate memory ranges share some characteristics, such as +CPU cache coherence, but may have different performance. For example, +different media types and buses affect bandwidth and latency. + +A system supporting such heterogeneous memory by grouping each memory +type under different "nodes" based on similar CPU locality and performance +characteristics. Some memory may share the same node as a CPU, and others +are provided as memory only nodes. While memory only nodes do not provide +CPUs, they may still be directly accessible, or local, to one or more +compute nodes. The following diagram shows one such example of two compute +noes with local memory and a memory only node for each of compute node: + + +------------------+ +------------------+ + | Compute Node 0 +-----+ Compute Node 1 | + | Local Node0 Mem | | Local Node1 Mem | + +--------+---------+ +--------+---------+ + | | + +--------+---------+ +--------+---------+ + | Slower Node2 Mem | | Slower Node3 Mem | + +------------------+ +--------+---------+ + +A "memory initiator" is a node containing one or more devices such as +CPUs or separate memory I/O devices that can initiate memory requests. A +"memory target" is a node containing one or more accessible physical +address ranges from one or more memory initiators. + +When multiple memory initiators exist, they may not all have the same +performance when accessing a given memory target. The highest performing +initiator to a given target is considered to be one of that target's +local initiators. Any given target may have one or more local initiators, +and any given initiator may have multiple local memory targets. + +To aid applications matching memory targets with their initiators, +the kernel provide symlinks to each other like the following example:: + + # ls -l /sys/devices/system/node/nodeX/local_target* + /sys/devices/system/node/nodeX/local_targetY -> ../nodeY + + # ls -l /sys/devices/system/node/nodeY/local_initiator* + /sys/devices/system/node/nodeY/local_initiatorX -> ../nodeX + +The linked nodes will also have their node number set in the local_mem +and local_cpu node list and maps. + +An example showing how this may be used to run a particular task on CPUs +and memory that are both local to a particular PCI device can be done +using existing 'numactl' as follows:: + + # NODE=$(cat /sys/devices/pci:0000:00/.../numa_node) + # numactl --membind=$(cat /sys/devices/node/node${NODE}/local_mem_nodelist) \ + --cpunodebind=$(cat /sys/devices/node/node${NODE}/local_cpu_nodelist) \ + -- + +================ +NUMA Performance +================ + +Applications may wish to consider which node they want their memory to +be allocated from based on the node's performance characteristics. If the +system provides these attributes, the kernel exports them under the node +sysfs hierarchy by appending the local_initiator_access directory under +the memory node as follows:: + + /sys/devices/system/node/nodeY/local_initiator_access/ + +The kernel does not provide performance attributes for non-local memory +initiators. These attributes apply only to the memory initiator nodes that +have a local_initiatorX link, or are set in the local_cpu_nodelist. A +memory initiator node is considered local to itself if it also is +a memory target and will be set it its node list and map, but won't +contain a symlink to itself. + +The performance characteristics the kernel provides for the local initiators +are exported are as follows:: + + # tree /sys/devices/system/node/nodeY/local_initiator_access + /sys/devices/system/node/nodeY/local_initiator_access + |-- read_bandwidth + |-- read_latency + |-- write_bandwidth + `-- write_latency + +The bandwidth attributes are provided in MiB/second. + +The latency attributes are provided in nanoseconds. + +========== +NUMA Cache +========== + +System memory may be constructed in a hierarchy of elements with various +performance characteristics in order to provide large address space +of slower performing memory side-cached by a smaller higher performing +memory. The system physical addresses that initiators are aware of is +provided by the last memory level in the hierarchy, while the system uses +higher performing memory to transparently cache access to progressively +slower levels. + +The term "far memory" is used to denote the last level memory in the +hierarchy. Each increasing cache level provides higher performing +initiator access, and the term "near memory" represents the fastest +cache provided by the system. + +This numbering is different than CPU caches where the cache level (ex: +L1, L2, L3) uses a CPU centric view with each increased level is lower +performing. In contrast, the memory cache level is centric to the last +level memory, so the higher numbered cache level denotes memory nearer +to the CPU, and further from far memory. + +The memory side caches are not directly addressable by software. When +software accesses a system address, the system will return it from the +near memory cache if it is present. If it is not present, the system +accesses the next level of memory until there is either a hit in that +cache level, or it reaches far memory. + +An application does not need to know about caching attributes in order +to use the system, software may optionally query the memory cache +attributes in order to maximize the performance out of such a setup. +If the system provides a way for the kernel to discover this information, +for example with ACPI HMAT (Heterogeneous Memory Attribute Table), +the kernel will append these attributes to the NUMA node memory target. + +When the kernel first registers a memory cache with a node, the kernel +will create the following directory:: + + /sys/devices/system/node/nodeX/side_cache/ + +If that directory is not present, the system either does not not provide +a memory side cache, or that information is not accessible to the kernel. + +The attributes for each level of cache is provided under its cache +level index:: + + /sys/devices/system/node/nodeX/side_cache/indexA/ + /sys/devices/system/node/nodeX/side_cache/indexB/ + /sys/devices/system/node/nodeX/side_cache/indexC/ + +Each cache level's directory provides its attributes. For example, +the following is a single cache level and the attributes available for +software to query:: + + # tree sys/devices/system/node/node0/side_cache/ + /sys/devices/system/node/node0/side_cache/ + |-- index1 + | |-- associativity + | |-- level + | |-- line_size + | |-- size + | `-- write_policy + +The "associativity" will be 0 if it is a direct-mapped cache, and non-zero +for any other indexed based, multi-way associativity. + +The "level" is the distance from the far memory, and matches the number +appended to its "index" directory. + +The "line_size" is the number of bytes accessed on a cache miss. + +The "size" is the number of bytes provided by this cache level. + +The "write_policy" will be 0 for write-back, and non-zero for +write-through caching. + +See also: https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf