From patchwork Tue Dec 18 04:23:28 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 10734783
Subject: [PATCH v6 1/6] acpi: Create subtable parsing infrastructure
From: Dan Williams
To: akpm@linux-foundation.org
Cc: "Rafael J.
 Wysocki", Keith Busch, peterz@infradead.org, dave.hansen@linux.intel.com, linux-mm@kvack.org, x86@kernel.org, linux-kernel@vger.kernel.org, mgorman@suse.de
Date: Mon, 17 Dec 2018 20:23:28 -0800
Message-ID: <154510700824.1941238.14650493839997144294.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.18-2-gc94f

From: Keith Busch

Parsing entries in an ACPI table had assumed a generic header
structure. There is no standard ACPI subtable header, though, so less
common layouts with different field sizes required custom parsers to go
through their subtable entry list.

Create the infrastructure for adding different table types so that the
code parsing the entries array can be reused for all ACPI system tables
and the common code doesn't need to be duplicated.

Reviewed-by: Rafael J. Wysocki
Signed-off-by: Keith Busch
Signed-off-by: Dan Williams
---
 arch/ia64/kernel/acpi.c                       |   12 ++--
 arch/x86/kernel/acpi/boot.c                   |   36 +++++++------
 drivers/acpi/numa.c                           |   16 +++---
 drivers/acpi/scan.c                           |    4 +
 drivers/acpi/tables.c                         |   67 +++++++++++++++++++++----
 drivers/irqchip/irq-gic-v2m.c                 |    2 -
 drivers/irqchip/irq-gic-v3-its-pci-msi.c      |    2 -
 drivers/irqchip/irq-gic-v3-its-platform-msi.c |    2 -
 drivers/irqchip/irq-gic-v3-its.c              |    6 +-
 drivers/irqchip/irq-gic-v3.c                  |    8 +--
 drivers/irqchip/irq-gic.c                     |    4 +
 drivers/mailbox/pcc.c                         |    2 -
 include/linux/acpi.h                          |    5 +-
 13 files changed, 108 insertions(+), 58 deletions(-)

diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index 41eb281709da..3973d2c2a9b0 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -177,7 +177,7 @@ struct acpi_table_madt *acpi_madt __initdata;
 static u8 has_8259;
 
 static int __init
-acpi_parse_lapic_addr_ovr(struct acpi_subtable_header * header,
+acpi_parse_lapic_addr_ovr(union acpi_subtable_headers * header,
 			  const unsigned long end)
 {
 	struct acpi_madt_local_apic_override *lapic;
@@ -216,7 +216,7 @@ acpi_parse_lsapic(struct acpi_subtable_header * header, const unsigned long end)
 }
 
 static int __init
-acpi_parse_lapic_nmi(struct acpi_subtable_header * header, const unsigned long end)
+acpi_parse_lapic_nmi(union acpi_subtable_headers * header, const unsigned long end)
 {
 	struct acpi_madt_local_apic_nmi *lacpi_nmi;
 
@@ -230,7 +230,7 @@ acpi_parse_lapic_nmi(struct acpi_subtable_header * header, const unsigned long e
 }
 
 static int __init
-acpi_parse_iosapic(struct acpi_subtable_header * header, const unsigned long end)
+acpi_parse_iosapic(union acpi_subtable_headers * header, const unsigned long end)
 {
 	struct acpi_madt_io_sapic *iosapic;
 
@@ -245,7 +245,7 @@ acpi_parse_iosapic(struct acpi_subtable_header * header, const unsigned long end
 static unsigned int __initdata acpi_madt_rev;
 
 static int __init
-acpi_parse_plat_int_src(struct acpi_subtable_header * header,
+acpi_parse_plat_int_src(union acpi_subtable_headers * header,
 			const unsigned long end)
 {
 	struct acpi_madt_interrupt_source *plintsrc;
@@ -329,7 +329,7 @@ unsigned int get_cpei_target_cpu(void)
 }
 
 static int __init
-acpi_parse_int_src_ovr(struct acpi_subtable_header * header,
+acpi_parse_int_src_ovr(union acpi_subtable_headers * header,
 		       const unsigned long end)
 {
 	struct acpi_madt_interrupt_override *p;
@@ -350,7 +350,7 @@ acpi_parse_int_src_ovr(struct acpi_subtable_header * header,
 }
 
 static int __init
-acpi_parse_nmi_src(struct acpi_subtable_header * header, const unsigned long end)
+acpi_parse_nmi_src(union acpi_subtable_headers * header, const unsigned long end)
 {
 	struct acpi_madt_nmi_source *nmi_src;
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 06635fbca81c..58561b4df09d 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -197,7 +197,7 @@ static int acpi_register_lapic(int id, u32 acpiid, u8 enabled)
 }
 
 static int __init
-acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end)
+acpi_parse_x2apic(union acpi_subtable_headers *header, const unsigned long end)
 {
 	struct acpi_madt_local_x2apic *processor = NULL;
 #ifdef CONFIG_X86_X2APIC
@@ -210,7 +210,7 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end)
 	if (BAD_MADT_ENTRY(processor, end))
 		return -EINVAL;
 
-	acpi_table_print_madt_entry(header);
+	acpi_table_print_madt_entry(&header->common);
 
 #ifdef CONFIG_X86_X2APIC
 	apic_id = processor->local_apic_id;
@@ -242,7 +242,7 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end)
 }
 
 static int __init
-acpi_parse_lapic(struct acpi_subtable_header * header, const unsigned long end)
+acpi_parse_lapic(union acpi_subtable_headers * header, const unsigned long end)
 {
 	struct acpi_madt_local_apic *processor = NULL;
 
@@ -251,7 +251,7 @@ acpi_parse_lapic(struct acpi_subtable_header * header, const unsigned long end)
 	if (BAD_MADT_ENTRY(processor, end))
 		return -EINVAL;
 
-	acpi_table_print_madt_entry(header);
+	acpi_table_print_madt_entry(&header->common);
 
 	/* Ignore invalid ID */
 	if (processor->id == 0xff)
@@ -272,7 +272,7 @@ acpi_parse_lapic(struct acpi_subtable_header * header, const unsigned long end)
 }
 
 static int __init
-acpi_parse_sapic(struct acpi_subtable_header *header, const unsigned long end)
+acpi_parse_sapic(union acpi_subtable_headers *header, const unsigned long end)
 {
 	struct acpi_madt_local_sapic *processor = NULL;
 
@@ -281,7 +281,7 @@ acpi_parse_sapic(struct acpi_subtable_header *header, const unsigned long end)
 	if (BAD_MADT_ENTRY(processor, end))
 		return -EINVAL;
 
-	acpi_table_print_madt_entry(header);
+	acpi_table_print_madt_entry(&header->common);
 
 	acpi_register_lapic((processor->id << 8) | processor->eid,/* APIC ID */
 			    processor->processor_id, /* ACPI ID */
@@ -291,7 +291,7 @@ acpi_parse_sapic(struct acpi_subtable_header *header, const unsigned long end)
 }
 
 static int __init
-acpi_parse_lapic_addr_ovr(struct acpi_subtable_header * header,
+acpi_parse_lapic_addr_ovr(union acpi_subtable_headers * header,
 			  const unsigned long end)
 {
 	struct acpi_madt_local_apic_override *lapic_addr_ovr = NULL;
@@ -301,7 +301,7 @@ acpi_parse_lapic_addr_ovr(struct acpi_subtable_header * header,
 	if (BAD_MADT_ENTRY(lapic_addr_ovr, end))
 		return -EINVAL;
 
-	acpi_table_print_madt_entry(header);
+	acpi_table_print_madt_entry(&header->common);
 
 	acpi_lapic_addr = lapic_addr_ovr->address;
 
@@ -309,7 +309,7 @@ acpi_parse_lapic_addr_ovr(struct acpi_subtable_header * header,
 }
 
 static int __init
-acpi_parse_x2apic_nmi(struct acpi_subtable_header *header,
+acpi_parse_x2apic_nmi(union acpi_subtable_headers *header,
 		      const unsigned long end)
 {
 	struct acpi_madt_local_x2apic_nmi *x2apic_nmi = NULL;
@@ -319,7 +319,7 @@ acpi_parse_x2apic_nmi(struct acpi_subtable_header *header,
 	if (BAD_MADT_ENTRY(x2apic_nmi, end))
 		return -EINVAL;
 
-	acpi_table_print_madt_entry(header);
+	acpi_table_print_madt_entry(&header->common);
 
 	if (x2apic_nmi->lint != 1)
 		printk(KERN_WARNING PREFIX "NMI not connected to LINT 1!\n");
@@ -328,7 +328,7 @@ acpi_parse_x2apic_nmi(struct acpi_subtable_header *header,
 }
 
 static int __init
-acpi_parse_lapic_nmi(struct acpi_subtable_header * header, const unsigned long end)
+acpi_parse_lapic_nmi(union acpi_subtable_headers * header, const unsigned long end)
 {
 	struct acpi_madt_local_apic_nmi *lapic_nmi = NULL;
 
@@ -337,7 +337,7 @@ acpi_parse_lapic_nmi(struct acpi_subtable_header * header, const unsigned long e
 	if (BAD_MADT_ENTRY(lapic_nmi, end))
 		return -EINVAL;
 
-	acpi_table_print_madt_entry(header);
+	acpi_table_print_madt_entry(&header->common);
 
 	if (lapic_nmi->lint != 1)
 		printk(KERN_WARNING PREFIX "NMI not connected to LINT 1!\n");
@@ -449,7 +449,7 @@ static int __init mp_register_ioapic_irq(u8 bus_irq, u8 polarity,
 }
 
 static int __init
-acpi_parse_ioapic(struct acpi_subtable_header * header, const unsigned long end)
+acpi_parse_ioapic(union acpi_subtable_headers * header, const unsigned long end)
 {
 	struct acpi_madt_io_apic *ioapic = NULL;
 	struct ioapic_domain_cfg cfg = {
@@ -462,7 +462,7 @@ acpi_parse_ioapic(struct acpi_subtable_header * header, const unsigned long end)
 	if (BAD_MADT_ENTRY(ioapic, end))
 		return -EINVAL;
 
-	acpi_table_print_madt_entry(header);
+	acpi_table_print_madt_entry(&header->common);
 
 	/* Statically assign IRQ numbers for IOAPICs hosting legacy IRQs */
 	if (ioapic->global_irq_base < nr_legacy_irqs())
@@ -508,7 +508,7 @@ static void __init acpi_sci_ioapic_setup(u8 bus_irq, u16 polarity, u16 trigger,
 }
 
 static int __init
-acpi_parse_int_src_ovr(struct acpi_subtable_header * header,
+acpi_parse_int_src_ovr(union acpi_subtable_headers * header,
 		       const unsigned long end)
 {
 	struct acpi_madt_interrupt_override *intsrc = NULL;
@@ -518,7 +518,7 @@ acpi_parse_int_src_ovr(struct acpi_subtable_header * header,
 	if (BAD_MADT_ENTRY(intsrc, end))
 		return -EINVAL;
 
-	acpi_table_print_madt_entry(header);
+	acpi_table_print_madt_entry(&header->common);
 
 	if (intsrc->source_irq == acpi_gbl_FADT.sci_interrupt) {
 		acpi_sci_ioapic_setup(intsrc->source_irq,
@@ -550,7 +550,7 @@ acpi_parse_int_src_ovr(struct acpi_subtable_header * header,
 }
 
 static int __init
-acpi_parse_nmi_src(struct acpi_subtable_header * header, const unsigned long end)
+acpi_parse_nmi_src(union acpi_subtable_headers * header, const unsigned long end)
 {
 	struct acpi_madt_nmi_source *nmi_src = NULL;
 
@@ -559,7 +559,7 @@ acpi_parse_nmi_src(struct acpi_subtable_header * header, const unsigned long end
 	if (BAD_MADT_ENTRY(nmi_src, end))
 		return -EINVAL;
 
-	acpi_table_print_madt_entry(header);
+	acpi_table_print_madt_entry(&header->common);
 
 	/* TBD: Support nimsrc entries? */
 
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index 274699463b4f..f5e09c39ff22 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -338,7 +338,7 @@ acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
 }
 
 static int __init
-acpi_parse_x2apic_affinity(struct acpi_subtable_header *header,
+acpi_parse_x2apic_affinity(union acpi_subtable_headers *header,
 			   const unsigned long end)
 {
 	struct acpi_srat_x2apic_cpu_affinity *processor_affinity;
@@ -347,7 +347,7 @@ acpi_parse_x2apic_affinity(struct acpi_subtable_header *header,
 	if (!processor_affinity)
 		return -EINVAL;
 
-	acpi_table_print_srat_entry(header);
+	acpi_table_print_srat_entry(&header->common);
 
 	/* let architecture-dependent part to do it */
 	acpi_numa_x2apic_affinity_init(processor_affinity);
@@ -356,7 +356,7 @@ acpi_parse_x2apic_affinity(struct acpi_subtable_header *header,
 }
 
 static int __init
-acpi_parse_processor_affinity(struct acpi_subtable_header *header,
+acpi_parse_processor_affinity(union acpi_subtable_headers *header,
 			      const unsigned long end)
 {
 	struct acpi_srat_cpu_affinity *processor_affinity;
@@ -365,7 +365,7 @@ acpi_parse_processor_affinity(struct acpi_subtable_header *header,
 	if (!processor_affinity)
 		return -EINVAL;
 
-	acpi_table_print_srat_entry(header);
+	acpi_table_print_srat_entry(&header->common);
 
 	/* let architecture-dependent part to do it */
 	acpi_numa_processor_affinity_init(processor_affinity);
@@ -374,7 +374,7 @@ acpi_parse_processor_affinity(struct acpi_subtable_header *header,
 }
 
 static int __init
-acpi_parse_gicc_affinity(struct acpi_subtable_header *header,
+acpi_parse_gicc_affinity(union acpi_subtable_headers *header,
 			 const unsigned long end)
 {
 	struct acpi_srat_gicc_affinity *processor_affinity;
@@ -383,7 +383,7 @@ acpi_parse_gicc_affinity(struct acpi_subtable_header *header,
 	if (!processor_affinity)
 		return -EINVAL;
 
-	acpi_table_print_srat_entry(header);
+	acpi_table_print_srat_entry(&header->common);
 
 	/* let architecture-dependent part to do it */
 	acpi_numa_gicc_affinity_init(processor_affinity);
@@ -394,7 +394,7 @@ acpi_parse_gicc_affinity(struct acpi_subtable_header *header,
 static int __initdata parsed_numa_memblks;
 
 static int __init
-acpi_parse_memory_affinity(struct acpi_subtable_header * header,
+acpi_parse_memory_affinity(union acpi_subtable_headers * header,
 			   const unsigned long end)
 {
 	struct acpi_srat_mem_affinity *memory_affinity;
@@ -403,7 +403,7 @@ acpi_parse_memory_affinity(struct acpi_subtable_header * header,
 	if (!memory_affinity)
 		return -EINVAL;
 
-	acpi_table_print_srat_entry(header);
+	acpi_table_print_srat_entry(&header->common);
 
 	/* let architecture-dependent part to do it */
 	if (!acpi_numa_memory_affinity_init(memory_affinity))
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index bd1c59fb0e17..d98d5da6a279 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -2234,10 +2234,10 @@ static struct acpi_probe_entry *ape;
 static int acpi_probe_count;
 static DEFINE_MUTEX(acpi_probe_mutex);
 
-static int __init acpi_match_madt(struct acpi_subtable_header *header,
+static int __init acpi_match_madt(union acpi_subtable_headers *header,
 				  const unsigned long end)
 {
-	if (!ape->subtable_valid || ape->subtable_valid(header, ape))
+	if (!ape->subtable_valid || ape->subtable_valid(&header->common, ape))
 		if (!ape->probe_subtbl(header, end))
 			acpi_probe_count++;
 
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index 61203eebf3a1..e9643b4267c7 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -49,6 +49,15 @@ static struct acpi_table_desc initial_tables[ACPI_MAX_TABLES] __initdata;
 
 static int acpi_apic_instance __initdata;
 
+enum acpi_subtable_type {
+	ACPI_SUBTABLE_COMMON,
+};
+
+struct acpi_subtable_entry {
+	union acpi_subtable_headers *hdr;
+	enum acpi_subtable_type type;
+};
+
 /*
  * Disable table checksum verification for the early stage due to the size
  * limitation of the current x86 early mapping implementation.
@@ -217,6 +226,42 @@ void acpi_table_print_madt_entry(struct acpi_subtable_header *header)
 	}
 }
 
+static unsigned long __init
+acpi_get_entry_type(struct acpi_subtable_entry *entry)
+{
+	switch (entry->type) {
+	case ACPI_SUBTABLE_COMMON:
+		return entry->hdr->common.type;
+	}
+	return 0;
+}
+
+static unsigned long __init
+acpi_get_entry_length(struct acpi_subtable_entry *entry)
+{
+	switch (entry->type) {
+	case ACPI_SUBTABLE_COMMON:
+		return entry->hdr->common.length;
+	}
+	return 0;
+}
+
+static unsigned long __init
+acpi_get_subtable_header_length(struct acpi_subtable_entry *entry)
+{
+	switch (entry->type) {
+	case ACPI_SUBTABLE_COMMON:
+		return sizeof(entry->hdr->common);
+	}
+	return 0;
+}
+
+static enum acpi_subtable_type __init
+acpi_get_subtable_type(char *id)
+{
+	return ACPI_SUBTABLE_COMMON;
+}
+
 /**
  * acpi_parse_entries_array - for each proc_num find a suitable subtable
  *
@@ -246,8 +291,8 @@ acpi_parse_entries_array(char *id, unsigned long table_size,
 		struct acpi_subtable_proc *proc, int proc_num,
 		unsigned int max_entries)
 {
-	struct acpi_subtable_header *entry;
-	unsigned long table_end;
+	struct acpi_subtable_entry entry;
+	unsigned long table_end, subtable_len, entry_len;
 	int count = 0;
 	int errs = 0;
 	int i;
@@ -270,19 +315,20 @@ acpi_parse_entries_array(char *id, unsigned long table_size,
 
 	/* Parse all entries looking for a match. */
 
-	entry = (struct acpi_subtable_header *)
+	entry.type = acpi_get_subtable_type(id);
+	entry.hdr = (union acpi_subtable_headers *)
 	    ((unsigned long)table_header + table_size);
+	subtable_len = acpi_get_subtable_header_length(&entry);
 
-	while (((unsigned long)entry) + sizeof(struct acpi_subtable_header) <
-	       table_end) {
+	while (((unsigned long)entry.hdr) + subtable_len < table_end) {
 		if (max_entries && count >= max_entries)
 			break;
 
 		for (i = 0; i < proc_num; i++) {
-			if (entry->type != proc[i].id)
+			if (acpi_get_entry_type(&entry) != proc[i].id)
 				continue;
 			if (!proc[i].handler ||
-			     (!errs && proc[i].handler(entry, table_end))) {
+			     (!errs && proc[i].handler(entry.hdr, table_end))) {
 				errs++;
 				continue;
 			}
@@ -297,13 +343,14 @@ acpi_parse_entries_array(char *id, unsigned long table_size,
 		 * If entry->length is 0, break from this loop to avoid
 		 * infinite loop.
 		 */
-		if (entry->length == 0) {
+		entry_len = acpi_get_entry_length(&entry);
+		if (entry_len == 0) {
 			pr_err("[%4.4s:0x%02x] Invalid zero length\n", id, proc->id);
 			return -EINVAL;
 		}
 
-		entry = (struct acpi_subtable_header *)
-		    ((unsigned long)entry + entry->length);
+		entry.hdr = (union acpi_subtable_headers *)
+		    ((unsigned long)entry.hdr + entry_len);
 	}
 
 	if (max_entries && count > max_entries) {
diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
index f5fe0100f9ff..de14e06fd9ec 100644
--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -446,7 +446,7 @@ static struct fwnode_handle *gicv2m_get_fwnode(struct device *dev)
 }
 
 static int __init
-acpi_parse_madt_msi(struct acpi_subtable_header *header,
+acpi_parse_madt_msi(union acpi_subtable_headers *header,
 		    const unsigned long end)
 {
 	int ret;
diff --git a/drivers/irqchip/irq-gic-v3-its-pci-msi.c b/drivers/irqchip/irq-gic-v3-its-pci-msi.c
index 8d6d009d1d58..c81d5b81da56 100644
--- a/drivers/irqchip/irq-gic-v3-its-pci-msi.c
+++ b/drivers/irqchip/irq-gic-v3-its-pci-msi.c
@@ -159,7 +159,7 @@ static int __init its_pci_of_msi_init(void)
 #ifdef CONFIG_ACPI
 
 static int __init
-its_pci_msi_parse_madt(struct acpi_subtable_header *header,
+its_pci_msi_parse_madt(union acpi_subtable_headers *header,
 		       const unsigned long end)
 {
 	struct acpi_madt_generic_translator *its_entry;
diff --git a/drivers/irqchip/irq-gic-v3-its-platform-msi.c b/drivers/irqchip/irq-gic-v3-its-platform-msi.c
index 7b8e87b493fe..9cdcda5bb3bd 100644
--- a/drivers/irqchip/irq-gic-v3-its-platform-msi.c
+++ b/drivers/irqchip/irq-gic-v3-its-platform-msi.c
@@ -117,7 +117,7 @@ static int __init its_pmsi_init_one(struct fwnode_handle *fwnode,
 #ifdef CONFIG_ACPI
 
 static int __init
-its_pmsi_parse_madt(struct acpi_subtable_header *header,
+its_pmsi_parse_madt(union acpi_subtable_headers *header,
 			const unsigned long end)
 {
 	struct acpi_madt_generic_translator *its_entry;
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index db20e992a40f..d6677075d68f 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3764,13 +3764,13 @@ static int __init acpi_get_its_numa_node(u32 its_id)
 	return NUMA_NO_NODE;
 }
 
-static int __init gic_acpi_match_srat_its(struct acpi_subtable_header *header,
+static int __init gic_acpi_match_srat_its(union acpi_subtable_headers *header,
 					  const unsigned long end)
 {
 	return 0;
 }
 
-static int __init gic_acpi_parse_srat_its(struct acpi_subtable_header *header,
+static int __init gic_acpi_parse_srat_its(union acpi_subtable_headers *header,
 			 const unsigned long end)
 {
 	int node;
@@ -3837,7 +3837,7 @@ static int __init acpi_get_its_numa_node(u32 its_id) { return NUMA_NO_NODE; }
 static void __init acpi_its_srat_maps_free(void) { }
 #endif
 
-static int __init gic_acpi_parse_madt_its(struct acpi_subtable_header *header,
+static int __init gic_acpi_parse_madt_its(union acpi_subtable_headers *header,
 					  const unsigned long end)
 {
 	struct acpi_madt_generic_translator *its_entry;
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 8f87f40c9460..1729514a0578 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1365,7 +1365,7 @@ gic_acpi_register_redist(phys_addr_t phys_base, void __iomem *redist_base)
 }
 
 static int __init
-gic_acpi_parse_madt_redist(struct acpi_subtable_header *header,
+gic_acpi_parse_madt_redist(union acpi_subtable_headers *header,
 			   const unsigned long end)
 {
 	struct acpi_madt_generic_redistributor *redist =
@@ -1383,7 +1383,7 @@ gic_acpi_parse_madt_redist(struct acpi_subtable_header *header,
 }
 
 static int __init
-gic_acpi_parse_madt_gicc(struct acpi_subtable_header *header,
+gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
 			 const unsigned long end)
 {
 	struct acpi_madt_generic_interrupt *gicc =
@@ -1425,14 +1425,14 @@ static int __init gic_acpi_collect_gicr_base(void)
 	return -ENODEV;
 }
 
-static int __init gic_acpi_match_gicr(struct acpi_subtable_header *header,
+static int __init gic_acpi_match_gicr(union acpi_subtable_headers *header,
 				  const unsigned long end)
 {
 	/* Subtable presence means that redist exists, that's it */
 	return 0;
 }
 
-static int __init gic_acpi_match_gicc(struct acpi_subtable_header *header,
+static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
 				      const unsigned long end)
 {
 	struct acpi_madt_generic_interrupt *gicc =
diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index ced10c44b68a..8d2750b835da 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -1508,7 +1508,7 @@ static struct
 } acpi_data __initdata;
 
 static int __init
-gic_acpi_parse_madt_cpu(struct acpi_subtable_header *header,
+gic_acpi_parse_madt_cpu(union acpi_subtable_headers *header,
 			const unsigned long end)
 {
 	struct acpi_madt_generic_interrupt *processor;
@@ -1540,7 +1540,7 @@ gic_acpi_parse_madt_cpu(struct acpi_subtable_header *header,
 }
 
 /* The things you have to do to just *count* something... */
-static int __init acpi_dummy_func(struct acpi_subtable_header *header,
+static int __init acpi_dummy_func(union acpi_subtable_headers *header,
 				  const unsigned long end)
 {
 	return 0;
diff --git a/drivers/mailbox/pcc.c b/drivers/mailbox/pcc.c
index 256f18b67e8a..08a0a3517138 100644
--- a/drivers/mailbox/pcc.c
+++ b/drivers/mailbox/pcc.c
@@ -382,7 +382,7 @@ static const struct mbox_chan_ops pcc_chan_ops = {
  *
  * This gets called for each entry in the PCC table.
  */
-static int parse_pcc_subspace(struct acpi_subtable_header *header,
+static int parse_pcc_subspace(union acpi_subtable_headers *header,
 		const unsigned long end)
 {
 	struct acpi_pcct_subspace *ss = (struct acpi_pcct_subspace *) header;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index ed80f147bd50..18805a967c70 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -141,10 +141,13 @@ enum acpi_address_range_id {
 
 
 /* Table Handlers */
+union acpi_subtable_headers {
+	struct acpi_subtable_header common;
+};
 
 typedef int (*acpi_tbl_table_handler)(struct acpi_table_header *table);
 
-typedef int (*acpi_tbl_entry_handler)(struct acpi_subtable_header *header,
+typedef int (*acpi_tbl_entry_handler)(union acpi_subtable_headers *header,
 				      const unsigned long end);
 
 /* Debugger support */

From patchwork Tue Dec 18 04:23:33 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 10734785
Subject: [PATCH v6 2/6] acpi: Add HMAT to generic parsing tables
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Keith Busch, peterz@infradead.org, dave.hansen@linux.intel.com, linux-mm@kvack.org, x86@kernel.org, linux-kernel@vger.kernel.org, mgorman@suse.de
Date: Mon, 17 Dec 2018 20:23:33 -0800
Message-ID: <154510701343.1941238.7745758103869136625.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.18-2-gc94f

From: Keith Busch

The HMAT table header has different field lengths than the existing
parsing uses. Add the HMAT type to the parsing rules so it may be
generically parsed.

Signed-off-by: Keith Busch
Signed-off-by: Dan Williams
---
 drivers/acpi/tables.c | 9 +++++++++
 include/linux/acpi.h  | 1 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index e9643b4267c7..bc1addf715dc 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -51,6 +51,7 @@ static int acpi_apic_instance __initdata;
 
 enum acpi_subtable_type {
 	ACPI_SUBTABLE_COMMON,
+	ACPI_SUBTABLE_HMAT,
 };
 
 struct acpi_subtable_entry {
@@ -232,6 +233,8 @@ acpi_get_entry_type(struct acpi_subtable_entry *entry)
 	switch (entry->type) {
 	case ACPI_SUBTABLE_COMMON:
 		return entry->hdr->common.type;
+	case ACPI_SUBTABLE_HMAT:
+		return entry->hdr->hmat.type;
 	}
 	return 0;
 }
@@ -242,6 +245,8 @@ acpi_get_entry_length(struct acpi_subtable_entry *entry)
 	switch (entry->type) {
 	case ACPI_SUBTABLE_COMMON:
 		return entry->hdr->common.length;
+	case ACPI_SUBTABLE_HMAT:
+		return entry->hdr->hmat.length;
 	}
 	return 0;
 }
@@ -252,6 +257,8 @@ acpi_get_subtable_header_length(struct acpi_subtable_entry *entry)
 	switch (entry->type) {
 	case ACPI_SUBTABLE_COMMON:
 		return sizeof(entry->hdr->common);
+	case ACPI_SUBTABLE_HMAT:
+		return sizeof(entry->hdr->hmat);
 	}
 	return 0;
 }
@@ -259,6 +266,8 @@ acpi_get_subtable_header_length(struct acpi_subtable_entry *entry)
 static enum acpi_subtable_type __init
 acpi_get_subtable_type(char *id)
 {
+	if (strncmp(id, ACPI_SIG_HMAT, 4) == 0)
+		return ACPI_SUBTABLE_HMAT;
 	return ACPI_SUBTABLE_COMMON;
 }
 
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 18805a967c70..4373f5ba0f95 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -143,6 +143,7 @@ enum acpi_address_range_id {
 /* Table Handlers */
 union acpi_subtable_headers {
 	struct acpi_subtable_header common;
+	struct acpi_hmat_structure hmat;
 };
 
 typedef int (*acpi_tbl_table_handler)(struct acpi_table_header *table);

From patchwork Tue Dec 18 04:23:38 2018
Content-Type: text/plain;
 charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 10734787
Subject: [PATCH v6 3/6] acpi/numa: Set the memory-side-cache size in memblocks
From: Dan Williams
To: akpm@linux-foundation.org
Cc: x86@kernel.org, "Rafael J.
 Wysocki", Dave Hansen, Andy Lutomirski, Peter Zijlstra, Mike Rapoport,
 Keith Busch, linux-mm@kvack.org, x86@kernel.org,
 linux-kernel@vger.kernel.org, mgorman@suse.de
Date: Mon, 17 Dec 2018 20:23:38 -0800
Message-ID: <154510701860.1941238.2239802602407206153.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.18-2-gc94f
MIME-Version: 1.0
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
X-Virus-Scanned: ClamAV using ClamSMTP

From: Keith Busch

Add memblock based enumeration of memory-side-cache of System RAM.
Detect the capability in early init through HMAT tables, and set the
size in the address range memblocks if a direct mapped side cache is
present.

Cc:
Cc: "Rafael J. Wysocki"
Cc: Dave Hansen
Cc: Andy Lutomirski
Cc: Peter Zijlstra
Cc: Mike Rapoport
Signed-off-by: Keith Busch
Signed-off-by: Dan Williams
---
 arch/x86/Kconfig         |  1 +
 drivers/acpi/numa.c      | 32 ++++++++++++++++++++++++++++++++
 include/linux/memblock.h | 38 ++++++++++++++++++++++++++++++++++++++
 mm/Kconfig               |  3 +++
 mm/memblock.c            | 34 ++++++++++++++++++++++++++++++++++
 5 files changed, 108 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8689e794a43c..3f9c413d8eb5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -171,6 +171,7 @@ config X86
 	select HAVE_KVM
 	select HAVE_LIVEPATCH if X86_64
 	select HAVE_MEMBLOCK_NODE_MAP
+	select HAVE_MEMBLOCK_CACHE_INFO if ACPI_NUMA
 	select HAVE_MIXED_BREAKPOINTS_REGS
 	select HAVE_MOD_ARCH_SPECIFIC
 	select HAVE_NMI
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index f5e09c39ff22..ec7e849f1c19 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -40,6 +40,12 @@ static int pxm_to_node_map[MAX_PXM_DOMAINS]
 static int node_to_pxm_map[MAX_NUMNODES]
 			= { [0 ... MAX_NUMNODES - 1] = PXM_INVAL };
 
+struct mem_cacheinfo {
+	phys_addr_t size;
+	bool direct_mapped;
+};
+static struct mem_cacheinfo side_cached_pxms[MAX_PXM_DOMAINS] __initdata;
+
 unsigned char acpi_srat_revision __initdata;
 int acpi_numa __initdata;
 
@@ -262,6 +268,8 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 	u64 start, end;
 	u32 hotpluggable;
 	int node, pxm;
+	u64 cache_size;
+	bool direct;
 
 	if (srat_disabled())
 		goto out_err;
@@ -308,6 +316,13 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 		pr_warn("SRAT: Failed to mark hotplug range [mem %#010Lx-%#010Lx] in memblock\n",
 			(unsigned long long)start, (unsigned long long)end - 1);
 
+	cache_size = side_cached_pxms[pxm].size;
+	direct = side_cached_pxms[pxm].direct_mapped;
+	if (cache_size &&
+	    memblock_set_sidecache(start, ma->length, cache_size, direct))
+		pr_warn("SRAT: Failed to mark side cached range [mem %#010Lx-%#010Lx] in memblock\n",
+			(unsigned long long)start, (unsigned long long)end - 1);
+
 	max_possible_pfn = max(max_possible_pfn, PFN_UP(end - 1));
 
 	return 0;
@@ -411,6 +426,18 @@ acpi_parse_memory_affinity(union acpi_subtable_headers * header,
 	return 0;
 }
 
+static int __init
+acpi_parse_cache(union acpi_subtable_headers *header, const unsigned long end)
+{
+	struct acpi_hmat_cache *c = (void *)header;
+	u32 attrs = (c->cache_attributes & ACPI_HMAT_CACHE_ASSOCIATIVITY) >> 8;
+
+	if (attrs == ACPI_HMAT_CA_DIRECT_MAPPED)
+		side_cached_pxms[c->memory_PD].direct_mapped = true;
+	side_cached_pxms[c->memory_PD].size += c->cache_size;
+	return 0;
+}
+
 static int __init acpi_parse_srat(struct acpi_table_header *table)
 {
 	struct acpi_table_srat *srat = (struct acpi_table_srat *)table;
@@ -460,6 +487,11 @@ int __init acpi_numa_init(void)
 					sizeof(struct acpi_table_srat),
 					srat_proc, ARRAY_SIZE(srat_proc), 0);
 
+		acpi_table_parse_entries(ACPI_SIG_HMAT,
+					 sizeof(struct acpi_table_hmat),
+					 ACPI_HMAT_TYPE_CACHE,
+					 acpi_parse_cache, 0);
+
 		cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
 					    acpi_parse_memory_affinity, 0);
 	}
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index aee299a6aa76..29c3c88a5207 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -52,6 +52,8 @@ enum memblock_flags {
  * @size: size of the region
  * @flags: memory region attributes
  * @nid: NUMA node id
+ * @cache_size: size of memory side cache in bytes
+ * @direct_mapped: true if direct mapped cache associativity exists
  */
 struct memblock_region {
 	phys_addr_t base;
@@ -60,6 +62,10 @@ struct memblock_region {
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 	int nid;
 #endif
+#ifdef CONFIG_HAVE_MEMBLOCK_CACHE_INFO
+	phys_addr_t cache_size;
+	bool direct_mapped;
+#endif
 };
 
 /**
@@ -317,6 +323,38 @@ static inline int memblock_get_region_node(const struct memblock_region *r)
 }
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
+#ifdef CONFIG_HAVE_MEMBLOCK_CACHE_INFO
+int memblock_set_sidecache(phys_addr_t base, phys_addr_t size,
+			   phys_addr_t cache_size, bool direct_mapped);
+
+static inline bool memblock_sidecache_direct_mapped(struct memblock_region *m)
+{
+	return m->direct_mapped;
+}
+
+static inline phys_addr_t memblock_sidecache_size(struct memblock_region *m)
+{
+	return m->cache_size;
+}
+#else
+static inline int memblock_set_sidecache(phys_addr_t base, phys_addr_t size,
+					 phys_addr_t cache_size,
+					 bool direct_mapped)
+{
+	return 0;
+}
+
+static inline phys_addr_t memblock_sidecache_size(struct memblock_region *m)
+{
+	return 0;
+}
+
+static inline bool memblock_sidecache_direct_mapped(struct memblock_region *m)
+{
+	return false;
+}
+#endif /* CONFIG_HAVE_MEMBLOCK_CACHE_INFO */
+
 /* Flags for memblock allocation APIs */
 #define MEMBLOCK_ALLOC_ANYWHERE	(~(phys_addr_t)0)
 #define MEMBLOCK_ALLOC_ACCESSIBLE	0
diff --git a/mm/Kconfig b/mm/Kconfig
index d85e39da47ae..c7944299a89e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -142,6 +142,9 @@ config ARCH_DISCARD_MEMBLOCK
 config MEMORY_ISOLATION
 	bool
 
+config HAVE_MEMBLOCK_CACHE_INFO
+	bool
+
 #
 # Only be set on architectures that have completely implemented memory hotplug
 # feature. If you are not sure, don't touch it.
diff --git a/mm/memblock.c b/mm/memblock.c
index 9a2d5ae81ae1..8ebbc77f20c5 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -822,6 +822,40 @@ int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
 	return memblock_add_range(&memblock.reserved, base, size, MAX_NUMNODES, 0);
 }
 
+#ifdef CONFIG_HAVE_MEMBLOCK_CACHE_INFO
+/**
+ * memblock_set_sidecache - set the system memory cache info
+ * @base: base address of the region
+ * @size: size of the region
+ * @cache_size: system side cache size in bytes
+ * @direct_mapped: true if the cache has direct mapped associativity
+ *
+ * This function isolates region [@base, @base + @size), and saves the cache
+ * information.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_set_sidecache(phys_addr_t base, phys_addr_t size,
+			   phys_addr_t cache_size, bool direct_mapped)
+{
+	struct memblock_type *type = &memblock.memory;
+	int i, ret, start_rgn, end_rgn;
+
+	ret = memblock_isolate_range(type, base, size, &start_rgn, &end_rgn);
+	if (ret)
+		return ret;
+
+	for (i = start_rgn; i < end_rgn; i++) {
+		struct memblock_region *r = &type->regions[i];
+
+		r->cache_size = cache_size;
+		r->direct_mapped = direct_mapped;
+	}
+
+	return 0;
+}
+#endif
+
 /**
  * memblock_setclr_flag - set or clear flag for a memory region
  * @base: base address of the region

From patchwork Tue Dec 18 04:23:44 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 10734791
Subject: [PATCH v6 4/6] mm: Shuffle initial free memory to improve
 memory-side-cache utilization
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Michal Hocko, Kees Cook, Dave Hansen, peterz@infradead.org,
 linux-mm@kvack.org, x86@kernel.org, linux-kernel@vger.kernel.org,
 mgorman@suse.de
Date: Mon, 17 Dec 2018 20:23:44 -0800
Message-ID: <154510702402.1941238.1616430879354317384.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent:
 StGit/0.18-2-gc94f
MIME-Version: 1.0
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
X-Virus-Scanned: ClamAV using ClamSMTP

Randomization of the page allocator improves the average utilization of
a direct-mapped memory-side-cache. Memory side caching is a platform
capability that Linux has been previously exposed to in HPC
(high-performance computing) environments on specialty platforms. In
that instance it was a smaller pool of high-bandwidth-memory relative to
higher-capacity / lower-bandwidth DRAM. Now, this capability is going to
be found on general purpose server platforms where DRAM is a cache in
front of higher latency persistent memory [1].

Robert offered an explanation of the state of the art of Linux
interactions with memory-side-caches [2], and I copy it here:

    It's been a problem in the HPC space:
    http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/

    A kernel module called zonesort is available to try to help:
    https://software.intel.com/en-us/articles/xeon-phi-software

    and this abandoned patch series proposed that for the kernel:
    https://lkml.org/lkml/2017/8/23/195

    Dan's patch series doesn't attempt to ensure buffers won't conflict,
    but also reduces the chance that the buffers will. This will make
    performance more consistent, albeit slower than "optimal" (which is
    near impossible to attain in a general-purpose kernel). That's
    better than forcing users to deploy remedies like:
        "To eliminate this gradual degradation, we have added a Stream
         measurement to the Node Health Check that follows each job;
         nodes are rebooted whenever their measured memory bandwidth
         falls below 300 GB/s."

A replacement for zonesort was merged upstream in commit cc9aec03e58f
"x86/numa_emulation: Introduce uniform split capability". With this
numa_emulation capability, memory can be split into cache sized
("near-memory" sized) numa nodes.
A bind operation to such a node, and disabling workloads on other nodes,
enables full cache performance. However, once the workload exceeds the
cache size then cache conflicts are unavoidable. While HPC environments
might be able to tolerate time-scheduling of cache sized workloads, for
general purpose server platforms, the oversubscribed cache case will be
the common case.

The worst case scenario is that a server system owner benchmarks a
workload at boot with an un-contended cache only to see that performance
degrade over time, even below the average cache performance due to
excessive conflicts. Randomization clips the peaks and fills in the
valleys of cache utilization to yield steady average performance.

Here are some performance impact details of the patches:

1/ An Intel internal synthetic memory bandwidth measurement tool saw a
   3X speedup in a contrived case that tries to force cache conflicts.
   The contrived case used the numa_emulation capability to force an
   instance of the benchmark to be run in two of the near-memory sized
   numa nodes. If both instances were placed on the same emulated node
   they would fit and cause zero conflicts. While on separate emulated
   nodes without randomization they underutilized the cache and
   conflicted unnecessarily due to the in-order allocation per node.

2/ A well known Java server application benchmark was run with a heap
   size that exceeded cache size by 3X. The cache conflict rate was 8%
   for the first run and degraded to 21% after page allocator aging.
   With randomization enabled the rate levelled out at 11%.

3/ A MongoDB workload did not observe a measurable difference in
   cache-conflict rates, but the overall throughput dropped by 7% with
   randomization in one case.

4/ Mel Gorman ran his suite of performance workloads with randomization
   enabled on platforms without a memory-side-cache and saw a mix of
   some improvements and some losses [3].

While there is potentially significant improvement for applications that
depend on low latency access across a wide working-set, the performance
impact may be negligible to negative for other workloads. For this
reason the shuffle capability defaults to off unless a direct-mapped
memory-side-cache is detected. Even then, the page_alloc.shuffle=0
parameter can be specified to disable the randomization on those
systems.

Outside of memory-side-cache utilization concerns there is a potential
security benefit from randomization. Some data exfiltration and
return-oriented-programming attacks rely on the ability to infer the
location of sensitive data objects. The kernel page allocator,
especially early in system boot, has predictable first-in-first-out
behavior for physical pages. Pages are freed in physical address order
when first onlined.

Quoting Kees:
    "While we already have a base-address randomization
     (CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and
     memory layouts would certainly be using the predictability of
     allocation ordering (i.e. for attacks where the base address isn't
     important: only the relative positions between allocated memory).
     This is common in lots of heap-style attacks. They try to gain
     control over ordering by spraying allocations, etc.

     I'd really like to see this because it gives us something similar
     to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator."

While SLAB_FREELIST_RANDOM reduces the predictability of some local slab
caches it leaves the vast bulk of memory to be predictably allocated in
order. However, it should be noted, the concrete security benefits are
hard to quantify, and no known CVE is mitigated by this randomization.

Introduce shuffle_free_memory(), and its helper shuffle_zone(), to
perform a Fisher-Yates shuffle of the page allocator 'free_area' lists
when they are initially populated with free memory at boot and at
hotplug time.
The shuffling is done in terms of CONFIG_SHUFFLE_PAGE_ORDER sized free
pages where the default CONFIG_SHUFFLE_PAGE_ORDER is MAX_ORDER-1, i.e.
10 (4MB); this trades off randomization granularity for time spent
shuffling. MAX_ORDER-1 was chosen to be minimally invasive to the page
allocator while still showing memory-side cache behavior improvements,
and with the expectation that the security implications of finer
granularity randomization are mitigated by CONFIG_SLAB_FREELIST_RANDOM.

The performance impact of the shuffling appears to be in the noise
compared to other memory initialization work. Also the bulk of the work
is done in the background as a part of deferred_init_memmap().

This initial randomization can be undone over time so a follow-on patch
is introduced to inject entropy on page free decisions. It is reasonable
to ask if the page free entropy is sufficient on its own, but it is not,
due to the in-order initial freeing of pages. At the start of that
process putting page1 in front or behind page0 still keeps them close
together, page2 is still near page1 and has a high chance of being
adjacent. As more pages are added ordering diversity improves, but there
is still high page locality for the low address pages and this leads to
no significant impact to the cache conflict rate.
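
As a rough userspace illustration of the Fisher-Yates step described
above (this is not the mm/shuffle.c code from this patch), the sketch
below shuffles an array of block indices that stand in for
CONFIG_SHUFFLE_PAGE_ORDER sized free pages. The names sketch_rng,
sketch_rng_next(), sketch_swap() and sketch_shuffle_blocks() are
hypothetical, and the xorshift generator is only an assumed stand-in for
the kernel's random source. The modulo step carries a slight bias that a
sketch can tolerate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's random source (xorshift64). */
struct sketch_rng {
	uint64_t state;	/* must be seeded with a non-zero value */
};

static uint64_t sketch_rng_next(struct sketch_rng *rng)
{
	uint64_t x = rng->state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	rng->state = x;
	return x;
}

/* Exchange two slots; assumed analogue of swapping two list entries. */
static void sketch_swap(unsigned long *a, unsigned long *b)
{
	unsigned long tmp = *a;

	*a = *b;
	*b = tmp;
}

/*
 * Fisher-Yates shuffle of @nr block indices.
 *
 * Walk from the last slot down to the second one. For each slot pick a
 * partner j in [0, i) and swap, so every slot is settled exactly once
 * and the result is always a permutation of the input.
 */
static void sketch_shuffle_blocks(unsigned long *blocks, size_t nr,
				  struct sketch_rng *rng)
{
	size_t i;

	assert(rng->state != 0);

	for (i = nr; i > 1; i--) {
		/* Modulo bias is ignored here for brevity. */
		size_t j = (size_t)(sketch_rng_next(rng) % i);

		sketch_swap(&blocks[i - 1], &blocks[j]);
	}
}
```

A caller would fill blocks[] with 0..nr-1, seed the state with any
non-zero value and call sketch_shuffle_blocks(). The list_swap() helper
that this patch adds to include/linux/list.h presumably plays the role
of sketch_swap() for list entries.
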

[1]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
[2]: https://lkml.org/lkml/2018/9/22/54
[3]: https://lkml.org/lkml/2018/10/12/309

Cc: Michal Hocko
Cc: Kees Cook
Cc: Dave Hansen
Signed-off-by: Dan Williams
---
 include/linux/list.h    |  17 ++++
 include/linux/mmzone.h  |   4 +
 include/linux/shuffle.h |  47 ++++++++++
 init/Kconfig            |  36 ++++++++
 mm/Makefile             |   7 +-
 mm/memblock.c           |  16 +++
 mm/memory_hotplug.c     |   3 +
 mm/page_alloc.c         |   3 +
 mm/shuffle.c            | 215 +++++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 346 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/shuffle.h
 create mode 100644 mm/shuffle.c

diff --git a/include/linux/list.h b/include/linux/list.h
index edb7628e46ed..3dfb8953f241 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -150,6 +150,23 @@ static inline void list_replace_init(struct list_head *old,
 	INIT_LIST_HEAD(old);
 }
 
+/**
+ * list_swap - replace entry1 with entry2 and re-add entry1 at entry2's position
+ * @entry1: the location to place entry2
+ * @entry2: the location to place entry1
+ */
+static inline void list_swap(struct list_head *entry1,
+			     struct list_head *entry2)
+{
+	struct list_head *pos = entry2->prev;
+
+	list_del(entry2);
+	list_replace(entry1, entry2);
+	if (pos == entry1)
+		pos = entry2;
+	list_add(entry1, pos);
+}
+
 /**
  * list_del_init - deletes entry from list and reinitialize it.
  * @entry: the element to delete from the list.
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 847705a6d0ec..eafa66d66232 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1266,6 +1266,10 @@ void sparse_init(void); #else #define sparse_init() do {} while (0) #define sparse_index_init(_sec, _nid) do {} while (0) +static inline int pfn_present(unsigned long pfn) +{ + return 1; +} #endif /* CONFIG_SPARSEMEM */ /* diff --git a/include/linux/shuffle.h b/include/linux/shuffle.h new file mode 100644 index 000000000000..a8a168919cb5 --- /dev/null +++ b/include/linux/shuffle.h @@ -0,0 +1,47 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright(c) 2018 Intel Corporation. All rights reserved. +#ifndef _MM_SHUFFLE_H +#define _MM_SHUFFLE_H + +enum mm_shuffle_ctl { + SHUFFLE_ENABLE, + SHUFFLE_FORCE_DISABLE, +}; +#ifdef CONFIG_SHUFFLE_PAGE_ALLOCATOR +DECLARE_STATIC_KEY_FALSE(page_alloc_shuffle_key); +extern void page_alloc_shuffle(enum mm_shuffle_ctl); +extern void __shuffle_free_memory(pg_data_t *pgdat, unsigned long start_pfn, + unsigned long end_pfn); +static inline void shuffle_free_memory(pg_data_t *pgdat, + unsigned long start_pfn, unsigned long end_pfn) +{ + if (!static_branch_unlikely(&page_alloc_shuffle_key)) + return; + __shuffle_free_memory(pgdat, start_pfn, end_pfn); +} + +extern void __shuffle_zone(struct zone *z, unsigned long start_pfn, + unsigned long end_pfn); +static inline void shuffle_zone(struct zone *z, unsigned long start_pfn, + unsigned long end_pfn) +{ + if (!static_branch_unlikely(&page_alloc_shuffle_key)) + return; + __shuffle_zone(z, start_pfn, end_pfn); +} +#else +static inline void shuffle_free_memory(pg_data_t *pgdat, unsigned long start_pfn, + unsigned long end_pfn) +{ +} + +static inline void shuffle_zone(struct zone *z, unsigned long start_pfn, + unsigned long end_pfn) +{ +} + +static inline void page_alloc_shuffle(void) +{ +} +#endif +#endif /* _MM_SHUFFLE_H */ diff --git a/init/Kconfig b/init/Kconfig index cf5b5a0dcbc2..fa6812d995ec 100644 --- 
a/init/Kconfig +++ b/init/Kconfig @@ -1720,6 +1720,42 @@ config SLAB_FREELIST_HARDENED sacrifies to harden the kernel slab allocator against common freelist exploit methods. +config SHUFFLE_PAGE_ALLOCATOR + bool "Page allocator randomization" + depends on HAVE_MEMBLOCK_CACHE_INFO + default SLAB_FREELIST_RANDOM + help + Randomization of the page allocator improves the average + utilization of a direct-mapped memory-side-cache. See section + 5.2.27 Heterogeneous Memory Attribute Table (HMAT) in the ACPI + 6.2a specification for an example of how a platform advertises + the presence of a memory-side-cache. There are also incidental + security benefits as it reduces the predictability of page + allocations to compliment SLAB_FREELIST_RANDOM, but the + default granularity of shuffling on 4MB (MAX_ORDER) pages is + selected based on cache utilization benefits. + + While the randomization improves cache utilization it may + negatively impact workloads on platforms without a cache. For + this reason, by default, the randomization is enabled only + after runtime detection of a direct-mapped memory-side-cache. + Otherwise, the randomization may be force enabled with the + 'page_alloc.shuffle' kernel command line parameter. + + Say Y if unsure. + +config SHUFFLE_PAGE_ORDER + depends on SHUFFLE_PAGE_ALLOCATOR + int "Page allocator shuffle order" + range 0 10 + default 10 + help + Specify the granularity at which shuffling (randomization) is + performed. By default this is set to MAX_ORDER-1 to minimize + runtime impact of randomization and with the expectation that + SLAB_FREELIST_RANDOM mitigates heap attacks on smaller + object granularities. 
+ config SLUB_CPU_PARTIAL default y depends on SLUB && SMP diff --git a/mm/Makefile b/mm/Makefile index d210cc9d6f80..ac5e5ba78874 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -33,7 +33,7 @@ mmu-$(CONFIG_MMU) += process_vm_access.o endif obj-y := filemap.o mempool.o oom_kill.o fadvise.o \ - maccess.o page_alloc.o page-writeback.o \ + maccess.o page-writeback.o \ readahead.o swap.o truncate.o vmscan.o shmem.o \ util.o mmzone.o vmstat.o backing-dev.o \ mm_init.o mmu_context.o percpu.o slab_common.o \ @@ -41,6 +41,11 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \ interval_tree.o list_lru.o workingset.o \ debug.o $(mmu-y) +# Give 'page_alloc' its own module-parameter namespace +page-alloc-y := page_alloc.o +page-alloc-$(CONFIG_SHUFFLE_PAGE_ALLOCATOR) += shuffle.o + +obj-y += page-alloc.o obj-y += init-mm.o obj-y += memblock.o diff --git a/mm/memblock.c b/mm/memblock.c index 8ebbc77f20c5..e51ecd6c1308 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -850,6 +851,12 @@ int __init_memblock memblock_set_sidecache(phys_addr_t base, phys_addr_t size, r->cache_size = cache_size; r->direct_mapped = direct_mapped; + /* + * Enable randomization for amortizing direct-mapped + * memory-side-cache conflicts. 
+ */ + if (r->size > r->cache_size && r->direct_mapped) + page_alloc_shuffle(SHUFFLE_ENABLE); } return 0; @@ -1971,9 +1978,16 @@ static unsigned long __init free_low_memory_core_early(void) * low ram will be on Node1 */ for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, - NULL) + NULL) { + pg_data_t *pgdat; + count += __free_memory_core(start, end); + for_each_online_pgdat(pgdat) + shuffle_free_memory(pgdat, PHYS_PFN(start), + PHYS_PFN(end)); + } + return count; } diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 2b2b3ccbbfb5..697669ffce32 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -895,6 +896,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ zone->zone_pgdat->node_present_pages += onlined_pages; pgdat_resize_unlock(zone->zone_pgdat, &flags); + shuffle_zone(zone, pfn, zone_end_pfn(zone)); + if (onlined_pages) { node_states_set_node(nid, &arg); if (need_zonelists_rebuild) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 2ec9cc407216..eaa9a012d6ae 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -60,6 +60,7 @@ #include #include #include +#include #include #include #include @@ -1595,6 +1596,8 @@ static int __init deferred_init_memmap(void *data) } pgdat_resize_unlock(pgdat, &flags); + shuffle_zone(zone, first_init_pfn, zone_end_pfn(zone)); + /* Sanity check that the next zone really is unpopulated */ WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone)); diff --git a/mm/shuffle.c b/mm/shuffle.c new file mode 100644 index 000000000000..07961ff41a03 --- /dev/null +++ b/mm/shuffle.c @@ -0,0 +1,215 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright(c) 2018 Intel Corporation. All rights reserved. 
+ +#include +#include +#include +#include +#include +#include +#include "internal.h" + +DEFINE_STATIC_KEY_FALSE(page_alloc_shuffle_key); +static unsigned long shuffle_state; + +/* + * Depending on the architecture, module parameter parsing may run + * before, or after the cache detection. SHUFFLE_FORCE_DISABLE prevents, + * or reverts the enabling of the shuffle implementation. SHUFFLE_ENABLE + * attempts to turn on the implementation, but aborts if it finds + * SHUFFLE_FORCE_DISABLE already set. + */ +void page_alloc_shuffle(enum mm_shuffle_ctl ctl) +{ + if (ctl == SHUFFLE_FORCE_DISABLE) + set_bit(SHUFFLE_FORCE_DISABLE, &shuffle_state); + + if (test_bit(SHUFFLE_FORCE_DISABLE, &shuffle_state)) { + if (test_and_clear_bit(SHUFFLE_ENABLE, &shuffle_state)) + static_branch_disable(&page_alloc_shuffle_key); + } else if (ctl == SHUFFLE_ENABLE + && !test_and_set_bit(SHUFFLE_ENABLE, &shuffle_state)) + static_branch_enable(&page_alloc_shuffle_key); +} + +static bool shuffle_param; +extern int shuffle_show(char *buffer, const struct kernel_param *kp) +{ + return sprintf(buffer, "%c\n", test_bit(SHUFFLE_ENABLE, &shuffle_state) + ? 'Y' : 'N'); +} +static int shuffle_store(const char *val, const struct kernel_param *kp) +{ + int rc = param_set_bool(val, kp); + + if (rc < 0) + return rc; + if (shuffle_param) + page_alloc_shuffle(SHUFFLE_ENABLE); + else + page_alloc_shuffle(SHUFFLE_FORCE_DISABLE); + return 0; +} +module_param_call(shuffle, shuffle_store, shuffle_show, &shuffle_param, 0400); + +/* + * For two pages to be swapped in the shuffle, they must be free (on a + * 'free_area' lru), have the same order, and have the same migratetype. + */ +static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order) +{ + struct page *page; + + /* + * Given we're dealing with randomly selected pfns in a zone we + * need to ask questions like... + */ + + /* ...is the pfn even in the memmap? 
*/ + if (!pfn_valid_within(pfn)) + return NULL; + + /* ...is the pfn in a present section or a hole? */ + if (!pfn_present(pfn)) + return NULL; + + /* ...is the page free and currently on a free_area list? */ + page = pfn_to_page(pfn); + if (!PageBuddy(page)) + return NULL; + + /* + * ...is the page on the same list as the page we will + * shuffle it with? + */ + if (page_order(page) != order) + return NULL; + + return page; +} + +/* + * Fisher-Yates shuffle the freelist which prescribes iterating through + * an array, pfns in this case, and randomly swapping each entry with + * another in the span, end_pfn - start_pfn. + * + * To keep the implementation simple it does not attempt to correct for + * sources of bias in the distribution, like modulo bias or + * pseudo-random number generator bias. I.e. the expectation is that + * this shuffling raises the bar for attacks that exploit the + * predictability of page allocations, but need not be a perfect + * shuffle. + * + * Note that we don't use @z->zone_start_pfn and zone_end_pfn(@z) + * directly since the caller may be aware of holes in the zone and can + * improve the accuracy of the random pfn selection. 
+ */ +#define SHUFFLE_RETRY 10 +static void __meminit shuffle_zone_order(struct zone *z, unsigned long start_pfn, + unsigned long end_pfn, const int order) +{ + unsigned long i, flags; + const int order_pages = 1 << order; + + if (start_pfn < z->zone_start_pfn) + start_pfn = z->zone_start_pfn; + if (end_pfn > zone_end_pfn(z)) + end_pfn = zone_end_pfn(z); + + /* probably means that start/end were outside the zone */ + if (end_pfn <= start_pfn) + return; + spin_lock_irqsave(&z->lock, flags); + start_pfn = ALIGN(start_pfn, order_pages); + for (i = start_pfn; i < end_pfn; i += order_pages) { + unsigned long j; + int migratetype, retry; + struct page *page_i, *page_j; + + /* + * We expect page_i, in the sub-range of a zone being + * added (@start_pfn to @end_pfn), to more likely be + * valid compared to page_j randomly selected in the + * span @zone_start_pfn to @spanned_pages. + */ + page_i = shuffle_valid_page(i, order); + if (!page_i) + continue; + + for (retry = 0; retry < SHUFFLE_RETRY; retry++) { + /* + * Pick a random order aligned page from the + * start of the zone. Use the *whole* zone here + * so that if it is freed in tiny pieces that we + * randomize in the whole zone, not just within + * those fragments. + * + * Since page_j comes from a potentially sparse + * address range we want to try a bit harder to + * find a shuffle point for page_i. + */ + j = z->zone_start_pfn + + ALIGN_DOWN(get_random_long() % z->spanned_pages, + order_pages); + page_j = shuffle_valid_page(j, order); + if (page_j && page_j != page_i) + break; + } + if (retry >= SHUFFLE_RETRY) { + pr_debug("%s: failed to swap %#lx\n", __func__, i); + continue; + } + + /* + * Each migratetype corresponds to its own list, make + * sure the types match otherwise we're moving pages to + * lists where they do not belong. 
+ */ + migratetype = get_pageblock_migratetype(page_i); + if (get_pageblock_migratetype(page_j) != migratetype) { + pr_debug("%s: migratetype mismatch %#lx\n", __func__, i); + continue; + } + + list_swap(&page_i->lru, &page_j->lru); + + pr_debug("%s: swap: %#lx -> %#lx\n", __func__, i, j); + + /* take it easy on the zone lock */ + if ((i % (100 * order_pages)) == 0) { + spin_unlock_irqrestore(&z->lock, flags); + cond_resched(); + spin_lock_irqsave(&z->lock, flags); + } + } + spin_unlock_irqrestore(&z->lock, flags); +} + +void __meminit __shuffle_zone(struct zone *z, unsigned long start_pfn, + unsigned long end_pfn) +{ + int i; + + /* shuffle all the orders at the specified order and higher */ + for (i = CONFIG_SHUFFLE_PAGE_ORDER; i < MAX_ORDER; i++) + shuffle_zone_order(z, start_pfn, end_pfn, i); +} + +/** + * shuffle_free_memory - reduce the predictability of the page allocator + * @pgdat: node page data + * @start_pfn: Limit the shuffle to the greater of this value or zone start + * @end_pfn: Limit the shuffle to the less of this value or zone end + * + * While shuffle_zone() attempts to avoid holes with pfn_valid() and + * pfn_present() they can not report sub-section sized holes. @start_pfn + * and @end_pfn limit the shuffle to the exact memory pages being freed. 
+ */ +void __meminit __shuffle_free_memory(pg_data_t *pgdat, unsigned long start_pfn, + unsigned long end_pfn) +{ + struct zone *z; + + for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++) + shuffle_zone(z, start_pfn, end_pfn); +}

From patchwork Tue Dec 18 04:23:50 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 10734793
Subject: [PATCH v6 5/6] mm: Move buddy list manipulations into helpers
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Michal Hocko, Dave Hansen, peterz@infradead.org, linux-mm@kvack.org, x86@kernel.org, linux-kernel@vger.kernel.org, mgorman@suse.de
Date: Mon, 17 Dec 2018 20:23:50 -0800
Message-ID: <154510703002.1941238.13463167622846312555.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.18-2-gc94f

In preparation for runtime randomization of the zone lists, take all (well, most of) the list_*()
functions in the buddy allocator and put them in helper functions. Provide a common control point for injecting additional behavior when freeing pages. Cc: Michal Hocko Cc: Dave Hansen Signed-off-by: Dan Williams --- include/linux/mm.h | 3 -- include/linux/mm_types.h | 3 ++ include/linux/mmzone.h | 51 ++++++++++++++++++++++++++++++++++ mm/compaction.c | 4 +-- mm/page_alloc.c | 70 ++++++++++++++++++---------------------------- 5 files changed, 84 insertions(+), 47 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 5411de93a363..e1d23f80d3ba 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -473,9 +473,6 @@ static inline void vma_set_anonymous(struct vm_area_struct *vma) struct mmu_gather; struct inode; -#define page_private(page) ((page)->private) -#define set_page_private(page, v) ((page)->private = (v)) - #if !defined(__HAVE_ARCH_PTE_DEVMAP) || !defined(CONFIG_TRANSPARENT_HUGEPAGE) static inline int pmd_devmap(pmd_t pmd) { diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5ed8f6292a53..72f37ea6dedb 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -209,6 +209,9 @@ struct page { #define PAGE_FRAG_CACHE_MAX_SIZE __ALIGN_MASK(32768, ~PAGE_MASK) #define PAGE_FRAG_CACHE_MAX_ORDER get_order(PAGE_FRAG_CACHE_MAX_SIZE) +#define page_private(page) ((page)->private) +#define set_page_private(page, v) ((page)->private = (v)) + struct page_frag_cache { void * va; #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index eafa66d66232..35cc33af87f2 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -18,6 +18,8 @@ #include #include #include +#include +#include #include /* Free memory management - zoned buddy allocator. 
*/ @@ -98,6 +100,55 @@ struct free_area { unsigned long nr_free; }; +/* Used for pages not on another list */ +static inline void add_to_free_area(struct page *page, struct free_area *area, + int migratetype) +{ + list_add(&page->lru, &area->free_list[migratetype]); + area->nr_free++; +} + +/* Used for pages not on another list */ +static inline void add_to_free_area_tail(struct page *page, struct free_area *area, + int migratetype) +{ + list_add_tail(&page->lru, &area->free_list[migratetype]); + area->nr_free++; +} + +/* Used for pages which are on another list */ +static inline void move_to_free_area(struct page *page, struct free_area *area, + int migratetype) +{ + list_move(&page->lru, &area->free_list[migratetype]); +} + +static inline struct page *get_page_from_free_area(struct free_area *area, + int migratetype) +{ + return list_first_entry_or_null(&area->free_list[migratetype], + struct page, lru); +} + +static inline void rmv_page_order(struct page *page) +{ + __ClearPageBuddy(page); + set_page_private(page, 0); +} + +static inline void del_page_from_free_area(struct page *page, + struct free_area *area, int migratetype) +{ + list_del(&page->lru); + rmv_page_order(page); + area->nr_free--; +} + +static inline bool free_area_empty(struct free_area *area, int migratetype) +{ + return list_empty(&area->free_list[migratetype]); +} + struct pglist_data; /* diff --git a/mm/compaction.c b/mm/compaction.c index 7c607479de4a..44adbfa073b3 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1359,13 +1359,13 @@ static enum compact_result __compact_finished(struct zone *zone, bool can_steal; /* Job done if page is free of the right migratetype */ - if (!list_empty(&area->free_list[migratetype])) + if (!free_area_empty(area, migratetype)) return COMPACT_SUCCESS; #ifdef CONFIG_CMA /* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */ if (migratetype == MIGRATE_MOVABLE && - !list_empty(&area->free_list[MIGRATE_CMA])) + !free_area_empty(area, MIGRATE_CMA)) return 
COMPACT_SUCCESS; #endif /* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index eaa9a012d6ae..de8b5eb78d13 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -705,12 +705,6 @@ static inline void set_page_order(struct page *page, unsigned int order) __SetPageBuddy(page); } -static inline void rmv_page_order(struct page *page) -{ - __ClearPageBuddy(page); - set_page_private(page, 0); -} - /* * This function checks whether a page is free && is the buddy * we can coalesce a page and its buddy if @@ -811,13 +805,11 @@ static inline void __free_one_page(struct page *page, * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page, * merge with it and move up one order. */ - if (page_is_guard(buddy)) { + if (page_is_guard(buddy)) clear_page_guard(zone, buddy, order, migratetype); - } else { - list_del(&buddy->lru); - zone->free_area[order].nr_free--; - rmv_page_order(buddy); - } + else + del_page_from_free_area(buddy, &zone->free_area[order], + migratetype); combined_pfn = buddy_pfn & pfn; page = page + (combined_pfn - pfn); pfn = combined_pfn; @@ -867,15 +859,13 @@ static inline void __free_one_page(struct page *page, higher_buddy = higher_page + (buddy_pfn - combined_pfn); if (pfn_valid_within(buddy_pfn) && page_is_buddy(higher_page, higher_buddy, order + 1)) { - list_add_tail(&page->lru, - &zone->free_area[order].free_list[migratetype]); - goto out; + add_to_free_area_tail(page, &zone->free_area[order], + migratetype); + return; } } - list_add(&page->lru, &zone->free_area[order].free_list[migratetype]); -out: - zone->free_area[order].nr_free++; + add_to_free_area(page, &zone->free_area[order], migratetype); } /* @@ -1820,7 +1810,7 @@ static inline void expand(struct zone *zone, struct page *page, if (set_page_guard(zone, &page[size], high, migratetype)) continue; - list_add(&page[size].lru, &area->free_list[migratetype]); + add_to_free_area(&page[size], area, migratetype); area->nr_free++; set_page_order(&page[size], high); } @@ -1962,13 +1952,10 @@ struct page 
*__rmqueue_smallest(struct zone *zone, unsigned int order, /* Find a page of the appropriate size in the preferred list */ for (current_order = order; current_order < MAX_ORDER; ++current_order) { area = &(zone->free_area[current_order]); - page = list_first_entry_or_null(&area->free_list[migratetype], - struct page, lru); + page = get_page_from_free_area(area, migratetype); if (!page) continue; - list_del(&page->lru); - rmv_page_order(page); - area->nr_free--; + del_page_from_free_area(page, area, migratetype); expand(zone, page, order, current_order, area, migratetype); set_pcppage_migratetype(page, migratetype); return page; @@ -2054,8 +2041,7 @@ static int move_freepages(struct zone *zone, } order = page_order(page); - list_move(&page->lru, - &zone->free_area[order].free_list[migratetype]); + move_to_free_area(page, &zone->free_area[order], migratetype); page += 1 << order; pages_moved += 1 << order; } @@ -2207,7 +2193,7 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page, single_page: area = &zone->free_area[current_order]; - list_move(&page->lru, &area->free_list[start_type]); + move_to_free_area(page, area, start_type); } /* @@ -2231,7 +2217,7 @@ int find_suitable_fallback(struct free_area *area, unsigned int order, if (fallback_mt == MIGRATE_TYPES) break; - if (list_empty(&area->free_list[fallback_mt])) + if (free_area_empty(area, fallback_mt)) continue; if (can_steal_fallback(order, migratetype)) @@ -2318,9 +2304,7 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, for (order = 0; order < MAX_ORDER; order++) { struct free_area *area = &(zone->free_area[order]); - page = list_first_entry_or_null( - &area->free_list[MIGRATE_HIGHATOMIC], - struct page, lru); + page = get_page_from_free_area(area, MIGRATE_HIGHATOMIC); if (!page) continue; @@ -2433,8 +2417,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) VM_BUG_ON(current_order == MAX_ORDER); do_steal: - page = 
list_first_entry(&area->free_list[fallback_mt], - struct page, lru); + page = get_page_from_free_area(area, fallback_mt); steal_suitable_fallback(zone, page, start_migratetype, can_steal); @@ -2861,6 +2844,7 @@ EXPORT_SYMBOL_GPL(split_page); int __isolate_free_page(struct page *page, unsigned int order) { + struct free_area *area = &page_zone(page)->free_area[order]; unsigned long watermark; struct zone *zone; int mt; @@ -2885,9 +2869,8 @@ int __isolate_free_page(struct page *page, unsigned int order) } /* Remove page from free list */ - list_del(&page->lru); - zone->free_area[order].nr_free--; - rmv_page_order(page); + + del_page_from_free_area(page, area, mt); /* * Set the pageblock if the isolated page is at least half of a @@ -3181,13 +3164,13 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark, continue; for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) { - if (!list_empty(&area->free_list[mt])) + if (!free_area_empty(area, mt)) return true; } #ifdef CONFIG_CMA if ((alloc_flags & ALLOC_CMA) && - !list_empty(&area->free_list[MIGRATE_CMA])) { + !free_area_empty(area, MIGRATE_CMA)) { return true; } #endif @@ -5020,7 +5003,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask) types[order] = 0; for (type = 0; type < MIGRATE_TYPES; type++) { - if (!list_empty(&area->free_list[type])) + if (!free_area_empty(area, type)) types[order] |= 1 << type; } } @@ -8145,6 +8128,9 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) spin_lock_irqsave(&zone->lock, flags); pfn = start_pfn; while (pfn < end_pfn) { + struct free_area *area; + int mt; + if (!pfn_valid(pfn)) { pfn++; continue; @@ -8163,13 +8149,13 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) BUG_ON(page_count(page)); BUG_ON(!PageBuddy(page)); order = page_order(page); + area = &zone->free_area[order]; #ifdef CONFIG_DEBUG_VM pr_info("remove from free list %lx %d %lx\n", pfn, 1 << order, end_pfn); #endif - list_del(&page->lru); - 
rmv_page_order(page); - zone->free_area[order].nr_free--; + mt = get_pageblock_migratetype(page); + del_page_from_free_area(page, area, mt); for (i = 0; i < (1 << order); i++) SetPageReserved((page+i)); pfn += (1 << order);

From patchwork Tue Dec 18 04:23:55 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 10734795
Subject: [PATCH v6 6/6] mm: Maintain randomization of page free lists
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Michal Hocko, Kees Cook, Dave Hansen, peterz@infradead.org, linux-mm@kvack.org, x86@kernel.org, linux-kernel@vger.kernel.org, mgorman@suse.de
Date: Mon, 17 Dec 2018 20:23:55 -0800
Message-ID: <154510703541.1941238.6053320635908576300.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.18-2-gc94f

When freeing a page with an order >= shuffle_page_order randomly select the front or back of the list for insertion. While the mm tries to defragment physical pages into huge pages this can tend to make the page allocator more predictable over time. Inject the front-back randomness to preserve the initial randomness established by shuffle_free_memory() when the kernel was booted. The overhead of this manipulation is constrained by only being applied for MAX_ORDER sized pages by default.
Cc: Michal Hocko
Cc: Kees Cook
Cc: Dave Hansen
Signed-off-by: Dan Williams
---
 include/linux/mmzone.h  |   10 ++++++++++
 include/linux/shuffle.h |   12 ++++++++++++
 mm/page_alloc.c         |   11 +++++++++--
 mm/shuffle.c            |   16 ++++++++++++++++
 4 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 35cc33af87f2..338929647eea 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -98,6 +98,8 @@ extern int page_group_by_mobility_disabled;
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
+	u64			rand;
+	u8			rand_bits;
 };
 
 /* Used for pages not on another list */
@@ -116,6 +118,14 @@ static inline void add_to_free_area_tail(struct page *page, struct free_area *ar
 	area->nr_free++;
 }
 
+#ifdef CONFIG_SHUFFLE_PAGE_ALLOCATOR
+/* Used to preserve page allocation order entropy */
+void add_to_free_area_random(struct page *page, struct free_area *area,
+		int migratetype);
+#else
+#define add_to_free_area_random add_to_free_area
+#endif
+
 /* Used for pages which are on another list */
 static inline void move_to_free_area(struct page *page, struct free_area *area,
 		int migratetype)
diff --git a/include/linux/shuffle.h b/include/linux/shuffle.h
index a8a168919cb5..8b3941a87c2c 100644
--- a/include/linux/shuffle.h
+++ b/include/linux/shuffle.h
@@ -29,6 +29,13 @@ static inline void shuffle_zone(struct zone *z, unsigned long start_pfn,
 		return;
 	__shuffle_zone(z, start_pfn, end_pfn);
 }
+
+static inline bool is_shuffle_order(int order)
+{
+	if (!static_branch_unlikely(&page_alloc_shuffle_key))
+		return false;
+	return order >= CONFIG_SHUFFLE_PAGE_ORDER;
+}
 #else
 static inline void shuffle_free_memory(pg_data_t *pgdat, unsigned long start_pfn,
 		unsigned long end_pfn)
@@ -43,5 +50,10 @@ static inline void shuffle_zone(struct zone *z, unsigned long start_pfn,
 static inline void page_alloc_shuffle(void)
 {
 }
+
+static inline bool is_shuffle_order(int order)
+{
+	return false;
+}
 #endif
 #endif /* _MM_SHUFFLE_H */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index de8b5eb78d13..3a932ba23daf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -42,6 +42,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -851,7 +852,8 @@ static inline void __free_one_page(struct page *page,
 	 * so it's less likely to be used soon and more likely to be merged
 	 * as a higher order page
 	 */
-	if ((order < MAX_ORDER-2) && pfn_valid_within(buddy_pfn)) {
+	if ((order < MAX_ORDER-2) && pfn_valid_within(buddy_pfn)
+			&& !is_shuffle_order(order)) {
 		struct page *higher_page, *higher_buddy;
 		combined_pfn = buddy_pfn & pfn;
 		higher_page = page + (combined_pfn - pfn);
@@ -865,7 +867,12 @@ static inline void __free_one_page(struct page *page,
 		}
 	}
 
-	add_to_free_area(page, &zone->free_area[order], migratetype);
+	if (is_shuffle_order(order))
+		add_to_free_area_random(page, &zone->free_area[order],
+				migratetype);
+	else
+		add_to_free_area(page, &zone->free_area[order], migratetype);
+
 }
 
 /*
diff --git a/mm/shuffle.c b/mm/shuffle.c
index 07961ff41a03..4cadf51c9b40 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -213,3 +213,19 @@ void __meminit __shuffle_free_memory(pg_data_t *pgdat, unsigned long start_pfn,
 	for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
 		shuffle_zone(z, start_pfn, end_pfn);
 }
+
+void add_to_free_area_random(struct page *page, struct free_area *area,
+		int migratetype)
+{
+	if (area->rand_bits == 0) {
+		area->rand_bits = 64;
+		area->rand = get_random_u64();
+	}
+
+	if (area->rand & 1)
+		add_to_free_area(page, area, migratetype);
+	else
+		add_to_free_area_tail(page, area, migratetype);
+	area->rand_bits--;
+	area->rand >>= 1;
+}
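
As a reading aid for add_to_free_area_random(), here is a minimal userspace sketch of the same bit-caching idea: draw 64 random bits once, then consume one bit per free to choose between head and tail insertion. The names struct bit_pool, pool_pick_head(), demo_counting_refill() and demo_refill_calls are hypothetical and not part of the patch. The refill callback stands in for get_random_u64() and is deliberately deterministic so the behaviour is easy to check; it is not a usable entropy source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the rand/rand_bits pair added to struct free_area. */
struct bit_pool {
	uint64_t rand;      /* cached bits, consumed from the least significant end */
	uint8_t  rand_bits; /* number of unconsumed bits remaining in rand */
};

/* Counts how often the pool had to be refilled (illustration only). */
unsigned int demo_refill_calls;

/*
 * Deterministic placeholder for get_random_u64(): always returns 0x5
 * (binary ...0101) so the first three picks are head, tail, head.
 */
uint64_t demo_counting_refill(void)
{
	demo_refill_calls++;
	return UINT64_C(0x5);
}

/*
 * Return true for "insert at the head of the free list" and false for
 * "insert at the tail". One cached bit is consumed per call; the refill
 * callback runs only when the pool is empty, i.e. once per 64 calls.
 */
bool pool_pick_head(struct bit_pool *pool, uint64_t (*refill)(void))
{
	bool head;

	if (pool->rand_bits == 0) {
		pool->rand_bits = 64;
		pool->rand = refill();
	}

	head = pool->rand & 1;
	pool->rand_bits--;
	pool->rand >>= 1;
	return head;
}
```

Starting from a zero-initialized struct bit_pool, 64 consecutive calls to pool_pick_head() trigger exactly one refill, and the 65th call triggers the second. In this sketch that amortization is what keeps the per-free cost low: the random source is consulted once per 64 insertions rather than on every free.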