From patchwork Wed Mar 2 22:56:01 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Daney X-Patchwork-Id: 8486671 Return-Path: X-Original-To: patchwork-linux-arm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 59946C0553 for ; Wed, 2 Mar 2016 23:00:13 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 83F9B2025B for ; Wed, 2 Mar 2016 23:00:11 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.9]) (using TLSv1.2 with cipher AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A04DF20254 for ; Wed, 2 Mar 2016 23:00:09 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1abFin-0003LQ-T6; Wed, 02 Mar 2016 22:58:21 +0000 Received: from mail-pf0-x242.google.com ([2607:f8b0:400e:c00::242]) by bombadil.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1abFhD-0002AG-4q for linux-arm-kernel@lists.infradead.org; Wed, 02 Mar 2016 22:56:52 +0000 Received: by mail-pf0-x242.google.com with SMTP id q129so213924pfb.3 for ; Wed, 02 Mar 2016 14:56:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=wOCYN+jzLs/4kBPY4NAqUToHj5UBafCkwd93ka/tXl4=; b=QPDGy952uMc2oHXghc/L2o7CI6XJxnfhJKUY8adFSdT9RuzfVVPFlIsHkNQV/RkDJT eooikNgHXSsc9PLIYwHy0UhIHvQNFtPBLZJ2JDemHJRHJngcDGI3KKLpmYyuDb1k1Ib1 Ny8UO8xXyPhoNJopIZAg9DvVXGjoC9zBczT64lhBxM5uJaYccFIkWCzKcZd4K2DkIQ8m YrnhNvH96lx6NDSz6pXcamk8S4O8YVgf3nCzpfGTO9ODiZ/Kcb3UC8iGXN3ZXkw3CwI5 rvHm0gBXqSyBBrGoPoOQaxQwZl5wGYhp1kfg2+MV6goIXgDuVH5BaQ5MdBINJh7QphHQ 8q1A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=wOCYN+jzLs/4kBPY4NAqUToHj5UBafCkwd93ka/tXl4=; b=Wj2QnjWsRcTXE3ENzBGPyeF/FKZXb8l1KIPdxFUHAPdLVtEolzRiyXw7Cfe/V/pect aGagPJk6Cr7b3aehdKroYzSVyjQqcci2HzlyoRBsmLaigm6PIHSzyGxQpLQNR4v/i2Ds DEJYNAswoetmEVrs61O+6ezCl0yvEuPx6wJpxn6DoFmg7WWokCJprjZYmZTWZgzX5gyD UK9Xs4i5cG0ykX+6oT4Zxk1e2hg03X7sY2qBcvc8/JXyjK2zF+AA4qSZ0/KDYmXJtMMJ EjniBj1Z3wyklXDYZpSq1ZWs5xJVoDFLKuF22RzMHIOFqGNn6VQ5ubjmjKv8lqOipSsD FjrQ== X-Gm-Message-State: AD7BkJK7Ssw7yhIivSHuS8xqUfdaHyFDBvS+olESMvxXGUypI8+5tRoELgd0DEqJ6NgI4Q== X-Received: by 10.98.80.78 with SMTP id e75mr19141028pfb.147.1456959383312; Wed, 02 Mar 2016 14:56:23 -0800 (PST) Received: from dl.caveonetworks.com ([64.2.3.194]) by smtp.gmail.com with ESMTPSA id 83sm35704813pfr.22.2016.03.02.14.56.14 (version=TLS1 cipher=AES128-SHA bits=128/128); Wed, 02 Mar 2016 14:56:19 -0800 (PST) Received: from dl.caveonetworks.com (localhost.localdomain [127.0.0.1]) by dl.caveonetworks.com (8.14.5/8.14.5) with ESMTP id u22MuEwa002095; Wed, 2 Mar 2016 14:56:14 -0800 Received: (from ddaney@localhost) by dl.caveonetworks.com (8.14.5/8.14.5/Submit) id u22MuEQm002094; Wed, 2 Mar 2016 14:56:14 -0800 From: David Daney To: Will Deacon , linux-arm-kernel@lists.infradead.org, Rob Herring , Frank Rowand , Grant Likely , Pawel Moll , Ian Campbell , Kumar Gala , Ganapatrao Kulkarni , Robert Richter , Ard Biesheuvel , Matt Fleming , Mark Rutland , Catalin Marinas Subject: [PATCH v13 5/6] arm64, numa: Add NUMA support for arm64 platforms. Date: Wed, 2 Mar 2016 14:56:01 -0800 Message-Id: <1456959362-2036-6-git-send-email-ddaney.cavm@gmail.com> X-Mailer: git-send-email 1.7.11.7 In-Reply-To: <1456959362-2036-1-git-send-email-ddaney.cavm@gmail.com> References: <1456959362-2036-1-git-send-email-ddaney.cavm@gmail.com> X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20160302_145643_518793_B84D1558 X-CRM114-Status: GOOD ( 29.14 ) X-Spam-Score: -2.7 (--) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: devicetree@vger.kernel.org, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, David Daney MIME-Version: 1.0 Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, T_DKIM_INVALID, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Ganapatrao Kulkarni Attempt to get the memory and CPU NUMA node via of_numa. If that fails, default the dummy NUMA node and map all memory and CPUs to node 0. Tested-by: Shannon Zhao Reviewed-by: Robert Richter Signed-off-by: Ganapatrao Kulkarni Signed-off-by: David Daney --- arch/arm64/Kconfig | 26 +++ arch/arm64/include/asm/mmzone.h | 12 ++ arch/arm64/include/asm/numa.h | 45 +++++ arch/arm64/include/asm/topology.h | 10 + arch/arm64/kernel/pci.c | 10 + arch/arm64/kernel/setup.c | 4 + arch/arm64/kernel/smp.c | 4 + arch/arm64/mm/Makefile | 1 + arch/arm64/mm/init.c | 34 +++- arch/arm64/mm/mmu.c | 1 + arch/arm64/mm/numa.c | 396 ++++++++++++++++++++++++++++++++++++++ 11 files changed, 538 insertions(+), 5 deletions(-) create mode 100644 arch/arm64/include/asm/mmzone.h create mode 100644 arch/arm64/include/asm/numa.h create mode 100644 arch/arm64/mm/numa.c diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 39f2203..7013087 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -74,6 +74,7 @@ config ARM64 select HAVE_HW_BREAKPOINT if PERF_EVENTS select HAVE_IRQ_TIME_ACCOUNTING select HAVE_MEMBLOCK + select HAVE_MEMBLOCK_NODE_MAP if NUMA select HAVE_PATA_PLATFORM select HAVE_PERF_EVENTS select HAVE_PERF_REGS @@ -96,6 +97,7 @@ config ARM64 select SYSCTL_EXCEPTION_TRACE select HAVE_CONTEXT_TRACKING select HAVE_ARM_SMCCC + select OF_NUMA if NUMA && OF help ARM 64-bit (AArch64) Linux support. @@ -545,6 +547,30 @@ config HOTPLUG_CPU Say Y here to experiment with turning CPUs off and on. CPUs can be controlled through /sys/devices/system/cpu. +# Common NUMA Features +config NUMA + bool "Numa Memory Allocation and Scheduler Support" + depends on SMP + help + Enable NUMA (Non Uniform Memory Access) support. + + The kernel will try to allocate memory used by a CPU on the + local memory of the CPU and add some more + NUMA awareness to the kernel. + +config NODES_SHIFT + int "Maximum NUMA Nodes (as a power of 2)" + range 1 10 + default "2" + depends on NEED_MULTIPLE_NODES + help + Specify the maximum number of NUMA Nodes available on the target + system. Increases memory reserved to accommodate various tables. + +config USE_PERCPU_NUMA_NODE_ID + def_bool y + depends on NUMA + source kernel/Kconfig.preempt source kernel/Kconfig.hz diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h new file mode 100644 index 0000000..a0de9e6 --- /dev/null +++ b/arch/arm64/include/asm/mmzone.h @@ -0,0 +1,12 @@ +#ifndef __ASM_MMZONE_H +#define __ASM_MMZONE_H + +#ifdef CONFIG_NUMA + +#include + +extern struct pglist_data *node_data[]; +#define NODE_DATA(nid) (node_data[(nid)]) + +#endif /* CONFIG_NUMA */ +#endif /* __ASM_MMZONE_H */ diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h new file mode 100644 index 0000000..e9b4f29 --- /dev/null +++ b/arch/arm64/include/asm/numa.h @@ -0,0 +1,45 @@ +#ifndef __ASM_NUMA_H +#define __ASM_NUMA_H + +#include + +#ifdef CONFIG_NUMA + +/* currently, arm64 implements flat NUMA topology */ +#define parent_node(node) (node) + +int __node_distance(int from, int to); +#define node_distance(a, b) __node_distance(a, b) + +extern nodemask_t numa_nodes_parsed __initdata; + +/* Mappings between node number and cpus on that node. */ +extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES]; +void numa_clear_node(unsigned int cpu); + +#ifdef CONFIG_DEBUG_PER_CPU_MAPS +const struct cpumask *cpumask_of_node(int node); +#else +/* Returns a pointer to the cpumask of CPUs on Node 'node'. */ +static inline const struct cpumask *cpumask_of_node(int node) +{ + return node_to_cpumask_map[node]; +} +#endif + +void __init arm64_numa_init(void); +int __init numa_add_memblk(int nodeid, u64 start, u64 end); +void __init numa_set_distance(int from, int to, int distance); +void __init numa_free_distance(void); +void __init early_map_cpu_to_node(unsigned int cpu, int nid); +void numa_store_cpu_info(unsigned int cpu); + +#else /* CONFIG_NUMA */ + +static inline void numa_store_cpu_info(unsigned int cpu) { } +static inline void arm64_numa_init(void) { } +static inline void early_map_cpu_to_node(unsigned int cpu, int nid) { } + +#endif /* CONFIG_NUMA */ + +#endif /* __ASM_NUMA_H */ diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h index a3e9d6f..8b57339 100644 --- a/arch/arm64/include/asm/topology.h +++ b/arch/arm64/include/asm/topology.h @@ -22,6 +22,16 @@ void init_cpu_topology(void); void store_cpu_topology(unsigned int cpuid); const struct cpumask *cpu_coregroup_mask(int cpu); +#ifdef CONFIG_NUMA + +struct pci_bus; +int pcibus_to_node(struct pci_bus *bus); +#define cpumask_of_pcibus(bus) (pcibus_to_node(bus) == -1 ? \ + cpu_all_mask : \ + cpumask_of_node(pcibus_to_node(bus))) + +#endif /* CONFIG_NUMA */ + #include #endif /* _ASM_ARM_TOPOLOGY_H */ diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c index b3d098b..65e6b7d 100644 --- a/arch/arm64/kernel/pci.c +++ b/arch/arm64/kernel/pci.c @@ -76,6 +76,16 @@ int raw_pci_write(unsigned int domain, unsigned int bus, return -ENXIO; } +#ifdef CONFIG_NUMA + +int pcibus_to_node(struct pci_bus *bus) +{ + return dev_to_node(&bus->dev); +} +EXPORT_SYMBOL(pcibus_to_node); + +#endif + #ifdef CONFIG_ACPI /* Root bridge scanning */ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index feae073..52b38d4 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include #include @@ -371,6 +372,9 @@ static int __init topology_init(void) { int i; + for_each_online_node(i) + register_one_node(i); + for_each_possible_cpu(i) { struct cpu *cpu = &per_cpu(cpu_data.cpu, i); cpu->hotpluggable = 1; diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index b1adc51..46c45c8 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -45,6 +45,7 @@ #include #include #include +#include #include #include #include @@ -125,6 +126,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) static void smp_store_cpu_info(unsigned int cpuid) { store_cpu_topology(cpuid); + numa_store_cpu_info(cpuid); } /* @@ -518,6 +520,8 @@ static void __init of_parse_and_init_cpus(void) pr_debug("cpu logical map 0x%llx\n", hwid); cpu_logical_map(cpu_count) = hwid; + + early_map_cpu_to_node(cpu_count, of_node_to_nid(dn)); next: cpu_count++; } diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile index 57f57fd..54bb209 100644 --- a/arch/arm64/mm/Makefile +++ b/arch/arm64/mm/Makefile @@ -4,6 +4,7 @@ obj-y := dma-mapping.o extable.o fault.o init.o \ context.o proc.o pageattr.o obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o obj-$(CONFIG_ARM64_PTDUMP) += dump.o +obj-$(CONFIG_NUMA) += numa.o obj-$(CONFIG_KASAN) += kasan_init.o KASAN_SANITIZE_kasan_init.o := n diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index f3b061e..c39c670 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -37,6 +37,7 @@ #include #include +#include #include #include #include @@ -77,6 +78,21 @@ static phys_addr_t __init max_zone_dma_phys(void) return min(offset + (1ULL << 32), memblock_end_of_DRAM()); } +#ifdef CONFIG_NUMA + +static void __init zone_sizes_init(unsigned long min, unsigned long max) +{ + unsigned long max_zone_pfns[MAX_NR_ZONES] = {0}; + + if (IS_ENABLED(CONFIG_ZONE_DMA)) + max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys()); + max_zone_pfns[ZONE_NORMAL] = max; + + free_area_init_nodes(max_zone_pfns); +} + +#else + static void __init zone_sizes_init(unsigned long min, unsigned long max) { struct memblock_region *reg; @@ -117,6 +133,8 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max) free_area_init_node(0, zone_size, min, zhole_size); } +#endif /* CONFIG_NUMA */ + #ifdef CONFIG_HAVE_ARCH_PFN_VALID int pfn_valid(unsigned long pfn) { @@ -133,10 +151,15 @@ static void __init arm64_memory_present(void) static void __init arm64_memory_present(void) { struct memblock_region *reg; + int nid = 0; - for_each_memblock(memory, reg) - memory_present(0, memblock_region_memory_base_pfn(reg), - memblock_region_memory_end_pfn(reg)); + for_each_memblock(memory, reg) { +#ifdef CONFIG_NUMA + nid = reg->nid; +#endif + memory_present(nid, memblock_region_memory_base_pfn(reg), + memblock_region_memory_end_pfn(reg)); + } } #endif @@ -181,7 +204,6 @@ void __init arm64_memblock_init(void) dma_contiguous_reserve(arm64_dma_phys_limit); memblock_allow_resize(); - memblock_dump_all(); } void __init bootmem_init(void) @@ -193,6 +215,9 @@ void __init bootmem_init(void) early_memtest(min << PAGE_SHIFT, max << PAGE_SHIFT); + max_pfn = max_low_pfn = max; + + arm64_numa_init(); /* * Sparsemem tries to allocate bootmem in memory_present(), so must be * done after the fixed reservations. @@ -203,7 +228,6 @@ void __init bootmem_init(void) zone_sizes_init(min, max); high_memory = __va((max << PAGE_SHIFT) - 1) + 1; - max_pfn = max_low_pfn = max; } #ifndef CONFIG_SPARSEMEM_VMEMMAP diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index ee6e6b0..7bc92c1 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -468,6 +468,7 @@ void __init paging_init(void) zero_page = early_alloc(PAGE_SIZE); bootmem_init(); + memblock_dump_all(); empty_zero_page = virt_to_page(zero_page); diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c new file mode 100644 index 0000000..98dc104 --- /dev/null +++ b/arch/arm64/mm/numa.c @@ -0,0 +1,396 @@ +/* + * NUMA support, based on the x86 implementation. + * + * Copyright (C) 2015 Cavium Inc. + * Author: Ganapatrao Kulkarni + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see . + */ + +#include +#include +#include +#include + +struct pglist_data *node_data[MAX_NUMNODES] __read_mostly; +EXPORT_SYMBOL(node_data); +nodemask_t numa_nodes_parsed __initdata; +static int cpu_to_node_map[NR_CPUS] = { [0 ... NR_CPUS-1] = NUMA_NO_NODE }; + +static int numa_distance_cnt; +static u8 *numa_distance; +static int numa_off; + +static __init int numa_parse_early_param(char *opt) +{ + if (!opt) + return -EINVAL; + if (!strncmp(opt, "off", 3)) { + pr_info("%s\n", "NUMA turned off"); + numa_off = 1; + } + return 0; +} +early_param("numa", numa_parse_early_param); + +cpumask_var_t node_to_cpumask_map[MAX_NUMNODES]; +EXPORT_SYMBOL(node_to_cpumask_map); + +#ifdef CONFIG_DEBUG_PER_CPU_MAPS + +/* + * Returns a pointer to the bitmask of CPUs on Node 'node'. + */ +const struct cpumask *cpumask_of_node(int node) +{ + if (WARN_ON(node >= nr_node_ids)) + return cpu_none_mask; + + if (WARN_ON(node_to_cpumask_map[node] == NULL)) + return cpu_online_mask; + + return node_to_cpumask_map[node]; +} +EXPORT_SYMBOL(cpumask_of_node); + +#endif + +static void map_cpu_to_node(unsigned int cpu, int nid) +{ + set_cpu_numa_node(cpu, nid); + if (nid >= 0) + cpumask_set_cpu(cpu, node_to_cpumask_map[nid]); +} + +void numa_clear_node(unsigned int cpu) +{ + int nid = cpu_to_node(cpu); + + if (nid >= 0) + cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]); + set_cpu_numa_node(cpu, NUMA_NO_NODE); +} + +/* + * Allocate node_to_cpumask_map based on number of available nodes + * Requires node_possible_map to be valid. + * + * Note: cpumask_of_node() is not valid until after this is done. + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.) + */ +static void __init setup_node_to_cpumask_map(void) +{ + unsigned int cpu; + int node; + + /* setup nr_node_ids if not done yet */ + if (nr_node_ids == MAX_NUMNODES) + setup_nr_node_ids(); + + /* allocate and clear the mapping */ + for (node = 0; node < nr_node_ids; node++) { + alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]); + cpumask_clear(node_to_cpumask_map[node]); + } + + for_each_possible_cpu(cpu) + set_cpu_numa_node(cpu, NUMA_NO_NODE); + + /* cpumask_of_node() will now work */ + pr_debug("NUMA: Node to cpumask map for %d nodes\n", nr_node_ids); +} + +/* + * Set the cpu to node and mem mapping + */ +void numa_store_cpu_info(unsigned int cpu) +{ + map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]); +} + +void __init early_map_cpu_to_node(unsigned int cpu, int nid) +{ + /* fallback to node 0 */ + if (nid < 0 || nid >= MAX_NUMNODES) + nid = 0; + + cpu_to_node_map[cpu] = nid; +} + +/** + * numa_add_memblk - Set node id to memblk + * @nid: NUMA node ID of the new memblk + * @start: Start address of the new memblk + * @size: Size of the new memblk + * + * RETURNS: + * 0 on success, -errno on failure. + */ +int __init numa_add_memblk(int nid, u64 start, u64 size) +{ + int ret; + + ret = memblock_set_node(start, size, &memblock.memory, nid); + if (ret < 0) { + pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n", + start, (start + size - 1), nid); + return ret; + } + + node_set(nid, numa_nodes_parsed); + pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n", + start, (start + size - 1), nid); + return ret; +} + +/** + * Initialize NODE_DATA for a node on the local memory + */ +static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn) +{ + const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES); + u64 nd_pa; + void *nd; + int tnid; + + pr_info("NUMA: Initmem setup node %d [mem %#010Lx-%#010Lx]\n", + nid, start_pfn << PAGE_SHIFT, + (end_pfn << PAGE_SHIFT) - 1); + + nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid); + nd = __va(nd_pa); + + /* report and initialize */ + pr_info("NUMA: NODE_DATA [mem %#010Lx-%#010Lx]\n", + nd_pa, nd_pa + nd_size - 1); + tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT); + if (tnid != nid) + pr_info("NUMA: NODE_DATA(%d) on node %d\n", nid, tnid); + + node_data[nid] = nd; + memset(NODE_DATA(nid), 0, sizeof(pg_data_t)); + NODE_DATA(nid)->node_id = nid; + NODE_DATA(nid)->node_start_pfn = start_pfn; + NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn; +} + +/** + * numa_free_distance + * + * The current table is freed. + */ +void __init numa_free_distance(void) +{ + size_t size; + + if (!numa_distance) + return; + + size = numa_distance_cnt * numa_distance_cnt * + sizeof(numa_distance[0]); + + memblock_free(__pa(numa_distance), size); + numa_distance_cnt = 0; + numa_distance = NULL; +} + +/** + * + * Create a new NUMA distance table. + * + */ +static int __init numa_alloc_distance(void) +{ + size_t size; + u64 phys; + int i, j; + + size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]); + phys = memblock_find_in_range(0, PFN_PHYS(max_pfn), + size, PAGE_SIZE); + if (WARN_ON(!phys)) + return -ENOMEM; + + memblock_reserve(phys, size); + + numa_distance = __va(phys); + numa_distance_cnt = nr_node_ids; + + /* fill with the default distances */ + for (i = 0; i < numa_distance_cnt; i++) + for (j = 0; j < numa_distance_cnt; j++) + numa_distance[i * numa_distance_cnt + j] = i == j ? + LOCAL_DISTANCE : REMOTE_DISTANCE; + + pr_debug("NUMA: Initialized distance table, cnt=%d\n", + numa_distance_cnt); + + return 0; +} + +/** + * numa_set_distance - Set inter node NUMA distance from node to node. + * @from: the 'from' node to set distance + * @to: the 'to' node to set distance + * @distance: NUMA distance + * + * Set the distance from node @from to @to to @distance. + * If distance table doesn't exist, a warning is printed. + * + * If @from or @to is higher than the highest known node or lower than zero + * or @distance doesn't make sense, the call is ignored. + * + */ +void __init numa_set_distance(int from, int to, int distance) +{ + if (!numa_distance) { + pr_warn_once("NUMA: Warning: distance table not allocated yet\n"); + return; + } + + if (from >= numa_distance_cnt || to >= numa_distance_cnt || + from < 0 || to < 0) { + pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n", + from, to, distance); + return; + } + + if ((u8)distance != distance || + (from == to && distance != LOCAL_DISTANCE)) { + pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n", + from, to, distance); + return; + } + + numa_distance[from * numa_distance_cnt + to] = distance; +} + +/** + * Return NUMA distance @from to @to + */ +int __node_distance(int from, int to) +{ + if (from >= numa_distance_cnt || to >= numa_distance_cnt) + return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE; + return numa_distance[from * numa_distance_cnt + to]; +} +EXPORT_SYMBOL(__node_distance); + +static int __init numa_register_nodes(void) +{ + int nid; + struct memblock_region *mblk; + + /* Check that valid nid is set to memblks */ + for_each_memblock(memory, mblk) + if (mblk->nid == NUMA_NO_NODE || mblk->nid >= MAX_NUMNODES) { + pr_warn("NUMA: Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n", + mblk->nid, mblk->base, + mblk->base + mblk->size - 1); + return -EINVAL; + } + + /* Finally register nodes. */ + for_each_node_mask(nid, numa_nodes_parsed) { + unsigned long start_pfn, end_pfn; + + get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); + setup_node_data(nid, start_pfn, end_pfn); + node_set_online(nid); + } + + /* Setup online nodes to actual nodes*/ + node_possible_map = numa_nodes_parsed; + + return 0; +} + +static int __init numa_init(int (*init_func)(void)) +{ + int ret; + + nodes_clear(numa_nodes_parsed); + nodes_clear(node_possible_map); + nodes_clear(node_online_map); + numa_free_distance(); + + ret = numa_alloc_distance(); + if (ret < 0) + return ret; + + ret = init_func(); + if (ret < 0) + return ret; + + if (nodes_empty(numa_nodes_parsed)) + return -EINVAL; + + ret = numa_register_nodes(); + if (ret < 0) + return ret; + + setup_node_to_cpumask_map(); + + /* init boot processor */ + cpu_to_node_map[0] = 0; + map_cpu_to_node(0, 0); + + return 0; +} + +/** + * dummy_numa_init - Fallback dummy NUMA init + * + * Used if there's no underlying NUMA architecture, NUMA initialization + * fails, or NUMA is disabled on the command line. + * + * Must online at least one node (node 0) and add memory blocks that cover all + * allowed memory. It is unlikely that this function fails. + */ +static int __init dummy_numa_init(void) +{ + int ret; + struct memblock_region *mblk; + + pr_info("%s\n", "No NUMA configuration found"); + pr_info("NUMA: Faking a node at [mem %#018Lx-%#018Lx]\n", + 0LLU, PFN_PHYS(max_pfn) - 1); + + for_each_memblock(memory, mblk) { + ret = numa_add_memblk(0, mblk->base, mblk->size); + if (!ret) + continue; + + pr_err("NUMA init failed\n"); + return ret; + } + + numa_off = 1; + return 0; +} + +/** + * arm64_numa_init - Initialize NUMA + * + * Try each configured NUMA initialization method until one succeeds. The + * last fallback is dummy single node config encomapssing whole memory. + */ +void __init arm64_numa_init(void) +{ + if (!numa_off) { + if (!numa_init(of_numa_init)) + return; + } + + numa_init(dummy_numa_init); +}