Message ID | 1455930799-5371-9-git-send-email-ddaney.cavm@gmail.com (mailing list archive) |
---|---|
State | New, archived |
On Fri, Feb 19, 2016 at 05:13:17PM -0800, David Daney wrote:
> From: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
>
> ADD device tree node parsing for NUMA topology using device
> "numa-node-id" property distance-map.

I still want an adequate explanation why NUMA setup cannot be done with
an unflattened tree. PowerPC manages to do that and should have a
similar init flow being memblock based, so I would expect arm64 can too.

Rob
On 02/23/2016 11:36 AM, Rob Herring wrote:
> On Fri, Feb 19, 2016 at 05:13:17PM -0800, David Daney wrote:
>> From: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
>>
>> ADD device tree node parsing for NUMA topology using device
>> "numa-node-id" property distance-map.
>
> I still want an adequate explanation why NUMA setup cannot be done with
> an unflattened tree. PowerPC manages to do that and should have a
> similar init flow being memblock based, so I would expect arm64 can too.

Many things could be done. Really, we want to know what *should* be done.

In the context of the current arm64 memory initialization we (more or less)
do:

1) early_init_fdt_scan_reserved_mem();
2) memory_present()
3) sparse_init()
4) other things
5) unflatten_device_tree()

We are already reading information out of the FDT at #1.

This patch set adds a step between 1 and 2 where we read NUMA information
out of the FDT.

Hypothetically, it might be possible to rewrite the arm64 setup code so that
the ordering was different, and the NUMA setup was done on the unflattened
tree, but that would certainly be a much more invasive patch.

If the arm64 maintainers would like a rewrite of:

arch/arm64/kernel/setup.c
arch/arm64/mm/init.c
arch/arm64/mm/mmu.c
.
.
.

we can discuss doing NUMA setup with the unflattened tree.

With the current memory initialization code, I think it makes more sense to
parse the NUMA information out of the flattened form.

David Daney
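The ordering David describes can be written down as a small table. What follows is only an illustrative sketch, not kernel code: the step names are copied from his list, and placing `of_numa_init` (the entry point this patch adds) directly after step 1 is an assumption based on his description, since the call site itself is not part of this patch. `step_index()` and `runs_before()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the arm64 init order described in the email,
 * with the proposed NUMA step inserted between steps 1 and 2.
 */
static const char *const init_steps[] = {
	"early_init_fdt_scan_reserved_mem",	/* 1) already reads the FDT */
	"of_numa_init",				/* new: NUMA info from the FDT */
	"memory_present",			/* 2) */
	"sparse_init",				/* 3) */
	"other things",				/* 4) */
	"unflatten_device_tree",		/* 5) */
};

#define NR_INIT_STEPS (sizeof(init_steps) / sizeof(init_steps[0]))

/* Return the position of a step in the table, or -1 if it is unknown. */
int step_index(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < NR_INIT_STEPS; i++)
		if (strcmp(init_steps[i], name) == 0)
			return (int)i;
	return -1;
}

/* Return 1 if both steps are known and step a runs before step b. */
int runs_before(const char *a, const char *b)
{
	int ia = step_index(a);
	int ib = step_index(b);

	return ia >= 0 && ib >= 0 && ia < ib;
}
```

In this model `runs_before("of_numa_init", "unflatten_device_tree")` holds, which is the crux of the disagreement: at the point where NUMA information is needed, only the flattened tree exists.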
On Thu, Feb 25, 2016 at 05:26:34PM -0800, David Daney wrote:
> On 02/23/2016 11:36 AM, Rob Herring wrote:
> >On Fri, Feb 19, 2016 at 05:13:17PM -0800, David Daney wrote:
> >>From: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
> >>
> >>ADD device tree node parsing for NUMA topology using device
> >>"numa-node-id" property distance-map.
> >
> >I still want an adequate explanation why NUMA setup cannot be done with
> >an unflattened tree. PowerPC manages to do that and should have a
> >similar init flow being memblock based, so I would expect arm64 can too.
>
> Many things could be done. Really, we want to know what *should* be done.
>
> In the context of the current arm64 memory initialization we (more or less)
> do:
>
> 1) early_init_fdt_scan_reserved_mem();
> 2) memory_present()
> 3) sparse_init()
> 4) other things
> 5) unflatten_device_tree()
>
> We are already reading information out of the FDT at #1.
>
> This patch set adds a step between 1 and 2 where we read NUMA information
> out of the FDT.
>
> Hypothetically, it might be possible to rewrite the arm64 setup code so that
> the ordering was different, and the NUMA setup was done on the unflattened
> tree, but that would certainly be a much more invasive patch.

I just looked at what PPC get up to, and there's really not an obvious
way we could do that on arm64. They run whole swathes of the kernel
with the MMU off directly from head.S to parse the flattened tree and
get memblock up really early. On arm64, the head.S environment is
considerably more hostile, and I don't think we'd want to do that (not
to mention the interaction with EFI stub).

So I'm perfectly happy for this to operate on the flattened tree.

Will
On Thu, Feb 25, 2016 at 7:26 PM, David Daney <ddaney@caviumnetworks.com> wrote:
> On 02/23/2016 11:36 AM, Rob Herring wrote:
>>
>> On Fri, Feb 19, 2016 at 05:13:17PM -0800, David Daney wrote:
>>>
>>> From: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
>>>
>>> ADD device tree node parsing for NUMA topology using device
>>> "numa-node-id" property distance-map.
>>
>>
>> I still want an adequate explanation why NUMA setup cannot be done with
>> an unflattened tree. PowerPC manages to do that and should have a
>> similar init flow being memblock based, so I would expect arm64 can too.
>
>
> Many things could be done. Really, we want to know what *should* be done.
>
> In the context of the current arm64 memory initialization we (more or less)
> do:
>
> 1) early_init_fdt_scan_reserved_mem();
> 2) memory_present()
> 3) sparse_init()
> 4) other things
> 5) unflatten_device_tree()
>
> We are already reading information out of the FDT at #1.
>
> This patch set adds a step between 1 and 2 where we read NUMA information
> out of the FDT.

The dependency on unflattening is that memblock is up and we can
allocate a chunk from it. Isn't that dependency met by step 1 or is
there a dependency on sparsemem (or something else)?

Rob
On Fri, Feb 26, 2016 at 12:27 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Thu, Feb 25, 2016 at 05:26:34PM -0800, David Daney wrote:
>> On 02/23/2016 11:36 AM, Rob Herring wrote:
>> >On Fri, Feb 19, 2016 at 05:13:17PM -0800, David Daney wrote:
>> >>From: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
>> >>
>> >>ADD device tree node parsing for NUMA topology using device
>> >>"numa-node-id" property distance-map.
>> >
>> >I still want an adequate explanation why NUMA setup cannot be done with
>> >an unflattened tree. PowerPC manages to do that and should have a
>> >similar init flow being memblock based, so I would expect arm64 can too.
>>
>> Many things could be done. Really, we want to know what *should* be done.
>>
>> In the context of the current arm64 memory initialization we (more or less)
>> do:
>>
>> 1) early_init_fdt_scan_reserved_mem();
>> 2) memory_present()
>> 3) sparse_init()
>> 4) other things
>> 5) unflatten_device_tree()
>>
>> We are already reading information out of the FDT at #1.
>>
>> This patch set adds a step between 1 and 2 where we read NUMA information
>> out of the FDT.
>>
>> Hypothetically, it might be possible to rewrite the arm64 setup code so that
>> the ordering was different, and the NUMA setup was done on the unflattened
>> tree, but that would certainly be a much more invasive patch.
>
> I just looked at what PPC get up to, and there's really not an obvious
> way we could do that on arm64. They run whole swathes of the kernel
> with the MMU off directly from head.S to parse the flattened tree and
> get memblock up really early. On arm64, the head.S environment is
> considerably more hostile, and I don't think we'd want to do that (not
> to mention the interaction with EFI stub).
>
> So I'm perfectly happy for this to operate on the flattened tree.

Of course you are. The code is not in your tree.

I'm all for keeping things out of /arch, but just want to understand
what are the dependencies which aren't clearly spelled out here.

Rob
On 03/01/2016 08:47 AM, Rob Herring wrote:
> On Thu, Feb 25, 2016 at 7:26 PM, David Daney <ddaney@caviumnetworks.com> wrote:
>> On 02/23/2016 11:36 AM, Rob Herring wrote:
>>>
>>> On Fri, Feb 19, 2016 at 05:13:17PM -0800, David Daney wrote:
>>>>
>>>> From: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
>>>>
>>>> ADD device tree node parsing for NUMA topology using device
>>>> "numa-node-id" property distance-map.
>>>
>>>
>>> I still want an adequate explanation why NUMA setup cannot be done with
>>> an unflattened tree. PowerPC manages to do that and should have a
>>> similar init flow being memblock based, so I would expect arm64 can too.
>>
>>
>> Many things could be done. Really, we want to know what *should* be done.
>>
>> In the context of the current arm64 memory initialization we (more or less)
>> do:
>>
>> 1) early_init_fdt_scan_reserved_mem();
>> 2) memory_present()
>> 3) sparse_init()
>> 4) other things
>> 5) unflatten_device_tree()
>>
>> We are already reading information out of the FDT at #1.
>>
>> This patch set adds a step between 1 and 2 where we read NUMA information
>> out of the FDT.
>
> The dependency on unflattening is that memblock is up and we can
> allocate a chunk from it. Isn't that dependency met by step 1

No.

> or is
> there a dependency on sparsemem (or something else)?

Will Deacon talked about this over here:

https://lkml.org/lkml/2016/2/26/782

I am happy to modify the patch set, but I don't want to get stuck as an
intermediary between two opposing blocs.

David Daney
On Tue, Mar 1, 2016 at 10:57 AM, David Daney <ddaney@caviumnetworks.com> wrote:
> On 03/01/2016 08:47 AM, Rob Herring wrote:
>>
>> On Thu, Feb 25, 2016 at 7:26 PM, David Daney <ddaney@caviumnetworks.com>
>> wrote:
>>>
>>> On 02/23/2016 11:36 AM, Rob Herring wrote:
>>>>
>>>>
>>>> On Fri, Feb 19, 2016 at 05:13:17PM -0800, David Daney wrote:
>>>>>
>>>>>
>>>>> From: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
>>>>>
>>>>> ADD device tree node parsing for NUMA topology using device
>>>>> "numa-node-id" property distance-map.
>>>>
>>>>
>>>>
>>>> I still want an adequate explanation why NUMA setup cannot be done with
>>>> an unflattened tree. PowerPC manages to do that and should have a
>>>> similar init flow being memblock based, so I would expect arm64 can too.
>>>
>>>
>>>
>>> Many things could be done. Really, we want to know what *should* be
>>> done.
>>>
>>> In the context of the current arm64 memory initialization we (more or
>>> less)
>>> do:
>>>
>>> 1) early_init_fdt_scan_reserved_mem();
>>> 2) memory_present()
>>> 3) sparse_init()
>>> 4) other things
>>> 5) unflatten_device_tree()
>>>
>>> We are already reading information out of the FDT at #1.
>>>
>>> This patch set adds a step between 1 and 2 where we read NUMA information
>>> out of the FDT.
>>
>>
>> The dependency on unflattening is that memblock is up and we can
>> allocate a chunk from it. Isn't that dependency met by step 1
>
>
> No.

Really, because it seems that numa_alloc_distance is essentially doing
a memblock alloc and that happens before memory_present.

>
>> or is
>> there a dependency on sparsemem (or something else)?
>
>
> Will Deacon talked about this over here:
>
> https://lkml.org/lkml/2016/2/26/782

I'm not saying to move memblock setup earlier nor before the MMU is
on, so I don't see how Will's reply is relevant other than PPC doesn't
serve as an example. Maybe PPC should be ignored because I think maybe
NUMA is only used on non-FDT systems.

In any case, no one has clearly explained what the dependencies are or
what happens if you moved the unflattening up sooner. You told me what
the current order is which doesn't equate to dependencies. For
example, step 4 may or may not be a dependency of step 5. These are
the dependencies I'm aware of:

memblock dependent on DT memory and reserved-memory parsing
unflattening dependent on memblock_alloc()
sparsemem dependent on NUMA parsing and memblock

What am I missing from here?

Rob
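Rob's distinction between "current order" and "dependencies" can be made concrete with a small checker. This is a hedged sketch, not kernel code: it encodes only the three dependencies he lists, the enum and function names are hypothetical, and any dependency nobody has spelled out in the thread is by definition missing from it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the setup stages named in the email. */
enum step { DT_MEM_PARSE, MEMBLOCK, UNFLATTEN, NUMA_PARSE, SPARSEMEM, NR_STEPS };

struct dep {
	enum step what;		/* this stage ... */
	enum step needs;	/* ... requires this stage to have run */
};

/* Only the dependencies listed in the email above. */
static const struct dep deps[] = {
	{ MEMBLOCK,  DT_MEM_PARSE },	/* memblock needs DT memory parsing */
	{ UNFLATTEN, MEMBLOCK },	/* unflattening needs memblock_alloc() */
	{ SPARSEMEM, NUMA_PARSE },	/* sparsemem needs NUMA parsing ... */
	{ SPARSEMEM, MEMBLOCK },	/* ... and memblock */
};

/*
 * Return 1 if order[] is a permutation of all stages that satisfies
 * every listed dependency, 0 otherwise.
 */
int order_is_valid(const enum step *order, size_t n)
{
	size_t pos[NR_STEPS];
	size_t i;

	assert(order != NULL);
	if (n != NR_STEPS)
		return 0;

	for (i = 0; i < NR_STEPS; i++)
		pos[i] = NR_STEPS;		/* sentinel: not seen yet */

	for (i = 0; i < n; i++) {
		unsigned int s = (unsigned int)order[i];

		if (s >= NR_STEPS || pos[s] != NR_STEPS)
			return 0;		/* out of range or duplicate */
		pos[s] = i;
	}

	for (i = 0; i < sizeof(deps) / sizeof(deps[0]); i++)
		if (pos[deps[i].needs] > pos[deps[i].what])
			return 0;
	return 1;
}
```

Under these three constraints alone, both the current order (NUMA parsing before unflattening) and an order with unflattening moved ahead of NUMA parsing pass the check. That is the point of Rob's question: the listed dependencies by themselves do not rule out unflattening earlier.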
On 03/01/2016 09:43 AM, Rob Herring wrote:
> On Tue, Mar 1, 2016 at 10:57 AM, David Daney <ddaney@caviumnetworks.com> wrote:
>> On 03/01/2016 08:47 AM, Rob Herring wrote:
>>>
>>> On Thu, Feb 25, 2016 at 7:26 PM, David Daney <ddaney@caviumnetworks.com>
>>> wrote:
>>>>
>>>> On 02/23/2016 11:36 AM, Rob Herring wrote:
>>>>>
>>>>>
>>>>> On Fri, Feb 19, 2016 at 05:13:17PM -0800, David Daney wrote:
>>>>>>
>>>>>>
>>>>>> From: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
>>>>>>
>>>>>> ADD device tree node parsing for NUMA topology using device
>>>>>> "numa-node-id" property distance-map.
>>>>>
>>>>>
>>>>>
>>>>> I still want an adequate explanation why NUMA setup cannot be done with
>>>>> an unflattened tree. PowerPC manages to do that and should have a
>>>>> similar init flow being memblock based, so I would expect arm64 can too.
>>>>
>>>>
>>>>
>>>> Many things could be done. Really, we want to know what *should* be
>>>> done.
>>>>
>>>> In the context of the current arm64 memory initialization we (more or
>>>> less)
>>>> do:
>>>>
>>>> 1) early_init_fdt_scan_reserved_mem();
>>>> 2) memory_present()
>>>> 3) sparse_init()
>>>> 4) other things
>>>> 5) unflatten_device_tree()
>>>>
>>>> We are already reading information out of the FDT at #1.
>>>>
>>>> This patch set adds a step between 1 and 2 where we read NUMA information
>>>> out of the FDT.
>>>
>>>
>>> The dependency on unflattening is that memblock is up and we can
>>> allocate a chunk from it. Isn't that dependency met by step 1
>>
>>
>> No.
>
> Really, because it seems that numa_alloc_distance is essentially doing
> a memblock alloc and that happens before memory_present.
>
>>
>>> or is
>>> there a dependency on sparsemem (or something else)?
>>
>>
>> Will Deacon talked about this over here:
>>
>> https://lkml.org/lkml/2016/2/26/782
>
> I'm not saying to move memblock setup earlier nor before the MMU is
> on, so I don't see how Will's reply is relevant other than PPC doesn't
> serve as an example. Maybe PPC should be ignored because I think maybe
> NUMA is only used on non-FDT systems.
>
> In any case, no one has clearly explained what the dependencies are or
> what happens if you moved the unflattening up sooner. You told me what
> the current order is which doesn't equate to dependencies. For
> example, step 4 may or may not be a dependency of step 5. These are
> the dependencies I'm aware of:
>
> memblock dependent on DT memory and reserved-memory parsing
> unflattening dependent on memblock_alloc()
> sparsemem dependent on NUMA parsing and memblock
>

I understand what you are saying. Let me go back over the code looking
to separate the issues of the inertia of the initial implementation, the
need to cleanly support both EFI and non-EFI firmware, and your desire
to unflatten the device tree much earlier than we currently do.

At this point, arm64 is the only user of the of_numa.c file. So, if you
think it is not general purpose enough to live in drivers/of we could
discuss the possibility of moving it under arch/arm64

David.
diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
index e2a4841..b3bec3a 100644
--- a/drivers/of/Kconfig
+++ b/drivers/of/Kconfig
@@ -112,4 +112,7 @@ config OF_OVERLAY
 	  While this option is selected automatically when needed, you can
 	  enable it manually to improve device tree unit test coverage.
 
+config OF_NUMA
+	bool
+
 endif # OF
diff --git a/drivers/of/Makefile b/drivers/of/Makefile
index 156c072..bee3fa9 100644
--- a/drivers/of/Makefile
+++ b/drivers/of/Makefile
@@ -14,5 +14,6 @@ obj-$(CONFIG_OF_MTD) += of_mtd.o
 obj-$(CONFIG_OF_RESERVED_MEM) += of_reserved_mem.o
 obj-$(CONFIG_OF_RESOLVE) += resolver.o
 obj-$(CONFIG_OF_OVERLAY) += overlay.o
+obj-$(CONFIG_OF_NUMA) += of_numa.o
 
 obj-$(CONFIG_OF_UNITTEST) += unittest-data/
diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
new file mode 100644
index 0000000..a691d06
--- /dev/null
+++ b/drivers/of/of_numa.c
@@ -0,0 +1,211 @@
+/*
+ * OF NUMA Parsing support.
+ *
+ * Copyright (C) 2015 - 2016 Cavium Inc.
+ * Author: Ganapatrao Kulkarni <gkulkarni@cavium.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+#include <linux/nodemask.h>
+
+#include <asm/numa.h>
+
+/* define default numa node to 0 */
+#define DEFAULT_NODE 0
+
+/* Returns nid in the range [0..MAX_NUMNODES-1],
+ * or NUMA_NO_NODE if no valid numa-node-id entry found
+ * or DEFAULT_NODE if no numa-node-id entry exists
+ */
+static int of_numa_prop_to_nid(const __be32 *of_numa_prop, int length)
+{
+	int nid;
+
+	if (!of_numa_prop)
+		return DEFAULT_NODE;
+
+	if (length != sizeof(*of_numa_prop)) {
+		pr_warn("NUMA: Invalid of_numa_prop length %d found.\n",
+			length);
+		return NUMA_NO_NODE;
+	}
+
+	nid = of_read_number(of_numa_prop, 1);
+	if (nid >= MAX_NUMNODES) {
+		pr_warn("NUMA: Invalid numa node %d found.\n", nid);
+		return NUMA_NO_NODE;
+	}
+
+	return nid;
+}
+
+static int __init early_init_of_node_to_nid(unsigned long node)
+{
+	int length;
+	const __be32 *of_numa_prop;
+
+	of_numa_prop = of_get_flat_dt_prop(node, "numa-node-id", &length);
+
+	return of_numa_prop_to_nid(of_numa_prop, length);
+}
+
+/*
+ * Even though we connect cpus to numa domains later in SMP
+ * init, we need to know the node ids now for all cpus.
+*/
+static int __init early_init_parse_cpu_node(unsigned long node)
+{
+	int nid;
+	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
+
+	if (type == NULL)
+		return 0;
+
+	if (strcmp(type, "cpu") != 0)
+		return 0;
+
+	nid = early_init_of_node_to_nid(node);
+	if (nid == NUMA_NO_NODE)
+		return -EINVAL;
+
+	node_set(nid, numa_nodes_parsed);
+	return 0;
+}
+
+static int __init early_init_parse_memory_node(unsigned long node)
+{
+	const __be32 *reg, *endp;
+	int length;
+	int nid;
+	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
+
+	if (type == NULL)
+		return 0;
+
+	if (strcmp(type, "memory") != 0)
+		return 0;
+
+	nid = early_init_of_node_to_nid(node);
+	if (nid == NUMA_NO_NODE)
+		return -EINVAL;
+
+	reg = of_get_flat_dt_prop(node, "reg", &length);
+	endp = reg + (length / sizeof(__be32));
+
+	while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
+		u64 base, size;
+
+		base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+		size = dt_mem_next_cell(dt_root_size_cells, &reg);
+		pr_debug("NUMA: base = %llx , node = %u\n",
+			 base, nid);
+
+		if (numa_add_memblk(nid, base, size) < 0)
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int __init early_init_parse_distance_map_v1(unsigned long node,
+						    const char *uname)
+{
+	const __be32 *prop_dist_matrix;
+	int length = 0, i, matrix_count;
+	int nr_size_cells = OF_ROOT_NODE_SIZE_CELLS_DEFAULT;
+
+	pr_info("NUMA: parsing numa-distance-map-v1\n");
+
+	prop_dist_matrix =
+		of_get_flat_dt_prop(node, "distance-matrix", &length);
+
+	if (!length) {
+		pr_err("NUMA: failed to parse distance-matrix\n");
+		return -ENODEV;
+	}
+
+	matrix_count = ((length / sizeof(__be32)) / (3 * nr_size_cells));
+
+	if ((matrix_count * sizeof(__be32) * 3 * nr_size_cells) != length) {
+		pr_warn("NUMA: invalid distance-matrix length %d\n", length);
+		return -EINVAL;
+	}
+
+	for (i = 0; i < matrix_count; i++) {
+		u32 nodea, nodeb, distance;
+
+		nodea = dt_mem_next_cell(nr_size_cells, &prop_dist_matrix);
+		nodeb = dt_mem_next_cell(nr_size_cells, &prop_dist_matrix);
+		distance = dt_mem_next_cell(nr_size_cells, &prop_dist_matrix);
+		numa_set_distance(nodea, nodeb, distance);
+		pr_debug("NUMA: distance[node%d -> node%d] = %d\n",
+			 nodea, nodeb, distance);
+
+		/* Set default distance of node B->A same as A->B */
+		if (nodeb > nodea)
+			numa_set_distance(nodeb, nodea, distance);
+	}
+
+	return 0;
+}
+
+static int __init early_init_parse_distance_map(unsigned long node,
+						 const char *uname)
+{
+	if (strcmp(uname, "distance-map") != 0)
+		return 0;
+
+	if (of_flat_dt_is_compatible(node, "numa-distance-map-v1"))
+		return early_init_parse_distance_map_v1(node, uname);
+
+	pr_err("NUMA: invalid distance-map device node\n");
+	return -EINVAL;
+}
+
+static int __init early_init_of_scan_numa_map(unsigned long node,
+					       const char *uname,
+					       int depth, void *data)
+{
+	int ret;
+
+	ret = early_init_parse_cpu_node(node);
+	if (ret)
+		return ret;
+
+	ret = early_init_parse_memory_node(node);
+	if (ret)
+		return ret;
+
+	return early_init_parse_distance_map(node, uname);
+}
+
+int of_node_to_nid(struct device_node *device)
+{
+	const __be32 *of_numa_prop;
+	int length;
+
+	of_numa_prop = of_get_property(device, "numa-node-id", &length);
+	if (of_numa_prop)
+		return of_numa_prop_to_nid(of_numa_prop, length);
+
+	return NUMA_NO_NODE;
+}
+
+/* DT node mapping is done already early_init_of_scan_memory */
+int __init of_numa_init(void)
+{
+	return of_scan_flat_dt(early_init_of_scan_numa_map, NULL);
+}
diff --git a/include/linux/of.h b/include/linux/of.h
index dc6e396..fe67a4c 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -685,6 +685,15 @@ static inline int of_node_to_nid(struct device_node *device)
 }
 #endif
 
+#ifdef CONFIG_OF_NUMA
+extern int of_numa_init(void);
+#else
+static inline int of_numa_init(void)
+{
+	return -ENOSYS;
+}
+#endif
+
 static inline struct device_node *of_find_matching_node(
 	struct device_node *from,
 	const struct of_device_id *matches)
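For readers who want to try the distance-matrix handling outside the kernel, the following user-space sketch mirrors the logic of early_init_parse_distance_map_v1() in the patch above. It is an approximation under stated assumptions: `parse_distance_matrix()`, `be32_to_host()`, `MAX_NODES` and the return codes are hypothetical names introduced here, the property is taken as a raw byte buffer of big-endian 32-bit cells with one cell per field, and the explicit node-range check stands in for whatever validation numa_set_distance() performs in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 4	/* hypothetical small limit for the sketch */
#define CELL_BYTES 4	/* one device tree cell is a big-endian u32 */

/* Decode one big-endian 32-bit cell. */
static uint32_t be32_to_host(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Parse (nodea, nodeb, distance) triples from a "distance-matrix"
 * style property into dist[][].
 * Returns 0 on success, -1 for a missing/empty property, -2 if the
 * length is not a whole number of triples, -3 for an out-of-range node.
 */
int parse_distance_matrix(const uint8_t *prop, size_t len,
			  uint32_t dist[MAX_NODES][MAX_NODES])
{
	size_t entries, i;

	assert(dist != NULL);
	if (prop == NULL || len == 0)
		return -1;
	if (len % (3 * CELL_BYTES) != 0)
		return -2;

	entries = len / (3 * CELL_BYTES);
	for (i = 0; i < entries; i++) {
		const uint8_t *e = prop + i * 3 * CELL_BYTES;
		uint32_t nodea = be32_to_host(e);
		uint32_t nodeb = be32_to_host(e + CELL_BYTES);
		uint32_t distance = be32_to_host(e + 2 * CELL_BYTES);

		if (nodea >= MAX_NODES || nodeb >= MAX_NODES)
			return -3;

		dist[nodea][nodeb] = distance;

		/* As in the patch: default B->A to A->B when B > A. */
		if (nodeb > nodea)
			dist[nodeb][nodea] = distance;
	}
	return 0;
}
```

Because the mirror is applied only when nodeb > nodea, a later explicit entry for the reverse direction overwrites the default without being mirrored back, so asymmetric distances remain expressible as long as the lower-numbered source node is listed first.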