Message ID | 5241DB62.2090300@gmail.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
On Wed, Sep 25, 2013 at 02:35:14AM +0800, Zhang Yanfei wrote:
> From: Tang Chen <tangchen@cn.fujitsu.com>
>
> The hot-Pluggable field in SRAT specifies which memory is hotpluggable.
> As we mentioned before, if hotpluggable memory is used by the kernel,
> it cannot be hot-removed. So memory hotplug users may want to set all
> hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it.
>
> Memory hotplug users may also set a node as movable node, which has
> ZONE_MOVABLE only, so that the whole node can be hot-removed.
>
> But the kernel cannot use memory in ZONE_MOVABLE. By doing this, the
> kernel cannot use memory in movable nodes. This will cause NUMA
> performance down. And other users may be unhappy.
>
> So we need a way to allow users to enable and disable this functionality.
> In this patch, we introduce movablenode boot option to allow users to
> choose to not to consume hotpluggable memory at early boot time and
> later we can set it as ZONE_MOVABLE.
>
> To achieve this, the movablenode boot option will control the memblock
> allocation direction. That said, after memblock is ready, before SRAT is
> parsed, we should allocate memory near the kernel image as we explained
> in the previous patches. So if movablenode boot option is set, the kernel
> does the following:
>
> 1. After memblock is ready, make memblock allocate memory bottom up.
> 2. After SRAT is parsed, make memblock behave as default, allocate memory
>    top down.
>
> Users can specify "movablenode" in kernel commandline to enable this
> functionality. For those who don't use memory hotplug or who don't want
> to lose their NUMA performance, just don't specify anything. The kernel
> will work as before.
>
> Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>

I hope the param description and comment were better.  Not necessarily
longer, but clearer, so it'd be great if you can polish them a bit
more.  Other than that,

 Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

On 09/26/2013 10:53 PM, Tejun Heo wrote:
> On Wed, Sep 25, 2013 at 02:35:14AM +0800, Zhang Yanfei wrote:
>> From: Tang Chen <tangchen@cn.fujitsu.com>
>>
>> The hot-Pluggable field in SRAT specifies which memory is hotpluggable.
>> As we mentioned before, if hotpluggable memory is used by the kernel,
>> it cannot be hot-removed. So memory hotplug users may want to set all
>> hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it.
>>
>> Memory hotplug users may also set a node as movable node, which has
>> ZONE_MOVABLE only, so that the whole node can be hot-removed.
>>
>> But the kernel cannot use memory in ZONE_MOVABLE. By doing this, the
>> kernel cannot use memory in movable nodes. This will cause NUMA
>> performance down. And other users may be unhappy.
>>
>> So we need a way to allow users to enable and disable this functionality.
>> In this patch, we introduce movablenode boot option to allow users to
>> choose to not to consume hotpluggable memory at early boot time and
>> later we can set it as ZONE_MOVABLE.
>>
>> To achieve this, the movablenode boot option will control the memblock
>> allocation direction. That said, after memblock is ready, before SRAT is
>> parsed, we should allocate memory near the kernel image as we explained
>> in the previous patches. So if movablenode boot option is set, the kernel
>> does the following:
>>
>> 1. After memblock is ready, make memblock allocate memory bottom up.
>> 2. After SRAT is parsed, make memblock behave as default, allocate memory
>>    top down.
>>
>> Users can specify "movablenode" in kernel commandline to enable this
>> functionality. For those who don't use memory hotplug or who don't want
>> to lose their NUMA performance, just don't specify anything. The kernel
>> will work as before.
>>
>> Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
>> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
>
> I hope the param description and comment were better.  Not necessarily
> longer, but clearer, so it'd be great if you can polish them a bit

OK. Trying below:

movablenode	[KNL,X86] This option enables the kernel to arrange
		hotpluggable memory into ZONE_MOVABLE zone.
		If memory in a node is all hotpluggable, the option may
		make the whole node has only one ZONE_MOVABLE zone, so
		that the whole node can be hot-removed after system is up.
		Note that this option may cause NUMA performance down.

As for the comment in cmdline_parse_movablenode():

	/*
	 * ACPI SRAT records all hotpluggable memory ranges. But before
	 * SRAT is parsed, we don't know about it. So by specifying this
	 * option, we will use the bottom-up mode to try allocating memory
	 * near the kernel image before SRAT is parsed.
	 *
	 * Bottom-up mode prevents memblock allocating hotpluggable memory
	 * for the kernel so that the kernel will arrange hotpluggable
	 * memory into ZONE_MOVABLE zone when possible.
	 */

Thanks.

> more.  Other than that,
>
>  Acked-by: Tejun Heo <tj@kernel.org>
>
> Thanks.
>

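To illustrate the two allocation directions discussed in this thread, here
is a minimal user-space sketch. It is a toy model under stated assumptions,
not the kernel's memblock code: struct toy_memblock, toy_set_bottom_up() and
toy_alloc() are hypothetical names, and the model tracks a single free range
whose low end is assumed to sit just above the kernel image, whereas real
memblock manages many regions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of one free range; not the kernel's memblock. */
struct toy_memblock {
	uint64_t start;   /* lowest free address (assumed just above the kernel image) */
	uint64_t end;     /* one past the highest free address */
	bool bottom_up;   /* true: allocate from the low end; false: from the high end */
};

/* Switch the allocation direction, mirroring the idea of the boot option. */
void toy_set_bottom_up(struct toy_memblock *mb, bool enable)
{
	mb->bottom_up = enable;
}

/* Return the base of the allocated range, or 0 if the request cannot be met. */
uint64_t toy_alloc(struct toy_memblock *mb, uint64_t size)
{
	assert(mb->start <= mb->end);

	if (size == 0 || size > mb->end - mb->start)
		return 0;

	if (mb->bottom_up) {
		/* Bottom-up: hand out the lowest free bytes, nearest the kernel image. */
		uint64_t base = mb->start;

		mb->start += size;
		return base;
	}

	/* Top-down (the default direction): hand out the highest free bytes. */
	mb->end -= size;
	return mb->end;
}
```

In this model, a caller would enable bottom-up mode before the hotpluggable
ranges are known, so early allocations stay next to the kernel image, and
switch back to top-down once those ranges have been identified.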
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1a036cd..8c056c4 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1769,6 +1769,21 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			that the amount of memory usable for all allocations
 			is not too small.
 
+	movablenode	[KNL,X86] This parameter enables/disables the
+			kernel to arrange hotpluggable memory ranges recorded
+			in ACPI SRAT(System Resource Affinity Table) as
+			ZONE_MOVABLE. And these memory can be hot-removed when
+			the system is up.
+			By specifying this option, all the hotpluggable memory
+			will be in ZONE_MOVABLE, which the kernel cannot use.
+			This will cause NUMA performance down. For users who
+			care about NUMA performance, just don't use it.
+			If all the memory ranges in the system are hotpluggable,
+			then the ones used by the kernel at early time, such as
+			kernel code and data segments, initrd file and so on,
+			won't be set as ZONE_MOVABLE, and won't be hotpluggable.
+			Otherwise the kernel won't have enough memory to boot.
+
 	MTD_Partition=	[MTD]
 			Format: <name>,<region-number>,<size>,<offset>
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 36cfce3..b8fefb7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1132,6 +1132,13 @@ void __init setup_arch(char **cmdline_p)
 	early_acpi_boot_init();
 
 	initmem_init();
+
+	/*
+	 * When ACPI SRAT is parsed, which is done in initmem_init(),
+	 * set memblock back to the top-down direction.
+	 */
+	memblock_set_bottom_up(false);
+
 	memblock_find_dma_reserve();
 
 	/*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ed85fe3..dcd819a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -31,6 +31,7 @@
 #include <linux/firmware-map.h>
 #include <linux/stop_machine.h>
 #include <linux/hugetlb.h>
+#include <linux/memblock.h>
 
 #include <asm/tlbflush.h>
 
@@ -1412,6 +1413,36 @@ static bool can_offline_normal(struct zone *zone, unsigned long nr_pages)
 }
 #endif /* CONFIG_MOVABLE_NODE */
 
+static int __init cmdline_parse_movablenode(char *p)
+{
+#ifdef CONFIG_MOVABLE_NODE
+	/*
+	 * Memory used by the kernel cannot be hot-removed because Linux
+	 * cannot migrate the kernel pages. When memory hotplug is
+	 * enabled, we should prevent memblock from allocating memory
+	 * for the kernel.
+	 *
+	 * ACPI SRAT records all hotpluggable memory ranges. But before
+	 * SRAT is parsed, we don't know about it.
+	 *
+	 * The kernel image is loaded into memory at very early time. We
+	 * cannot prevent this anyway. So on NUMA system, we set any
+	 * node the kernel resides in as un-hotpluggable.
+	 *
+	 * Since on modern servers, one node could have double-digit
+	 * gigabytes memory, we can assume the memory around the kernel
+	 * image is also un-hotpluggable. So before SRAT is parsed, just
+	 * allocate memory near the kernel image to try the best to keep
+	 * the kernel away from hotpluggable memory.
+	 */
+	memblock_set_bottom_up(true);
+#else
+	pr_warn("movablenode option not supported");
+#endif
+	return 0;
+}
+early_param("movablenode", cmdline_parse_movablenode);
+
 /* check which state of node_states will be changed when offline memory */
 static void node_states_check_changes_offline(unsigned long nr_pages,
 		struct zone *zone, struct memory_notify *arg)
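
The patch registers "movablenode" as a boot flag that takes no value. As a
rough illustration of what recognizing such a flag on a command line
involves, here is a simplified user-space sketch. It is an assumption-laden
stand-in, not the kernel's parser: cmdline_has_flag() is a hypothetical
helper that splits on spaces only and ignores quoting and "name=value"
forms, which the kernel's own parameter parsing does handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: report whether `flag` appears as a whole
 * space-separated token in `cmdline`. Simplified on purpose; it does
 * not handle quoting or "name=value" parameters.
 */
bool cmdline_has_flag(const char *cmdline, const char *flag)
{
	assert(cmdline != NULL && flag != NULL);

	size_t flag_len = strlen(flag);
	const char *p = cmdline;

	if (flag_len == 0)
		return false;

	while (*p) {
		/* Skip the spaces that separate tokens. */
		while (*p == ' ')
			p++;

		/* Measure the current token and compare it as a whole. */
		size_t tok_len = strcspn(p, " ");

		if (tok_len == flag_len && strncmp(p, flag, flag_len) == 0)
			return true;

		p += tok_len;
	}

	return false;
}
```

Matching whole tokens rather than substrings keeps a longer parameter name
that merely starts with "movablenode" from enabling the option by accident.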