[v5,2/6] memblock: Introduce bottom-up allocation mode

Message ID: 5241D9A4.4080305@gmail.com (mailing list archive)
State: Not Applicable, archived

Commit Message

Zhang Yanfei Sept. 24, 2013, 6:27 p.m. UTC
From: Tang Chen <tangchen@cn.fujitsu.com>

The Linux kernel cannot migrate pages used by the kernel itself. As a result,
kernel pages cannot be hot-removed, so hotpluggable memory must not be
allocated for the kernel.

The ACPI SRAT (System Resource Affinity Table) contains the memory hotplug
info. But memblock starts allocating memory for the kernel before the SRAT
is parsed, so we need to keep these early allocations out of hotpluggable
memory.

In a memory hotplug system, any NUMA node the kernel resides in should
be unhotpluggable. On a modern server, each node could have at least
16GB of memory, so memory around the kernel image is very likely
unhotpluggable.

So the basic idea is: allocate memory upward, starting from the end of the
kernel image. Since little memory is allocated before the SRAT is parsed,
it is very likely to be in the same node as the kernel image.

memblock can currently only allocate memory top-down. So this patch
introduces a new bottom-up allocation mode. Later, when this allocation
direction is used, the start address will be limited to above the kernel.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
---
 include/linux/memblock.h |   16 +++++++++
 mm/memblock.c            |   81 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 95 insertions(+), 2 deletions(-)
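
The difference between the two search directions can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel code: `struct free_range`, `find_bottom_up()` and `find_top_down()` are hypothetical stand-ins for memblock's free-area iteration, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Hypothetical free range [start, end), sorted by address in a table. */
struct free_range {
	phys_addr_t start;
	phys_addr_t end;
};

static phys_addr_t clamp_addr(phys_addr_t v, phys_addr_t lo, phys_addr_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Lowest fit: walk the ranges upward and round the candidate up. */
static phys_addr_t find_bottom_up(const struct free_range *r, size_t n,
				  phys_addr_t start, phys_addr_t end,
				  phys_addr_t size, phys_addr_t align)
{
	assert(align && (align & (align - 1)) == 0);

	for (size_t i = 0; i < n; i++) {
		phys_addr_t s = clamp_addr(r[i].start, start, end);
		phys_addr_t e = clamp_addr(r[i].end, start, end);
		phys_addr_t cand = (s + align - 1) & ~(align - 1);

		if (cand < e && e - cand >= size)
			return cand;
	}
	return 0;	/* 0 means "not found", as in the patch */
}

/* Highest fit: walk the ranges downward and round the candidate down. */
static phys_addr_t find_top_down(const struct free_range *r, size_t n,
				 phys_addr_t start, phys_addr_t end,
				 phys_addr_t size, phys_addr_t align)
{
	assert(align && (align & (align - 1)) == 0);

	for (size_t i = n; i-- > 0; ) {
		phys_addr_t s = clamp_addr(r[i].start, start, end);
		phys_addr_t e = clamp_addr(r[i].end, start, end);
		phys_addr_t cand;

		if (e < size)
			continue;
		cand = (e - size) & ~(align - 1);
		if (cand >= s)
			return cand;
	}
	return 0;
}
```

With a made-up layout where the kernel image ends at 0x2000000, passing that address as `start` to `find_bottom_up()` yields a block right after the image, while `find_top_down()` picks the highest free block, which may sit in a different (possibly hotpluggable) node.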

Comments

Tejun Heo Sept. 26, 2013, 2:45 p.m. UTC | #1
Hello,

On Wed, Sep 25, 2013 at 02:27:48AM +0800, Zhang Yanfei wrote:
> +#ifdef CONFIG_MOVABLE_NODE
> +static inline void memblock_set_bottom_up(bool enable)
> +{
> +	memblock.bottom_up = enable;
> +}
> +
> +static inline bool memblock_bottom_up(void)
> +{
> +	return memblock.bottom_up;
> +}

Can you please explain what this is for here?

> +		/*
> +		 * we always limit bottom-up allocation above the kernel,
> +		 * but top-down allocation doesn't have the limit, so
> +		 * retrying top-down allocation may succeed when bottom-up
> +		 * allocation failed.
> +		 *
> +		 * bottom-up allocation is expected to be fail very rarely,
> +		 * so we use WARN_ONCE() here to see the stack trace if
> +		 * fail happens.
> +		 */
> +		WARN_ONCE(1, "memblock: Failed to allocate memory in bottom up "
> +			"direction. Now try top down direction.\n");
> +	}

You and I would know what was going on and what the consequence of the
failure may be but the above warning message is kinda useless to a
user / admin, right?  It doesn't really say anything meaningful.

Thanks.
Zhang Yanfei Sept. 26, 2013, 3:37 p.m. UTC | #2
Hello Tejun,

First, thanks for your quick comments. :)

On 09/26/2013 10:45 PM, Tejun Heo wrote:
> Hello,
> 
> On Wed, Sep 25, 2013 at 02:27:48AM +0800, Zhang Yanfei wrote:
>> +#ifdef CONFIG_MOVABLE_NODE
>> +static inline void memblock_set_bottom_up(bool enable)
>> +{
>> +	memblock.bottom_up = enable;
>> +}
>> +
>> +static inline bool memblock_bottom_up(void)
>> +{
>> +	return memblock.bottom_up;
>> +}
> 
> Can you please explain what this is for here?

OK, will do.

> 
>> +		/*
>> +		 * we always limit bottom-up allocation above the kernel,
>> +		 * but top-down allocation doesn't have the limit, so
>> +		 * retrying top-down allocation may succeed when bottom-up
>> +		 * allocation failed.
>> +		 *
>> +		 * bottom-up allocation is expected to be fail very rarely,
>> +		 * so we use WARN_ONCE() here to see the stack trace if
>> +		 * fail happens.
>> +		 */
>> +		WARN_ONCE(1, "memblock: Failed to allocate memory in bottom up "
>> +			"direction. Now try top down direction.\n");
>> +	}
> 
> You and I would know what was going on and what the consequence of the
> failure may be but the above warning message is kinda useless to a
> user / admin, right?  It doesn't really say anything meaningful.
> 

Hmm, maybe something like this:

WARN_ONCE(1, "Failed to allocate memory above the kernel in bottom-up mode, "
          "so try to allocate memory below the kernel.");

Thanks
Tejun Heo Sept. 26, 2013, 3:50 p.m. UTC | #3
On Thu, Sep 26, 2013 at 11:37:34PM +0800, Zhang Yanfei wrote:
> >> +		WARN_ONCE(1, "memblock: Failed to allocate memory in bottom up "
> >> +			"direction. Now try top down direction.\n");
> >> +	}
> > 
> > You and I would know what was going on and what the consequence of the
> > failure may be but the above warning message is kinda useless to a
> > user / admin, right?  It doesn't really say anything meaningful.
> > 
> 
> Hmmmm.. May be something like this:
> 
> WARN_ONCE(1, "Failed to allocated memory above the kernel in bottom-up,"
>           "so try to allocate memory below the kernel.");

How about something like "memblock: bottom-up allocation failed,
memory hotunplug may be affected\n".

Thanks.
Zhang Yanfei Sept. 26, 2013, 3:54 p.m. UTC | #4
On 09/26/2013 11:50 PM, Tejun Heo wrote:
> On Thu, Sep 26, 2013 at 11:37:34PM +0800, Zhang Yanfei wrote:
>>>> +		WARN_ONCE(1, "memblock: Failed to allocate memory in bottom up "
>>>> +			"direction. Now try top down direction.\n");
>>>> +	}
>>>
>>> You and I would know what was going on and what the consequence of the
>>> failure may be but the above warning message is kinda useless to a
>>> user / admin, right?  It doesn't really say anything meaningful.
>>>
>>
>> Hmmmm.. May be something like this:
>>
>> WARN_ONCE(1, "Failed to allocated memory above the kernel in bottom-up,"
>>           "so try to allocate memory below the kernel.");
> 
> How about something like "memblock: bottom-up allocation failed,
> memory hotunplug may be affected\n".
> 

OK, I understand what you want: explicitly telling the user that the
functionality may be affected by the failure. Yeah, this is really
meaningful. I will take yours, thanks.
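
The agreed behaviour (try bottom-up above the kernel, warn once on failure, then fall back to top-down) can be sketched in user space as follows. This is only an illustration for the case where bottom-up mode is enabled: `find_with_fallback()`, `warn_once()`, the `find_fn` type and the two stub finders are hypothetical names, and `warn_once()` merely prints to stderr, whereas the kernel's WARN_ONCE() also dumps a stack trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t phys_addr_t;
typedef phys_addr_t (*find_fn)(phys_addr_t start, phys_addr_t end,
			       phys_addr_t size);

static int warnings_printed;

/* Print the message the first time only (single call site in this sketch). */
static void warn_once(const char *msg)
{
	static bool warned;

	if (warned)
		return;
	warned = true;
	warnings_printed++;
	fputs(msg, stderr);
}

static phys_addr_t find_with_fallback(find_fn bottom_up, find_fn top_down,
				      phys_addr_t start, phys_addr_t end,
				      phys_addr_t size, phys_addr_t kernel_end)
{
	assert(bottom_up && top_down);

	if (end > kernel_end) {
		/* never search below the end of the kernel image */
		phys_addr_t lo = start > kernel_end ? start : kernel_end;
		phys_addr_t ret = bottom_up(lo, end, size);

		if (ret)
			return ret;
		warn_once("memblock: bottom-up allocation failed, "
			  "memory hotunplug may be affected\n");
	}
	/* top-down has no lower limit, so it may still succeed */
	return top_down(start, end, size);
}

/* Stub finders used only to exercise both paths. */
static phys_addr_t stub_always_fails(phys_addr_t start, phys_addr_t end,
				     phys_addr_t size)
{
	(void)start; (void)end; (void)size;
	return 0;
}

static phys_addr_t stub_returns_start(phys_addr_t start, phys_addr_t end,
				      phys_addr_t size)
{
	(void)end; (void)size;
	return start;
}
```

Calling `find_with_fallback(stub_always_fails, stub_returns_start, ...)` twice prints the warning a single time, which is the property that makes the message useful to an admin without flooding the log.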
Zhang Yanfei Sept. 26, 2013, 4:51 p.m. UTC | #5
On 09/26/2013 11:37 PM, Zhang Yanfei wrote:
> Hello tejun,
> 
> Thanks for your quick comments first:)
> 
> On 09/26/2013 10:45 PM, Tejun Heo wrote:
>> Hello,
>>
>> On Wed, Sep 25, 2013 at 02:27:48AM +0800, Zhang Yanfei wrote:
>>> +#ifdef CONFIG_MOVABLE_NODE
>>> +static inline void memblock_set_bottom_up(bool enable)
>>> +{
>>> +	memblock.bottom_up = enable;
>>> +}
>>> +
>>> +static inline bool memblock_bottom_up(void)
>>> +{
>>> +	return memblock.bottom_up;
>>> +}
>>
>> Can you please explain what this is for here?
> 
> OK, will do.

I'll write the function descriptions here so you can still give your
comments on this version.

/*
 * Set the allocation direction to bottom-up or top-down.
 */
static inline void memblock_set_bottom_up(bool enable)
{
        memblock.bottom_up = enable;
}


/*
 * Check if the allocation direction is bottom-up or not.
 * If this returns true, the boot option "movablenode" has
 * been specified, and memblock will allocate memory just
 * near the kernel image.
 */
static inline bool memblock_bottom_up(void)
{
        return memblock.bottom_up;
}

Thanks.
Toshi Kani Sept. 27, 2013, 10:29 p.m. UTC | #6
On Wed, 2013-09-25 at 02:27 +0800, Zhang Yanfei wrote:
> From: Tang Chen <tangchen@cn.fujitsu.com>
> 
> The Linux kernel cannot migrate pages used by the kernel. As a result, kernel
> pages cannot be hot-removed. So we cannot allocate hotpluggable memory for
> the kernel.
> 
> ACPI SRAT (System Resource Affinity Table) contains the memory hotplug info.
> But before SRAT is parsed, memblock has already started to allocate memory
> for the kernel. So we need to prevent memblock from doing this.
> 
> In a memory hotplug system, any numa node the kernel resides in should
> be unhotpluggable. And for a modern server, each node could have at least
> 16GB memory. So memory around the kernel image is highly likely unhotpluggable.
> 
> So the basic idea is: Allocate memory from the end of the kernel image and
> to the higher memory. Since memory allocation before SRAT is parsed won't
> be too much, it could highly likely be in the same node with kernel image.
> 
> The current memblock can only allocate memory top-down. So this patch introduces
> a new bottom-up allocation mode to allocate memory bottom-up. And later
> when we use this allocation direction to allocate memory, we will limit
> the start address above the kernel.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>

 :

>  /**
> + * __memblock_find_range - find free area utility
> + * @start: start of candidate range
> + * @end: end of candidate range, can be %MEMBLOCK_ALLOC_{ANYWHERE|ACCESSIBLE}
> + * @size: size of free area to find
> + * @align: alignment of free area to find
> + * @nid: nid of the free area to find, %MAX_NUMNODES for any node
> + *
> + * Utility called from memblock_find_in_range_node(), find free area bottom-up.
> + *
> + * RETURNS:
> + * Found address on success, 0 on failure.
> + */
> +static phys_addr_t __init_memblock
> +__memblock_find_range(phys_addr_t start, phys_addr_t end, phys_addr_t size,

Similarly, how about naming this function
__memblock_find_range_bottom_up()?


> +		      phys_addr_t align, int nid)
> +{
> +	phys_addr_t this_start, this_end, cand;
> +	u64 i;
> +
> +	for_each_free_mem_range(i, nid, &this_start, &this_end, NULL) {
> +		this_start = clamp(this_start, start, end);
> +		this_end = clamp(this_end, start, end);
> +
> +		cand = round_up(this_start, align);
> +		if (cand < this_end && this_end - cand >= size)
> +			return cand;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
>   * __memblock_find_range_rev - find free area utility, in reverse order
>   * @start: start of candidate range
>   * @end: end of candidate range, can be %MEMBLOCK_ALLOC_{ANYWHERE|ACCESSIBLE}
> @@ -93,7 +128,7 @@ static long __init_memblock memblock_overlaps_region(struct memblock_type *type,
>   * Utility called from memblock_find_in_range_node(), find free area top-down.
>   *
>   * RETURNS:
> - * Found address on success, %0 on failure.
> + * Found address on success, 0 on failure.
>   */
>  static phys_addr_t __init_memblock
>  __memblock_find_range_rev(phys_addr_t start, phys_addr_t end,
> @@ -127,13 +162,24 @@ __memblock_find_range_rev(phys_addr_t start, phys_addr_t end,
>   *
>   * Find @size free area aligned to @align in the specified range and node.
>   *
> + * When allocation direction is bottom-up, the @start should be greater
> + * than the end of the kernel image. Otherwise, it will be trimmed. The
> + * reason is that we want the bottom-up allocation just near the kernel
> + * image so it is highly likely that the allocated memory and the kernel
> + * will reside in the same node.
> + *
> + * If bottom-up allocation failed, will try to allocate memory top-down.
> + *
>   * RETURNS:
> - * Found address on success, %0 on failure.
> + * Found address on success, 0 on failure.
>   */
>  phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
>  					phys_addr_t end, phys_addr_t size,
>  					phys_addr_t align, int nid)
>  {
> +	int ret;
> +	phys_addr_t kernel_end;
> +
>  	/* pump up @end */
>  	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
>  		end = memblock.current_limit;
> @@ -141,6 +187,37 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
>  	/* avoid allocating the first page */
>  	start = max_t(phys_addr_t, start, PAGE_SIZE);
>  	end = max(start, end);
> +	kernel_end = __pa_symbol(_end);

Please address the issue in __pa_symbol() that Andrew pointed out.

Thanks,
-Toshi

Patch

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 31e95ac..c1e2633 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -35,6 +35,7 @@  struct memblock_type {
 };
 
 struct memblock {
+	bool bottom_up;  /* is bottom up direction? */
 	phys_addr_t current_limit;
 	struct memblock_type memory;
 	struct memblock_type reserved;
@@ -148,6 +149,21 @@  phys_addr_t memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid)
 
 phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align);
 
+#ifdef CONFIG_MOVABLE_NODE
+static inline void memblock_set_bottom_up(bool enable)
+{
+	memblock.bottom_up = enable;
+}
+
+static inline bool memblock_bottom_up(void)
+{
+	return memblock.bottom_up;
+}
+#else
+static inline void memblock_set_bottom_up(bool enable) {}
+static inline bool memblock_bottom_up(void) { return false; }
+#endif
+
 /* Flags for memblock_alloc_base() amd __memblock_alloc_base() */
 #define MEMBLOCK_ALLOC_ANYWHERE	(~(phys_addr_t)0)
 #define MEMBLOCK_ALLOC_ACCESSIBLE	0
diff --git a/mm/memblock.c b/mm/memblock.c
index 3d80c74..a8e81c3 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -20,6 +20,8 @@ 
 #include <linux/seq_file.h>
 #include <linux/memblock.h>
 
+#include <asm-generic/sections.h>
+
 static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
 static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
 
@@ -32,6 +34,7 @@  struct memblock memblock __initdata_memblock = {
 	.reserved.cnt		= 1,	/* empty dummy entry */
 	.reserved.max		= INIT_MEMBLOCK_REGIONS,
 
+	.bottom_up		= false,
 	.current_limit		= MEMBLOCK_ALLOC_ANYWHERE,
 };
 
@@ -83,6 +86,38 @@  static long __init_memblock memblock_overlaps_region(struct memblock_type *type,
 }
 
 /**
+ * __memblock_find_range - find free area utility
+ * @start: start of candidate range
+ * @end: end of candidate range, can be %MEMBLOCK_ALLOC_{ANYWHERE|ACCESSIBLE}
+ * @size: size of free area to find
+ * @align: alignment of free area to find
+ * @nid: nid of the free area to find, %MAX_NUMNODES for any node
+ *
+ * Utility called from memblock_find_in_range_node(), find free area bottom-up.
+ *
+ * RETURNS:
+ * Found address on success, 0 on failure.
+ */
+static phys_addr_t __init_memblock
+__memblock_find_range(phys_addr_t start, phys_addr_t end, phys_addr_t size,
+		      phys_addr_t align, int nid)
+{
+	phys_addr_t this_start, this_end, cand;
+	u64 i;
+
+	for_each_free_mem_range(i, nid, &this_start, &this_end, NULL) {
+		this_start = clamp(this_start, start, end);
+		this_end = clamp(this_end, start, end);
+
+		cand = round_up(this_start, align);
+		if (cand < this_end && this_end - cand >= size)
+			return cand;
+	}
+
+	return 0;
+}
+
+/**
  * __memblock_find_range_rev - find free area utility, in reverse order
  * @start: start of candidate range
  * @end: end of candidate range, can be %MEMBLOCK_ALLOC_{ANYWHERE|ACCESSIBLE}
@@ -93,7 +128,7 @@  static long __init_memblock memblock_overlaps_region(struct memblock_type *type,
  * Utility called from memblock_find_in_range_node(), find free area top-down.
  *
  * RETURNS:
- * Found address on success, %0 on failure.
+ * Found address on success, 0 on failure.
  */
 static phys_addr_t __init_memblock
 __memblock_find_range_rev(phys_addr_t start, phys_addr_t end,
@@ -127,13 +162,24 @@  __memblock_find_range_rev(phys_addr_t start, phys_addr_t end,
  *
  * Find @size free area aligned to @align in the specified range and node.
  *
+ * When allocation direction is bottom-up, the @start should be greater
+ * than the end of the kernel image. Otherwise, it will be trimmed. The
+ * reason is that we want the bottom-up allocation just near the kernel
+ * image so it is highly likely that the allocated memory and the kernel
+ * will reside in the same node.
+ *
+ * If bottom-up allocation failed, will try to allocate memory top-down.
+ *
  * RETURNS:
- * Found address on success, %0 on failure.
+ * Found address on success, 0 on failure.
  */
 phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
 					phys_addr_t end, phys_addr_t size,
 					phys_addr_t align, int nid)
 {
+	phys_addr_t ret;
+	phys_addr_t kernel_end;
+
 	/* pump up @end */
 	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
 		end = memblock.current_limit;
@@ -141,6 +187,37 @@  phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
 	/* avoid allocating the first page */
 	start = max_t(phys_addr_t, start, PAGE_SIZE);
 	end = max(start, end);
+	kernel_end = __pa_symbol(_end);
+
+	/*
+	 * try bottom-up allocation only when bottom-up mode
+	 * is set and @end is above the kernel image.
+	 */
+	if (memblock_bottom_up() && end > kernel_end) {
+		phys_addr_t bottom_up_start;
+
+		/* make sure we will allocate above the kernel */
+		bottom_up_start = max(start, kernel_end);
+
+		/* ok, try bottom-up allocation first */
+		ret = __memblock_find_range(bottom_up_start, end,
+					    size, align, nid);
+		if (ret)
+			return ret;
+
+		/*
+		 * we always limit bottom-up allocation above the kernel,
+		 * but top-down allocation doesn't have the limit, so
+		 * retrying top-down allocation may succeed when bottom-up
+		 * allocation failed.
+		 *
+		 * bottom-up allocation is expected to fail very rarely,
+		 * so we use WARN_ONCE() here to see the stack trace if
+		 * a failure happens.
+		 */
+		WARN_ONCE(1, "memblock: Failed to allocate memory in bottom up "
+			"direction. Now try top down direction.\n");
+	}
 
 	return __memblock_find_range_rev(start, end, size, align, nid);
 }
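
__memblock_find_range() returns a phys_addr_t, so the local variable that receives its result in memblock_find_in_range_node() needs to be just as wide. The user-space sketch below shows what a narrower `int` would do to an address at or above 4GB. It assumes a 64-bit phys_addr_t, a 32-bit int, and a compiler that narrows by dropping the high bits (the conversion is implementation-defined in C; GCC and Clang behave this way). `return_via_int()` and `return_via_phys_addr()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

static_assert(sizeof(phys_addr_t) > sizeof(int),
	      "sketch assumes phys_addr_t is wider than int");

/* Models a result that passes through an int local before being returned. */
static phys_addr_t return_via_int(phys_addr_t found)
{
	int ret = (int)found;	/* high 32 bits are lost here */

	return (phys_addr_t)ret;
}

/* Models a result held in a phys_addr_t local: the address is preserved. */
static phys_addr_t return_via_phys_addr(phys_addr_t found)
{
	phys_addr_t ret = found;

	return ret;
}
```

Under those assumptions, a block found at exactly 4GB comes back from `return_via_int()` as 0, which the caller's `if (ret)` check would treat as "not found", while addresses below 2GB pass through unchanged and hide the problem on small test machines.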