
[v5,4/4] mm/sparse: Optimize memmap allocation during sparse_init()

Message ID 20180627013116.12411-5-bhe@redhat.com (mailing list archive)
State New, archived

Commit Message

Baoquan He June 27, 2018, 1:31 a.m. UTC
In sparse_init(), two temporary pointer arrays, usemap_map and map_map,
are allocated with the size of NR_MEM_SECTIONS. They are used to store
each memory section's usemap and memmap if the section is marked as
present. With the help of these two arrays, a contiguous memory chunk is
allocated for the usemaps and memmaps of the memory sections on one
node. This avoids excessive memory fragmentation. In the diagram below,
'1' indicates a present memory section and '0' an absent one. The
number 'n' can be much smaller than NR_MEM_SECTIONS on most systems.

|1|1|1|1|0|0|0|0|1|1|0|0|...|1|0||1|0|...|1||0|1|...|0|
-------------------------------------------------------
 0 1 2 3         4 5         i   i+1     n-1   n

If we fail to populate the page tables to map one section's memmap, its
->section_mem_map will be cleared in the end to indicate that the
section is not present. After use, these two arrays are released at the
end of sparse_init().

In 4-level paging mode, each array costs 4M, which is negligible. In
5-level paging mode, however, they cost 256M each, 512M altogether. A
kdump kernel usually reserves only very little memory, e.g. 256M. So
even though they are only temporarily allocated, this is still not
acceptable.
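
These figures follow directly from the array entry counts; assuming the
x86_64 section geometry (SECTION_SIZE_BITS == 27, with MAX_PHYSMEM_BITS
== 46 for 4-level and 52 for 5-level paging):

	NR_MEM_SECTIONS = 1 << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
	4-level: (1 << (46 - 27)) sections * 8 bytes/pointer =   4M per array
	5-level: (1 << (52 - 27)) sections * 8 bytes/pointer = 256M per array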

In fact, there's no need to allocate them with the size of
NR_MEM_SECTIONS. Since the clearing of ->section_mem_map has been
deferred to the end, the number of present memory sections stays the
same during sparse_init() until we finally clear out the memory
section's ->section_mem_map if its usemap or memmap was not correctly
handled. Thus, whenever the for_each_present_section_nr() loop is taken
in the middle, the i-th present memory section is always the same one.

So allocate usemap_map and map_map with the size of
'nr_present_sections' only. For the i-th present memory section, install
its usemap and memmap at usemap_map[i] and map_map[i] during allocation.
Then, in the final for_each_present_section_nr() loop which clears the
failed memory sections' ->section_mem_map, fetch the usemap and memmap
from the usemap_map[] and map_map[] arrays and set them into
mem_section[] accordingly.
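
As an illustration of the indexing (a hypothetical helper for this
description only, not part of the patch): the slot of a present section
is its rank among the present sections, so both arrays need only
nr_present_sections entries:

	/*
	 * Hypothetical helper, for illustration only: return i such
	 * that pnum is the i-th present section, i.e. the index where
	 * its usemap and memmap live in usemap_map[] and map_map[].
	 */
	static int present_section_rank(unsigned long pnum)
	{
		unsigned long p;
		int rank = 0;

		for_each_present_section_nr(0, p) {
			if (p == pnum)
				return rank;
			rank++;
		}
		return -1;
	}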

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/sparse-vmemmap.c |  5 +++--
 mm/sparse.c         | 43 ++++++++++++++++++++++++++++++++++---------
 2 files changed, 37 insertions(+), 11 deletions(-)

Comments

Pavel Tatashin June 28, 2018, 3:19 a.m. UTC | #1
> Signed-off-by: Baoquan He <bhe@redhat.com>
>
> Signed-off-by: Baoquan He <bhe@redhat.com>

Please remove duplicated signed-off

>                 if (!usemap) {
>                         ms->section_mem_map = 0;
> +                       nr_consumed_maps++;

Currently, we do not set ms->section_mem_map to 0 when we fail to
allocate the usemap; we only set section_mem_map to 0 when we fail to
allocate the memmap. I think this is an existing bug.

Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Baoquan He June 28, 2018, 6:39 a.m. UTC | #2
On 06/27/18 at 11:19pm, Pavel Tatashin wrote:
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> >
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> 
> Please remove duplicated signed-off

Done.
> 
> >                 if (!usemap) {
> >                         ms->section_mem_map = 0;
> > +                       nr_consumed_maps++;
> 
> Currently, we do not set ms->section_mem_map to 0 when we fail to
> allocate the usemap; we only set section_mem_map to 0 when we fail to
> allocate the memmap. I think this is an existing bug.

Yes, I found it when changing the code. Later in sparse_init(), I added
a check to see if the usemap is available; if not, we also do
"ms->section_mem_map = 0;" to clear its ->section_mem_map.

To be perfect here, we might also need to free the relevant memmap,
because while the usemap is allocated together per node, the memmap
could be allocated section by section. I didn't do that because the
usemap allocation is the smaller one; if even that allocation fails at
this early stage of system initialization, the kernel won't live long,
so it's not worth complicating the code for it.

> 
> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Dave Hansen June 29, 2018, 5:16 p.m. UTC | #3
> +	/* The numner of present sections stored in nr_present_sections

		^ "number", please.

This comment needs CodingStyle love.

> +	 * are kept the same since mem sections are marked as present in
	
	   ^ s/are/is/

This sentence is really odd to me.  What is the point?  Just that we are
not making sections present?  Could we just say that instead of
referring to functions and variable names?

> +	 * memory_present(). In this for loop, we need check which sections
> +	 * failed to allocate memmap or usemap, then clear its
> +	 * ->section_mem_map accordingly.

Rather than referring to the for loop, how about we actually comment the
code that is doing this operation?

>  					   During this process, we need

This is missing a "to": "we need _to_ increase".

> +	 * increase 'nr_consumed_maps' whether its allocation of memmap
> +	 * or usemap failed or not, so that after we handle the i-th
> +	 * memory section, can get memmap and usemap of (i+1)-th section
> +	 * correctly. */

This makes no sense to me.  Why are we incrementing 'nr_consumed_maps'
when we do not consume one?

You say that we increment it so that things will work, but not how or
why it makes things work.  I'm confused.
Pavel Tatashin June 29, 2018, 5:48 p.m. UTC | #4
> > +      * increase 'nr_consumed_maps' whether its allocation of memmap
> > +      * or usemap failed or not, so that after we handle the i-th
> > +      * memory section, can get memmap and usemap of (i+1)-th section
> > +      * correctly. */
>
> This makes no sense to me.  Why are we incrementing 'nr_consumed_maps'
> when we do not consume one?
>
> You say that we increment it so that things will work, but not how or
> why it makes things work.  I'm confused.

Hi Dave,

nr_consumed_maps is a local counter. map_map contains struct pages for
each section. In order to assign them to the correct sections, this
local counter must be incremented even when some parts of map_map are
empty.

Here is an example:
Node1:
map_map[0] -> struct pages ...
map_map[1] -> NULL
Node2:
map_map[2] -> struct pages ...

We always want to configure a section from Node2 with struct pages from
Node2, even if there are holes in between. The same goes for the usemap.
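
A condensed sketch of the producer side (adapted from the patch below,
not verbatim): a hole consumes no slot, but a present section always
consumes one, even when population fails and the slot stays NULL:

	int nr_consumed_maps = 0;

	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
		if (!present_section_nr(pnum))
			continue;	/* hole: no slot consumed */
		map_map[nr_consumed_maps] =
			sparse_mem_map_populate(pnum, nodeid, NULL);
		nr_consumed_maps++;	/* slot consumed even if NULL */
	}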

Pavel
Dave Hansen June 29, 2018, 5:52 p.m. UTC | #5
On 06/29/2018 10:48 AM, Pavel Tatashin wrote:
> Here is an example:
> Node1:
> map_map[0] -> struct pages ...
> map_map[1] -> NULL
> Node2:
> map_map[2] -> struct pages ...
> 
> We always want to configure a section from Node2 with struct pages from
> Node2, even if there are holes in between. The same goes for the usemap.

Right...  But your example consumes two mem_map[]s.

But, from scanning the code, we increment nr_consumed_maps three times.
Correct?
Pavel Tatashin June 29, 2018, 6:01 p.m. UTC | #6
On 06/29/2018 01:52 PM, Dave Hansen wrote:
> On 06/29/2018 10:48 AM, Pavel Tatashin wrote:
>> Here is an example:
>> Node1:
>> map_map[0] -> struct pages ...
>> map_map[1] -> NULL
>> Node2:
>> map_map[2] -> struct pages ...
>>
>> We always want to configure a section from Node2 with struct pages from
>> Node2, even if there are holes in between. The same goes for the usemap.
> 
> Right...  But your example consumes two mem_map[]s.
> 
> But, from scanning the code, we increment nr_consumed_maps three times.
> Correct?

Correct: it should be incremented on every iteration of the loop, no matter whether the entries contain valid data or NULLs. So we increment in three places:

if map_map[] has an invalid entry: increment, continue
if usemap_map[] has an invalid entry: increment, continue
at the end of the loop, when everything was valid: increment

This is done so that nr_consumed_maps does not get out of sync with the current pnum. pnum is not equal to nr_consumed_maps, as there may be holes in pnums, but there is a one-to-one correspondence.
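
Condensed from the sparse_init() hunk below (a sketch, not the verbatim
code), the three increment sites are:

	for_each_present_section_nr(0, pnum) {
		ms = __nr_to_section(pnum);
		usemap = usemap_map[nr_consumed_maps];
		if (!usemap) {
			ms->section_mem_map = 0;
			nr_consumed_maps++;	/* 1: usemap missing */
			continue;
		}
		map = map_map[nr_consumed_maps];
		if (!map) {
			ms->section_mem_map = 0;
			nr_consumed_maps++;	/* 2: memmap missing */
			continue;
		}
		sparse_init_one_section(ms, pnum, map, usemap);
		nr_consumed_maps++;		/* 3: section set up */
	}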

Pavel
Dave Hansen June 29, 2018, 6:56 p.m. UTC | #7
On 06/29/2018 11:01 AM, Pavel Tatashin wrote:
> Correct: it should be incremented on every iteration of the loop, no matter whether the entries contain valid data or NULLs. So we increment in three places:
> 
> if map_map[] has an invalid entry: increment, continue
> if usemap_map[] has an invalid entry: increment, continue
> at the end of the loop, when everything was valid: increment
> 
> This is done so that nr_consumed_maps does not get out of sync with the
> current pnum. pnum is not equal to nr_consumed_maps, as there may be
> holes in pnums, but there is a one-to-one correspondence.
Can this be made more clear in the code?
Pavel Tatashin June 29, 2018, 6:59 p.m. UTC | #8
> > This is done so that nr_consumed_maps does not get out of sync with the
> > current pnum. pnum is not equal to nr_consumed_maps, as there may be
> > holes in pnums, but there is a one-to-one correspondence.
> Can this be made more clear in the code?

Absolutely. I've done it here:
http://lkml.kernel.org/r/20180628173010.23849-1-pasha.tatashin@oracle.com

Pavel

Patch

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 640e68f8324b..a98ec2fb6915 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -281,6 +281,7 @@  void __init sparse_mem_maps_populate_node(struct page **map_map,
 	unsigned long pnum;
 	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
 	void *vmemmap_buf_start;
+	int nr_consumed_maps = 0;
 
 	size = ALIGN(size, PMD_SIZE);
 	vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
@@ -297,8 +298,8 @@  void __init sparse_mem_maps_populate_node(struct page **map_map,
 		if (!present_section_nr(pnum))
 			continue;
 
-		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
-		if (map_map[pnum])
+		map_map[nr_consumed_maps] = sparse_mem_map_populate(pnum, nodeid, NULL);
+		if (map_map[nr_consumed_maps++])
 			continue;
 		ms = __nr_to_section(pnum);
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
diff --git a/mm/sparse.c b/mm/sparse.c
index b2848cc6e32a..2eb8ee72e44d 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -386,6 +386,7 @@  static void __init sparse_early_usemaps_alloc_node(void *data,
 	unsigned long pnum;
 	unsigned long **usemap_map = (unsigned long **)data;
 	int size = usemap_size();
+	int nr_consumed_maps = 0;
 
 	usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
 							  size * usemap_count);
@@ -397,9 +398,10 @@  static void __init sparse_early_usemaps_alloc_node(void *data,
 	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 		if (!present_section_nr(pnum))
 			continue;
-		usemap_map[pnum] = usemap;
+		usemap_map[nr_consumed_maps] = usemap;
 		usemap += size;
-		check_usemap_section_nr(nodeid, usemap_map[pnum]);
+		check_usemap_section_nr(nodeid, usemap_map[nr_consumed_maps]);
+		nr_consumed_maps++;
 	}
 }
 
@@ -424,29 +426,33 @@  void __init sparse_mem_maps_populate_node(struct page **map_map,
 	void *map;
 	unsigned long pnum;
 	unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
+	int nr_consumed_maps;
 
 	size = PAGE_ALIGN(size);
 	map = memblock_virt_alloc_try_nid_raw(size * map_count,
 					      PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
 					      BOOTMEM_ALLOC_ACCESSIBLE, nodeid);
 	if (map) {
+		nr_consumed_maps = 0;
 		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 			if (!present_section_nr(pnum))
 				continue;
-			map_map[pnum] = map;
+			map_map[nr_consumed_maps] = map;
 			map += size;
+			nr_consumed_maps++;
 		}
 		return;
 	}
 
 	/* fallback */
+	nr_consumed_maps = 0;
 	for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 		struct mem_section *ms;
 
 		if (!present_section_nr(pnum))
 			continue;
-		map_map[pnum] = sparse_mem_map_populate(pnum, nodeid, NULL);
-		if (map_map[pnum])
+		map_map[nr_consumed_maps] = sparse_mem_map_populate(pnum, nodeid, NULL);
+		if (map_map[nr_consumed_maps++])
 			continue;
 		ms = __nr_to_section(pnum);
 		pr_err("%s: sparsemem memory map backing failed some memory will not be available\n",
@@ -526,6 +532,7 @@  static void __init alloc_usemap_and_memmap(void (*alloc_func)
 		/* new start, update count etc*/
 		nodeid_begin = nodeid;
 		pnum_begin = pnum;
+		data += map_count * data_unit_size;
 		map_count = 1;
 	}
 	/* ok, last chunk */
@@ -544,6 +551,7 @@  void __init sparse_init(void)
 	unsigned long *usemap;
 	unsigned long **usemap_map;
 	int size;
+	int nr_consumed_maps = 0;
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
 	int size2;
 	struct page **map_map;
@@ -566,7 +574,7 @@  void __init sparse_init(void)
 	 * powerpc need to call sparse_init_one_section right after each
 	 * sparse_early_mem_map_alloc, so allocate usemap_map at first.
 	 */
-	size = sizeof(unsigned long *) * NR_MEM_SECTIONS;
+	size = sizeof(unsigned long *) * nr_present_sections;
 	usemap_map = memblock_virt_alloc(size, 0);
 	if (!usemap_map)
 		panic("can not allocate usemap_map\n");
@@ -575,7 +583,7 @@  void __init sparse_init(void)
 				sizeof(usemap_map[0]));
 
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-	size2 = sizeof(struct page *) * NR_MEM_SECTIONS;
+	size2 = sizeof(struct page *) * nr_present_sections;
 	map_map = memblock_virt_alloc(size2, 0);
 	if (!map_map)
 		panic("can not allocate map_map\n");
@@ -584,27 +592,44 @@  void __init sparse_init(void)
 				sizeof(map_map[0]));
 #endif
 
+	/* The numner of present sections stored in nr_present_sections
+	 * are kept the same since mem sections are marked as present in
+	 * memory_present(). In this for loop, we need check which sections
+	 * failed to allocate memmap or usemap, then clear its
+	 * ->section_mem_map accordingly. During this process, we need
+	 * increase 'nr_consumed_maps' whether its allocation of memmap
+	 * or usemap failed or not, so that after we handle the i-th
+	 * memory section, can get memmap and usemap of (i+1)-th section
+	 * correctly. */
 	for_each_present_section_nr(0, pnum) {
 		struct mem_section *ms;
+
+		if (nr_consumed_maps >= nr_present_sections) {
+			pr_err("nr_consumed_maps goes beyond nr_present_sections\n");
+			break;
+		}
 		ms = __nr_to_section(pnum);
-		usemap = usemap_map[pnum];
+		usemap = usemap_map[nr_consumed_maps];
 		if (!usemap) {
 			ms->section_mem_map = 0;
+			nr_consumed_maps++;
 			continue;
 		}
 
 #ifdef CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-		map = map_map[pnum];
+		map = map_map[nr_consumed_maps];
 #else
 		map = sparse_early_mem_map_alloc(pnum);
 #endif
 		if (!map) {
 			ms->section_mem_map = 0;
+			nr_consumed_maps++;
 			continue;
 		}
 
 		sparse_init_one_section(__nr_to_section(pnum), pnum, map,
 								usemap);
+		nr_consumed_maps++;
 	}
 
 	vmemmap_populate_print_last();