[v10,4/8] mm,memory_hotplug: Allocate memmap from the added memory range

Physical memory hotadd has to allocate a memmap (struct page array) for
the newly added memory section. Currently, alloc_pages_node() is used
for those allocations.

This has some disadvantages:
 a) existing memory is consumed for that purpose
    (e.g. ~2MB per 128MB memory section on x86_64).
    This can even lead to extreme cases where the system goes OOM because
    the physically hotplugged memory depletes the available memory before
    it is onlined.
 b) if the whole node is movable then we have off-node struct pages,
    which has performance drawbacks.
 c) It might be that there are no PMD_ALIGNED chunks, so the memmap array
    gets populated with base pages.
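
The ~2MB figure in (a) follows from simple arithmetic. A minimal userspace
sketch, assuming 4 KiB base pages and a 64-byte struct page (typical for
x86_64, but configuration dependent); memmap_bytes() and the SKETCH_*
constants are hypothetical names, not kernel API:

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration: 4 KiB base pages, 64-byte struct page. */
#define SKETCH_PAGE_SIZE        4096ULL
#define SKETCH_STRUCT_PAGE_SIZE 64ULL

/* Bytes of memmap (struct page array) needed to describe range_bytes. */
static uint64_t memmap_bytes(uint64_t range_bytes)
{
	/* Hotplugged ranges are expected to be page aligned. */
	assert(range_bytes % SKETCH_PAGE_SIZE == 0);

	return (range_bytes / SKETCH_PAGE_SIZE) * SKETCH_STRUCT_PAGE_SIZE;
}
```

Under these assumptions, memmap_bytes(128ULL << 20) evaluates to 2 MiB,
which matches the per-section cost quoted above.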

This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled.

Vmemmap page tables can map arbitrary memory. That means that we can
reserve a part of the physically hotadded memory to back vmemmap page
tables. This implementation uses the beginning of the hotplugged memory
for that purpose.
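
To illustrate the resulting layout, here is a minimal userspace sketch of how
a hotadded range could be split into a leading vmemmap part and a usable
part. It assumes 4 KiB pages and a 64-byte struct page; struct sketch_layout
and split_hotadded_range() are hypothetical names used only for this
illustration:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE        4096ULL /* assumed base page size */
#define SKETCH_STRUCT_PAGE_SIZE 64ULL   /* assumed sizeof(struct page) */

struct sketch_layout {
	uint64_t vmemmap_start_pfn; /* first pfn backing the memmap */
	uint64_t nr_vmemmap_pages;  /* pages reserved for the memmap */
	uint64_t usable_start_pfn;  /* first pfn left for regular use */
	uint64_t nr_usable_pages;   /* pages left for regular use */
};

/* Reserve the beginning of [start_pfn, start_pfn + nr_pages) for the memmap. */
static struct sketch_layout split_hotadded_range(uint64_t start_pfn,
						 uint64_t nr_pages)
{
	struct sketch_layout l;
	uint64_t memmap_size = nr_pages * SKETCH_STRUCT_PAGE_SIZE;

	/* Round the memmap size up to whole pages. */
	l.nr_vmemmap_pages = (memmap_size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	assert(l.nr_vmemmap_pages < nr_pages);

	l.vmemmap_start_pfn = start_pfn;
	l.usable_start_pfn = start_pfn + l.nr_vmemmap_pages;
	l.nr_usable_pages = nr_pages - l.nr_vmemmap_pages;
	return l;
}
```

For a 128MB range (32768 pages under the stated assumptions), the first 512
pages would back the memmap and the remaining 32256 pages stay usable.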

There are some non-obvious things to consider though. Vmemmap
pages are allocated/freed during the memory hotplug events
(add_memory_resource(), try_remove_memory()) when the memory is
added/removed. This means that the reserved physical range is not online
although it is used. The most obvious side effect is that pfn_to_online_page()
returns NULL for those pfns. The current design expects that this
should be OK as the hotplugged memory is considered garbage until it
is onlined. For example, hibernation wouldn't save the content of those
vmemmaps into the image so it wouldn't be restored on resume, but this
should be OK as there is no real content to recover anyway while metadata
is reachable from other data structures (e.g. vmemmap page tables).

The reserved space is therefore (de)initialized during the {on,off}line
events (mhp_{de}init_memmap_on_memory). That is done by extracting page
allocator independent initialization from the regular onlining path.
The primary reason to handle the reserved space outside of {on,off}line_pages
is to make each initialization specific to the purpose rather than
special case them in a single function.

As per above, the functions that are introduced are:

 - mhp_init_memmap_on_memory:
		       Initializes vmemmap pages by calling move_pfn_range_to_zone(),
		       calls kasan_add_zero_shadow(), and onlines as many sections
		       as vmemmap pages fully span.
 - mhp_deinit_memmap_on_memory:
		       Offlines as many sections as vmemmap pages fully span,
		       removes the range from the zone by remove_pfn_range_from_zone(),
		       and calls kasan_remove_zero_shadow() for the range.
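
"As many sections as vmemmap pages fully span" can be made concrete with a
small userspace sketch. It assumes 128 MiB sections of 4 KiB pages (32768
pages per section); fully_spanned_sections() is a hypothetical helper, not a
function from this patch:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: 128 MiB sections of 4 KiB pages, i.e. 32768 pages per section. */
#define SKETCH_PAGES_PER_SECTION 32768ULL

/* Number of sections entirely covered by the vmemmap pages. */
static uint64_t fully_spanned_sections(uint64_t start_pfn,
				       uint64_t nr_vmemmap_pages)
{
	uint64_t end_pfn = start_pfn + nr_vmemmap_pages;
	uint64_t first, last;

	assert(end_pfn >= start_pfn); /* no wrap-around */

	/* First section boundary at or after start_pfn. */
	first = (start_pfn + SKETCH_PAGES_PER_SECTION - 1) /
		SKETCH_PAGES_PER_SECTION * SKETCH_PAGES_PER_SECTION;
	/* Last section boundary at or before end_pfn. */
	last = end_pfn / SKETCH_PAGES_PER_SECTION * SKETCH_PAGES_PER_SECTION;

	return last > first ? (last - first) / SKETCH_PAGES_PER_SECTION : 0;
}
```

With these numbers, the 512 vmemmap pages of a 128MB block span no full
section, so nothing would be onlined for them at this step; only a memmap of
at least one section in size results in a non-zero count.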

The new function memory_block_online() calls mhp_init_memmap_on_memory() before
doing the actual online_pages(). Should online_pages() fail, we clean up
by calling mhp_deinit_memmap_on_memory().
Adjusting of present_pages is done at the end, once we know that online_pages()
succeeded.
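
The ordering described above can be sketched in userspace as follows. This is
a simplified model, not the code from this patch: struct sketch_state and all
sketch_* functions are hypothetical stand-ins, and fail_online exists only to
exercise the error path:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state touched while onlining a block. */
struct sketch_state {
	unsigned long present_pages;
	bool vmemmap_initialized;
	bool fail_online; /* knob to simulate an online_pages() failure */
};

static int sketch_online_pages(struct sketch_state *s, unsigned long nr)
{
	if (s->fail_online)
		return -1;
	s->present_pages += nr;
	return 0;
}

static int sketch_block_online(struct sketch_state *s, unsigned long nr_pages,
			       unsigned long nr_vmemmap_pages)
{
	int ret;

	assert(nr_vmemmap_pages < nr_pages);

	/* Step 1: initialize the reserved vmemmap range first. */
	if (nr_vmemmap_pages)
		s->vmemmap_initialized = true;

	/* Step 2: online the rest of the block. */
	ret = sketch_online_pages(s, nr_pages - nr_vmemmap_pages);
	if (ret) {
		/* Step 2a: undo step 1 on failure. */
		if (nr_vmemmap_pages)
			s->vmemmap_initialized = false;
		return ret;
	}

	/* Step 3: account the vmemmap pages only after success. */
	s->present_pages += nr_vmemmap_pages;
	return 0;
}
```

Deferring the accounting to the last step means a failed online leaves the
counters untouched, so the error path has nothing to subtract.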

On offline, memory_block_offline() needs to unaccount vmemmap pages from
present_pages before calling offline_pages().
This is necessary because offline_pages() tears down some structures based
on whether the node or the zone becomes empty.
If offline_pages() fails, we account back vmemmap pages.
If it succeeds, we call mhp_deinit_memmap_on_memory().
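
The offline ordering mirrors the online one. A simplified userspace model,
again with hypothetical names (struct off_state, off_*), and with
fail_offline present only to exercise the failure path:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state touched while offlining a block. */
struct off_state {
	unsigned long present_pages;
	bool vmemmap_initialized;
	bool fail_offline; /* knob to simulate an offline_pages() failure */
};

static int off_offline_pages(struct off_state *s, unsigned long nr)
{
	if (s->fail_offline)
		return -1;
	s->present_pages -= nr;
	return 0;
}

static int off_block_offline(struct off_state *s, unsigned long nr_pages,
			     unsigned long nr_vmemmap_pages)
{
	int ret;

	assert(nr_vmemmap_pages < nr_pages);
	assert(s->present_pages >= nr_pages);

	/* Unaccount vmemmap pages first so emptiness checks see the truth. */
	s->present_pages -= nr_vmemmap_pages;

	ret = off_offline_pages(s, nr_pages - nr_vmemmap_pages);
	if (ret) {
		/* Offlining failed: account the vmemmap pages back. */
		s->present_pages += nr_vmemmap_pages;
		return ret;
	}

	/* Offlining succeeded: tear down the vmemmap range. */
	if (nr_vmemmap_pages)
		s->vmemmap_initialized = false;
	return 0;
}
```

If the vmemmap pages stayed accounted during the offline step, a node or zone
holding only this block would never look empty to the teardown logic.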

Hot-remove:

 We need to be careful when removing memory, as adding and
 removing memory needs to be done with the same granularity.
 To check that this assumption is not violated, we check the
 memory range we want to remove and if a) any memory block has
 vmemmap pages and b) the range spans more than a single memory
 block, we scream out loud and refuse to proceed.
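
 The granularity check can be modelled with a short userspace sketch. It is
 not the code from this patch: struct sketch_block and sketch_check_remove()
 are hypothetical, and a plain -1 stands in for whatever error the kernel
 returns:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-memory-block bookkeeping. */
struct sketch_block {
	unsigned long nr_vmemmap_pages; /* 0 if memmap lives elsewhere */
};

/* Returns 0 if the range may be removed, -1 if removal is refused. */
static int sketch_check_remove(const struct sketch_block *blocks,
			       size_t nr_blocks)
{
	bool has_vmemmap = false;
	size_t i;

	assert(blocks != NULL || nr_blocks == 0);

	/* a) does any block in the range carry vmemmap pages? */
	for (i = 0; i < nr_blocks; i++) {
		if (blocks[i].nr_vmemmap_pages)
			has_vmemmap = true;
	}

	/* b) such memory must be removed one memory block at a time. */
	if (has_vmemmap && nr_blocks > 1)
		return -1;

	return 0;
}
```

 Ranges without any vmemmap pages are unaffected by the check, whatever their
 size.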

 If all is good and the range was using memmap on memory (aka vmemmap pages),
 we construct an altmap structure so free_hugepage_table does the right
 thing and calls vmem_altmap_free instead of free_pagetable.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c          |  72 ++++++++++++++++--
 include/linux/memory.h         |   8 +-
 include/linux/memory_hotplug.h |  15 +++-
 include/linux/memremap.h       |   2 +-
 include/linux/mmzone.h         |   7 +-
 mm/Kconfig                     |   5 ++
 mm/memory_hotplug.c            | 161 ++++++++++++++++++++++++++++++++++++++---
 mm/sparse.c                    |   2 -
 8 files changed, 250 insertions(+), 22 deletions(-)

Message ID	20210421102701.25051-5-osalvador@suse.de (mailing list archive)
State	New, archived
Headers	From: Oscar Salvador <osalvador@suse.de>
	To: Andrew Morton <akpm@linux-foundation.org>
	Cc: David Hildenbrand <david@redhat.com>, Michal Hocko <mhocko@kernel.org>, Anshuman Khandual <anshuman.khandual@arm.com>, Vlastimil Babka <vbabka@suse.cz>, Pavel Tatashin <pasha.tatashin@soleen.com>, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Oscar Salvador <osalvador@suse.de>
	Subject: [PATCH v10 4/8] mm,memory_hotplug: Allocate memmap from the added memory range
	Date: Wed, 21 Apr 2021 12:26:57 +0200
	Message-Id: <20210421102701.25051-5-osalvador@suse.de>
	In-Reply-To: <20210421102701.25051-1-osalvador@suse.de>
Series	Allocate memmap from hotadded memory (per device)
	[v10,0/8] Allocate memmap from hotadded memory (per device)
	[v10,1/8] drivers/base/memory: Introduce memory_block_{online,offline}
	[v10,2/8] mm,memory_hotplug: Relax fully spanned sections check
	[v10,3/8] mm,memory_hotplug: Factor out adjusting present pages into adjust_present_page_count()
	[v10,4/8] mm,memory_hotplug: Allocate memmap from the added memory range
	[v10,5/8] acpi,memhotplug: Enable MHP_MEMMAP_ON_MEMORY when supported
	[v10,6/8] mm,memory_hotplug: Add kernel boot option to enable memmap_on_memory
	[v10,7/8] x86/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
	[v10,8/8] arm64/Kconfig: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
