From patchwork Thu Jul 25 16:02:03 2019
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 11059177
From: Oscar Salvador <osalvador@suse.de>
To: akpm@linux-foundation.org
Cc: dan.j.williams@intel.com, david@redhat.com, pasha.tatashin@soleen.com, mhocko@suse.com, anshuman.khandual@arm.com, Jonathan.Cameron@huawei.com, vbabka@suse.cz, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 1/5] mm,memory_hotplug: Introduce MHP_MEMMAP_ON_MEMORY
Date: Thu, 25 Jul 2019 18:02:03 +0200
Message-Id: <20190725160207.19579-2-osalvador@suse.de>
In-Reply-To: <20190725160207.19579-1-osalvador@suse.de>
References: <20190725160207.19579-1-osalvador@suse.de>

This patch introduces the MHP_MEMMAP_ON_MEMORY flag and prepares the
callers that add memory to take a "flags" parameter. This "flags"
parameter will be evaluated later on, in Patch#3, to init the
mhp_restrictions struct.

The callers are:

	add_memory
	__add_memory
	add_memory_resource

Unfortunately, we do not have a single entry point to add memory: each
caller wants to hook up in a different place depending on its
requisites (e.g: Xen reserve_additional_memory()), so we have to
spread the parameter over the three callers.

The MHP_MEMMAP_ON_MEMORY flag specifies that the memmap is to be
allocated from the hot-added range. If a caller wants memmaps to be
allocated per memory block, it has to call an add_memory() variant
once per memory block, spanning the whole range; if it wants a single
memmap for the whole range, one call will do.

Want to add 384MB (3 sections, 3 memory-blocks), e.g:

add_memory(0x1000, size_memory_block);
add_memory(0x2000, size_memory_block);
add_memory(0x3000, size_memory_block);

[memblock#0        ]
[0 - 511 pfns      ] - vmemmap pages for section#0
[512 - 32767 pfns  ] - normal memory

[memblock#1        ]
[32768 - 33279 pfns] - vmemmap pages for section#1
[33280 - 65535 pfns] - normal memory

[memblock#2        ]
[65536 - 66047 pfns] - vmemmap pages for section#2
[66048 - 98303 pfns] - normal memory

or

add_memory(0x1000, size_memory_block * 3);

[memblock#0       ]
[0 - 1535 pfns    ] - vmemmap pages for section#{0-2}
[1536 - 98303 pfns] - normal memory

When using larger memory blocks (1GB or 2GB), the principle is the same.

Of course, the whole-range granularity is nicer when it comes to
getting a large contiguous area, while the per memory-block granularity
gives us flexibility when removing the memory.
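For illustration, a minimal sketch of the two calling conventions from
a hypothetical caller's side (the example_* helpers are ours, not part
of the patch, and assume the flag is honored as wired up later in the
series):

/* One call for the whole range: a single memmap at its beginning. */
int example_hot_add_whole(int nid, u64 start, u64 size)
{
	return add_memory(nid, start, size, MHP_MEMMAP_ON_MEMORY);
}

/* One call per memory block: each block hosts its own memmap. */
int example_hot_add_per_block(int nid, u64 start, u64 size, u64 block_size)
{
	u64 addr;
	int rc;

	for (addr = start; addr < start + size; addr += block_size) {
		rc = add_memory(nid, addr, block_size, MHP_MEMMAP_ON_MEMORY);
		if (rc)
			return rc;
	}
	return 0;
}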
Signed-off-by: Oscar Salvador
Reviewed-by: David Hildenbrand
---
 drivers/acpi/acpi_memhotplug.c |  2 +-
 drivers/base/memory.c          |  2 +-
 drivers/dax/kmem.c             |  2 +-
 drivers/hv/hv_balloon.c        |  2 +-
 drivers/s390/char/sclp_cmd.c   |  2 +-
 drivers/xen/balloon.c          |  2 +-
 include/linux/memory_hotplug.h | 25 ++++++++++++++++++++++---
 mm/memory_hotplug.c            | 10 +++++-----
 8 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index e294f44a7850..d91b3584d4b2 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -207,7 +207,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 	if (node < 0)
 		node = memory_add_physaddr_to_nid(info->start_addr);
 
-	result = __add_memory(node, info->start_addr, info->length);
+	result = __add_memory(node, info->start_addr, info->length, 0);
 
 	/*
 	 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 154d5d4a0779..d30d0f6c8ad0 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -521,7 +521,7 @@ static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
 
 	nid = memory_add_physaddr_to_nid(phys_addr);
 	ret = __add_memory(nid, phys_addr,
-			   MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+			   MIN_MEMORY_BLOCK_SIZE * sections_per_block, 0);
 
 	if (ret)
 		goto out;
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 3d0a7e702c94..e159184e0ba0 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -65,7 +65,7 @@ int dev_dax_kmem_probe(struct device *dev)
 	new_res->flags = IORESOURCE_SYSTEM_RAM;
 	new_res->name = dev_name(dev);
 
-	rc = add_memory(numa_node, new_res->start, resource_size(new_res));
+	rc = add_memory(numa_node, new_res->start, resource_size(new_res), 0);
 	if (rc) {
 		release_resource(new_res);
 		kfree(new_res);
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 6fb4ea5f0304..beb92bc56186 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -731,7 +731,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
-				(HA_CHUNK << PAGE_SHIFT));
+				(HA_CHUNK << PAGE_SHIFT), 0);
 
 		if (ret) {
 			pr_err("hot_add memory failed error is %d\n", ret);
diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index 37d42de06079..f61026c7db7e 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -406,7 +406,7 @@ static void __init add_memory_merged(u16 rn)
 	if (!size)
 		goto skip_add;
 	for (addr = start; addr < start + size; addr += block_size)
-		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size);
+		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size, 0);
 skip_add:
 	first_rn = rn;
 	num = 1;
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 4e11de6cde81..e4934ce40478 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -349,7 +349,7 @@ static enum bp_state reserve_additional_memory(void)
 	mutex_unlock(&balloon_mutex);
 	/* add_memory_resource() requires the device_hotplug lock */
 	lock_device_hotplug();
-	rc = add_memory_resource(nid, resource);
+	rc = add_memory_resource(nid, resource, 0);
 	unlock_device_hotplug();
 	mutex_lock(&balloon_mutex);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index f46ea71b4ffd..45dece922d7c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -54,6 +54,25 @@ enum {
 };
 
 /*
+ * We want the memmap (struct page array) to be allocated from the
+ * hotadded range. To do so, there are two possible ways depending on
+ * what the caller wants.
+ * 1) Allocate memmap pages for the whole hot-added range.
+ *    Here the caller will only call any add_memory() variant with the
+ *    whole memory address.
+ * 2) Allocate memmap pages per memblock.
+ *    Here, the caller will call any add_memory() variant per memblock
+ *    granularity.
+ * The former implies that we will use the beginning of the hot-added
+ * range to store the memmap pages of the whole range, while the latter
+ * implies that we will use the beginning of each memblock to store its
+ * own memmap pages.
+ *
+ * Please note that this is only a hint, not a guarantee. Only selected
+ * architectures support it with SPARSE_VMEMMAP.
+ */
+#define MHP_MEMMAP_ON_MEMORY	(1UL<<1)
+
+/*
 * Restrictions for the memory hotplug:
 * flags:  MHP_ flags
 * altmap: alternative allocator for memmap array
@@ -340,9 +359,9 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 extern void __ref free_area_init_core_hotplug(int nid);
-extern int __add_memory(int nid, u64 start, u64 size);
-extern int add_memory(int nid, u64 start, u64 size);
-extern int add_memory_resource(int nid, struct resource *resource);
+extern int __add_memory(int nid, u64 start, u64 size, unsigned long flags);
+extern int add_memory(int nid, u64 start, u64 size, unsigned long flags);
+extern int add_memory_resource(int nid, struct resource *resource, unsigned long flags);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern bool is_memblock_offlined(struct memory_block *mem);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9a82e12bd0e7..3d97c3711333 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1046,7 +1046,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 *
 * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
 */
-int __ref add_memory_resource(int nid, struct resource *res)
+int __ref add_memory_resource(int nid, struct resource *res, unsigned long flags)
 {
 	struct mhp_restrictions restrictions = {};
 	u64 start, size;
@@ -1123,7 +1123,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 }
 
 /* requires device_hotplug_lock, see add_memory_resource() */
-int __ref __add_memory(int nid, u64 start, u64 size)
+int __ref __add_memory(int nid, u64 start, u64 size, unsigned long flags)
 {
 	struct resource *res;
 	int ret;
 
@@ -1132,18 +1132,18 @@ int __ref __add_memory(int nid, u64 start, u64 size)
 	if (IS_ERR(res))
 		return PTR_ERR(res);
 
-	ret = add_memory_resource(nid, res);
+	ret = add_memory_resource(nid, res, flags);
 	if (ret < 0)
 		release_memory_resource(res);
 	return ret;
 }
 
-int add_memory(int nid, u64 start, u64 size)
+int add_memory(int nid, u64 start, u64 size, unsigned long flags)
 {
 	int rc;
 
 	lock_device_hotplug();
-	rc = __add_memory(nid, start, size);
+	rc = __add_memory(nid, start, size, flags);
 	unlock_device_hotplug();
 
 	return rc;

From patchwork Thu Jul 25 16:02:04 2019
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 11059173
From: Oscar Salvador <osalvador@suse.de>
To: akpm@linux-foundation.org
Cc: dan.j.williams@intel.com, david@redhat.com, pasha.tatashin@soleen.com, mhocko@suse.com, anshuman.khandual@arm.com, Jonathan.Cameron@huawei.com, vbabka@suse.cz, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 2/5] mm: Introduce a new Vmemmap page-type
Date: Thu, 25 Jul 2019 18:02:04 +0200
Message-Id: <20190725160207.19579-3-osalvador@suse.de>
In-Reply-To: <20190725160207.19579-1-osalvador@suse.de>
References: <20190725160207.19579-1-osalvador@suse.de>

This patch introduces a new Vmemmap page-type. It also introduces some
functions to ease the handling of vmemmap pages:

- vmemmap_nr_sections: Returns the number of sections that use vmemmap.

- vmemmap_nr_pages: Allows us to retrieve the number of vmemmap pages
  derived from any vmemmap page in the section. Useful for accounting
  and for knowing how much we have to skip in the cases where vmemmap
  pages need to be ignored.

- vmemmap_head: Returns the vmemmap head page.

- SetPageVmemmap: Sets the Reserved flag bit and sets page->type to
  Vmemmap. Setting the Reserved flag bit is just for extra protection;
  we do not actually expect anyone to use these pages for anything.

- ClearPageVmemmap: Clears the Reserved flag bit and page->type. Only
  used when sections containing vmemmap pages are removed.

These functions will be used by the code handling Vmemmap pages.
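For illustration, a minimal sketch of how a PFN scanner might use these
helpers together (hypothetical caller code, not part of the patch; it
mirrors the skips added later in this series):

/*
 * Skip over a self-hosted vmemmap area while scanning a PFN range.
 * All tail pages point back to the head via page->vmemmap_head, and
 * vmemmap_nr_pages() returns how many vmemmap pages remain from
 * 'page' up to the end of the area, so we can jump over them at once.
 */
static unsigned long example_skip_vmemmap(unsigned long pfn)
{
	struct page *page = pfn_to_page(pfn);

	if (!PageVmemmap(page))
		return pfn;

	return pfn + vmemmap_nr_pages(page);
}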
Signed-off-by: Oscar Salvador
---
 include/linux/mm.h         | 17 +++++++++++++++++
 include/linux/mm_types.h   |  5 +++++
 include/linux/page-flags.h | 19 +++++++++++++++++++
 3 files changed, 41 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 45f0ab0ed4f7..432175f8f8d2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2904,6 +2904,23 @@ static inline bool debug_guardpage_enabled(void) { return false; }
 static inline bool page_is_guard(struct page *page) { return false; }
 #endif /* CONFIG_DEBUG_PAGEALLOC */
 
+static __always_inline struct page *vmemmap_head(struct page *page)
+{
+	return (struct page *)page->vmemmap_head;
+}
+
+static __always_inline unsigned long vmemmap_nr_sections(struct page *page)
+{
+	struct page *head = vmemmap_head(page);
+	return head->vmemmap_sections;
+}
+
+static __always_inline unsigned long vmemmap_nr_pages(struct page *page)
+{
+	struct page *head = vmemmap_head(page);
+	return head->vmemmap_pages - (page - head);
+}
+
 #if MAX_NUMNODES > 1
 void __init setup_nr_node_ids(void);
 #else
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6a7a1083b6fb..51dd227f2a6b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -170,6 +170,11 @@ struct page {
 			 * pmem backed DAX files are mapped.
 			 */
 		};
+		struct {	/* Vmemmap pages */
+			unsigned long vmemmap_head;
+			unsigned long vmemmap_sections;	/* Number of sections */
+			unsigned long vmemmap_pages;	/* Number of pages */
+		};
 
 		/** @rcu_head: You can use this to free a page by RCU. */
 		struct rcu_head rcu_head;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index f91cb8898ff0..75f302a532f9 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -708,6 +708,7 @@ PAGEFLAG_FALSE(DoubleMap)
 #define PG_kmemcg	0x00000200
 #define PG_table	0x00000400
 #define PG_guard	0x00000800
+#define PG_vmemmap	0x00001000
 
 #define PageType(page, flag)						\
 	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
@@ -764,6 +765,24 @@ PAGE_TYPE_OPS(Table, table)
 */
 PAGE_TYPE_OPS(Guard, guard)
 
+/*
+ * Vmemmap pages refer to those pages that are used to create the memmap
+ * array, and reside within the same memory range that was hotplugged, so
+ * they are self-hosted. (see include/linux/memory_hotplug.h)
+ */
+PAGE_TYPE_OPS(Vmemmap, vmemmap)
+static __always_inline void SetPageVmemmap(struct page *page)
+{
+	__SetPageVmemmap(page);
+	__SetPageReserved(page);
+}
+
+static __always_inline void ClearPageVmemmap(struct page *page)
+{
+	__ClearPageVmemmap(page);
+	__ClearPageReserved(page);
+}
+
 extern bool is_free_buddy_page(struct page *page);
 
 __PAGEFLAG(Isolated, isolated, PF_ANY);

From patchwork Thu Jul 25 16:02:05 2019
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 11059179
From: Oscar Salvador <osalvador@suse.de>
To: akpm@linux-foundation.org
Cc: dan.j.williams@intel.com, david@redhat.com, pasha.tatashin@soleen.com, mhocko@suse.com, anshuman.khandual@arm.com, Jonathan.Cameron@huawei.com, vbabka@suse.cz, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 3/5] mm,sparse: Add SECTION_USE_VMEMMAP flag
Date: Thu, 25 Jul 2019 18:02:05 +0200
Message-Id: <20190725160207.19579-4-osalvador@suse.de>
In-Reply-To: <20190725160207.19579-1-osalvador@suse.de>
References: <20190725160207.19579-1-osalvador@suse.de>

When hot-removing memory, we need to be careful about two things:

1) The memory range must be memory_block aligned. This is what
   check_hotplug_memory_range() checks for.

2) If a range was hot-added using MHP_MEMMAP_ON_MEMORY, we need to
   check whether the caller is removing memory with the same
   granularity with which it was added.

To check against case 2), we mark all sections used by vmemmap (not
only the ones containing vmemmap pages, but all sections spanning the
memory range) with SECTION_USE_VMEMMAP. This will allow us to do some
sanity checks in the hot-remove stage.
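For illustration, a minimal sketch of the kind of sanity check this
flag enables at hot-remove time (hypothetical helper, assuming the
vmemmap page-type helpers from patch 2; the real check is added in the
next patch):

/*
 * If the first section of the range was added with MHP_MEMMAP_ON_MEMORY,
 * the vmemmap pages sit at the beginning of the range, so a removal must
 * start exactly at the vmemmap head page to match the add granularity.
 */
static bool example_removal_granularity_ok(unsigned long start_pfn)
{
	struct mem_section *ms = __pfn_to_section(start_pfn);
	struct page *page = pfn_to_page(start_pfn);

	if (!vmemmap_section(ms))
		return true;	/* no self-hosted vmemmap, nothing to check */

	return PageVmemmap(page) && vmemmap_head(page) == page;
}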
Signed-off-by: Oscar Salvador
---
 include/linux/memory_hotplug.h | 3 ++-
 include/linux/mmzone.h         | 8 +++++++-
 mm/memory_hotplug.c            | 2 +-
 mm/sparse.c                    | 9 +++++++--
 4 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 45dece922d7c..6b20008d9297 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -366,7 +366,8 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern bool is_memblock_offlined(struct memory_block *mem);
 extern int sparse_add_section(int nid, unsigned long pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap);
+		unsigned long nr_pages, struct vmem_altmap *altmap,
+		bool vmemmap_section);
 extern void sparse_remove_section(struct mem_section *ms,
 		unsigned long pfn, unsigned long nr_pages,
 		unsigned long map_offset, struct vmem_altmap *altmap);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d77d717c620c..259c326962f5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1254,7 +1254,8 @@ extern size_t mem_section_usage_size(void);
 #define SECTION_HAS_MEM_MAP	(1UL<<1)
 #define SECTION_IS_ONLINE	(1UL<<2)
 #define SECTION_IS_EARLY	(1UL<<3)
-#define SECTION_MAP_LAST_BIT	(1UL<<4)
+#define SECTION_USE_VMEMMAP	(1UL<<4)
+#define SECTION_MAP_LAST_BIT	(1UL<<5)
 #define SECTION_MAP_MASK	(~(SECTION_MAP_LAST_BIT-1))
 #define SECTION_NID_SHIFT	3
 
@@ -1265,6 +1266,11 @@ static inline struct page *__section_mem_map_addr(struct mem_section *section)
 	return (struct page *)map;
 }
 
+static inline int vmemmap_section(struct mem_section *section)
+{
+	return (section && (section->section_mem_map & SECTION_USE_VMEMMAP));
+}
+
 static inline int present_section(struct mem_section *section)
 {
 	return (section && (section->section_mem_map & SECTION_MARKED_PRESENT));
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3d97c3711333..c2338703ce80 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -314,7 +314,7 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
 
 		pfns = min(nr_pages, PAGES_PER_SECTION
 				- (pfn & ~PAGE_SECTION_MASK));
-		err = sparse_add_section(nid, pfn, pfns, altmap);
+		err = sparse_add_section(nid, pfn, pfns, altmap, 0);
 		if (err)
 			break;
 		pfn += pfns;
diff --git a/mm/sparse.c b/mm/sparse.c
index 79355a86064f..09cac39e39d9 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -856,13 +856,18 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
 *
 * -ENOMEM	- Out of memory.
 */
 int __meminit sparse_add_section(int nid, unsigned long start_pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap)
+		unsigned long nr_pages, struct vmem_altmap *altmap,
+		bool vmemmap_section)
 {
 	unsigned long section_nr = pfn_to_section_nr(start_pfn);
+	unsigned long flags = 0;
 	struct mem_section *ms;
 	struct page *memmap;
 	int ret;
 
+	if (vmemmap_section)
+		flags = SECTION_USE_VMEMMAP;
+
 	ret = sparse_index_init(section_nr, nid);
 	if (ret < 0)
 		return ret;
@@ -884,7 +889,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 	/* Align memmap to section boundary in the subsection case */
 	if (section_nr_to_pfn(section_nr) != start_pfn)
 		memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
-	sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
+	sparse_init_one_section(ms, section_nr, memmap, ms->usage, flags);
 
 	return 0;
 }

From patchwork Thu Jul 25 16:02:06 2019
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 11059183
From: Oscar Salvador <osalvador@suse.de>
To: akpm@linux-foundation.org
Cc: dan.j.williams@intel.com, david@redhat.com, pasha.tatashin@soleen.com, mhocko@suse.com, anshuman.khandual@arm.com, Jonathan.Cameron@huawei.com, vbabka@suse.cz, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 4/5] mm,memory_hotplug: Allocate memmap from the added memory range for sparse-vmemmap
Date: Thu, 25 Jul 2019 18:02:06 +0200
Message-Id: <20190725160207.19579-5-osalvador@suse.de>
In-Reply-To: <20190725160207.19579-1-osalvador@suse.de>
References: <20190725160207.19579-1-osalvador@suse.de>

Physical memory hotadd has to allocate a memmap (struct page array)
for the newly added memory section. Currently, alloc_pages_node() is
used for those allocations.

This has some disadvantages:

a) an existing memory is consumed for that purpose
   (~2MB per 128MB memory section on x86_64)

b) if the whole node is movable then we have off-node struct pages
   which has performance drawbacks.

a) has turned out to be a problem for memory hotplug based ballooning,
because the userspace might not react in time to online memory while
the memory consumed during physical hotadd consumes enough memory to
push the system to OOM. Commit 31bc3858ea3e ("memory-hotplug: add
automatic onlining policy for the newly added memory") has been added
to work around that problem.

This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled. Vmemmap
page tables can map arbitrary memory. That means that we can simply
use the beginning of each memory section and map struct pages there.
struct pages which back the allocated space then just need to be
treated carefully.

Implementation-wise, we will reuse the vmem_altmap infrastructure to
override the default allocator used by __vmemmap_populate. Once the
memmap is allocated, we need a way to mark the altmap pfns used for
the allocation. If the MHP_MEMMAP_ON_MEMORY flag was passed, we set up
the layout of the altmap structure at the beginning of __add_pages(),
and then call mhp_mark_vmemmap_pages() to do the proper marking.

mhp_mark_vmemmap_pages() marks the pages as vmemmap and sets some
metadata. The layout of the vmemmap pages is as follows:

 * Head:
 *   head->vmemmap_pages    : nr of vmemmap pages
 *   head->vmemmap_sections : nr of sections used by this altmap
 * Tail:
 *   tail->vmemmap_head     : head
 * All:
 *   page->type             : Vmemmap

E.g: when hot-adding 1GB on x86_64:

 head->vmemmap_pages    = 4096
 head->vmemmap_sections = 8
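To make those numbers concrete, a small standalone sketch of the
arithmetic (assuming 4K pages, a 64-byte struct page and 128MB
sections, as on x86_64; this is an illustration, not code from the
patch):

#include <stdio.h>

int main(void)
{
	unsigned long size = 1UL << 30;		/* 1GB hot-added */
	unsigned long page_size = 4096;
	unsigned long struct_page = 64;		/* sizeof(struct page) */
	unsigned long section = 128UL << 20;	/* 128MB per section */

	/* memmap pages carved out of the start of the range */
	printf("vmemmap_pages    = %lu\n",
	       (size / page_size) * struct_page / page_size);	/* 4096 */
	/* sections spanned by this altmap */
	printf("vmemmap_sections = %lu\n", size / section);	/* 8 */
	return 0;
}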
We keep this information within the struct pages as we need it in
certain stages like offline, online and hot-remove.

head->vmemmap_sections is a kind of refcount: when using
MHP_MEMMAP_ON_MEMORY, we need to know for how long we have to defer
the call to vmemmap_free(). The thing is that the first pages of the
memory range hold the memmap mapping itself, so we cannot remove those
first, otherwise we would blow up when accessing the other pages. So,
instead of actually removing each section (with vmemmap_free), we wait
until we remove the last one, and then we call vmemmap_free() for all
batched sections.

We also have to be careful about those pages during online and offline
operations. They are simply skipped, so online will keep them
reserved, and thus unusable for any other purpose, and offline ignores
them so they do not block the offline operation.

In the offline operation we only have to check for one particularity:
depending on the way the hot-added range was added, one or more memory
blocks from the beginning might be filled with only vmemmap pages. We
just need to check for this case and skip 1) isolating and
2) migrating them, because those pages do not need to be migrated
anywhere, as they are self-hosted.
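As an illustration of the deferred freeing described above, a
standalone simulation of the refcount idea (plain C, not kernel code;
the section count matches the 1GB example):

#include <stdio.h>

/* head->vmemmap_sections for a 1GB range on x86_64 */
static unsigned long vmemmap_sections = 8;

/* Returns true only when the last section of the range goes away. */
static int remove_one_section(void)
{
	return --vmemmap_sections == 0;
}

int main(void)
{
	unsigned long i;

	for (i = 0; i < 8; i++) {
		if (remove_one_section())
			printf("last section gone: vmemmap_free() the whole batched range\n");
		else
			printf("section removed: freeing deferred\n");
	}
	return 0;
}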
Signed-off-by: Oscar Salvador
---
 arch/powerpc/mm/init_64.c      |   7 +++
 arch/s390/mm/init.c            |   6 ++
 arch/x86/mm/init_64.c          |  10 +++
 drivers/acpi/acpi_memhotplug.c |   3 +-
 include/linux/memory_hotplug.h |   6 ++
 include/linux/memremap.h       |   2 +-
 mm/compaction.c                |   7 +++
 mm/memory_hotplug.c            | 136 ++++++++++++++++++++++++++++++++++++++---
 mm/page_alloc.c                |  26 +++++++-
 mm/page_isolation.c            |  14 ++++-
 mm/sparse.c                    | 107 ++++++++++++++++++++++++++++++++
 11 files changed, 309 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index a44f6281ca3a..f19aa006ca6d 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -292,6 +292,13 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,
 
 		if (base_pfn >= alt_start && base_pfn < alt_end) {
 			vmem_altmap_free(altmap, nr_pages);
+		} else if (PageVmemmap(page)) {
+			/*
+			 * runtime vmemmap pages are residing inside the memory
+			 * section so they do not have to be freed anywhere.
+			 */
+			while (PageVmemmap(page))
+				ClearPageVmemmap(page++);
 		} else if (PageReserved(page)) {
 			/* allocated from bootmem */
 			if (page_size < PAGE_SIZE) {
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 20340a03ad90..adb04f3977eb 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -278,6 +278,12 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	unsigned long size_pages = PFN_DOWN(size);
 	int rc;
 
+	/*
+	 * Physical memory is added only later during the memory online so we
+	 * cannot use the added range at this stage unfortunately.
+	 */
+	restrictions->flags &= ~restrictions->flags;
+
 	if (WARN_ON_ONCE(restrictions->altmap))
 		return -EINVAL;
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a6b5c653727b..f9f720a28b3e 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -876,6 +876,16 @@ static void __meminit free_pagetable(struct page *page, int order)
 	unsigned long magic;
 	unsigned int nr_pages = 1 << order;
 
+	/*
+	 * Runtime vmemmap pages are residing inside the memory section so
+	 * they do not have to be freed anywhere.
+	 */
+	if (PageVmemmap(page)) {
+		while (nr_pages--)
+			ClearPageVmemmap(page++);
+		return;
+	}
+
 	/* bootmem page has reserved flag */
 	if (PageReserved(page)) {
 		__ClearPageReserved(page);
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index d91b3584d4b2..e0148dde5313 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -207,7 +207,8 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 	if (node < 0)
 		node = memory_add_physaddr_to_nid(info->start_addr);
 
-	result = __add_memory(node, info->start_addr, info->length, 0);
+	result = __add_memory(node, info->start_addr, info->length,
+			      MHP_MEMMAP_ON_MEMORY);
 
 	/*
 	 * If the memory block has been used by the kernel, add_memory()
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 6b20008d9297..e1e8abf22a80 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -377,4 +377,10 @@ extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
 		int online_type);
 extern struct zone *zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
 		unsigned long nr_pages);
+
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+extern void mhp_mark_vmemmap_pages(struct vmem_altmap *self);
+#else
+static inline void mhp_mark_vmemmap_pages(struct vmem_altmap *self) {}
+#endif
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 2cfc3c289d01..0a7355b8c1cf 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -16,7 +16,7 @@ struct device;
 * @alloc: track pages consumed, private to vmemmap_populate()
 */
 struct vmem_altmap {
-	const unsigned long base_pfn;
+	unsigned long base_pfn;
 	const unsigned long reserve;
 	unsigned long free;
 	unsigned long align;
diff --git a/mm/compaction.c b/mm/compaction.c
index ac4ead029b4a..2faf769375c4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -857,6 +857,13 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		nr_scanned++;
 
 		page = pfn_to_page(low_pfn);
+		/*
+		 * Vmemmap pages do not need to be isolated.
+		 */
+		if (PageVmemmap(page)) {
+			low_pfn += vmemmap_nr_pages(page) - 1;
+			continue;
+		}
 
 		/*
 		 * Check if the pageblock has already been marked skipped.
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c2338703ce80..09d41339cd11 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -278,6 +278,13 @@ static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
 	return 0;
 }
 
+static void mhp_init_altmap(unsigned long pfn, unsigned long nr_pages,
+			    struct vmem_altmap *altmap)
+{
+	altmap->free = nr_pages;
+	altmap->base_pfn = pfn;
+}
+
 /*
 * Reasonably generic function for adding memory. It is
 * expected that archs that support memory hotplug will
@@ -289,8 +296,18 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
 {
 	int err;
 	unsigned long nr, start_sec, end_sec;
-	struct vmem_altmap *altmap = restrictions->altmap;
+	struct vmem_altmap *altmap;
+	struct vmem_altmap mhp_altmap = {};
+	unsigned long mhp_flags = restrictions->flags;
+	bool vmemmap_section = false;
+
+	if (mhp_flags) {
+		mhp_init_altmap(pfn, nr_pages, &mhp_altmap);
+		restrictions->altmap = &mhp_altmap;
+		vmemmap_section = true;
+	}
 
+	altmap = restrictions->altmap;
 	if (altmap) {
 		/*
 		 * Validate altmap is within bounds of the total request
@@ -314,7 +331,7 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
 
 		pfns = min(nr_pages, PAGES_PER_SECTION
 				- (pfn & ~PAGE_SECTION_MASK));
-		err = sparse_add_section(nid, pfn, pfns, altmap, 0);
+		err = sparse_add_section(nid, pfn, pfns, altmap, vmemmap_section);
 		if (err)
 			break;
 		pfn += pfns;
@@ -322,6 +339,10 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
 		cond_resched();
 	}
 	vmemmap_populate_print_last();
+
+	if (mhp_flags)
+		mhp_mark_vmemmap_pages(altmap);
+
 	return err;
 }
 
@@ -640,6 +661,14 @@ static int online_pages_blocks(unsigned long start, unsigned long nr_pages)
 	while (start < end) {
 		order = min(MAX_ORDER - 1,
 			get_order(PFN_PHYS(end) - PFN_PHYS(start)));
+		/*
+		 * Check if the pfn is aligned to its order.
+		 * If not, we decrement the order until it is,
+		 * otherwise __free_one_page will bug us.
+		 */
+		while (start & ((1 << order) - 1))
+			order--;
+
 		(*online_page_callback)(pfn_to_page(start), order);
 
 		onlined_pages += (1UL << order);
@@ -648,17 +677,51 @@ static int online_pages_blocks(unsigned long start, unsigned long nr_pages)
 	return onlined_pages;
 }
 
+static bool vmemmap_skip_block(unsigned long pfn, unsigned long nr_pages,
+			       unsigned long *nr_vmemmap_pages)
+{
+	bool skip = false;
+	unsigned long vmemmap_pages = 0;
+
+	/*
+	 * This function gets called from {online,offline}_pages.
+	 * It has two goals:
+	 *
+	 * 1) Account number of vmemmap pages within the range
+	 * 2) Check if the whole range contains only vmemmap_pages.
+	 */
+
+	if (PageVmemmap(pfn_to_page(pfn))) {
+		struct page *page = pfn_to_page(pfn);
+
+		vmemmap_pages = min(vmemmap_nr_pages(page), nr_pages);
+		if (vmemmap_pages == nr_pages)
+			skip = true;
+	}
+
+	*nr_vmemmap_pages = vmemmap_pages;
+	return skip;
+}
+
 static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
 			void *arg)
 {
 	unsigned long onlined_pages = *(unsigned long *)arg;
-
-	if (PageReserved(pfn_to_page(start_pfn)))
-		onlined_pages += online_pages_blocks(start_pfn, nr_pages);
-
+	unsigned long pfn = start_pfn;
+	unsigned long nr_vmemmap_pages = 0;
+	bool skip;
+
+	skip = vmemmap_skip_block(pfn, nr_pages, &nr_vmemmap_pages);
+	if (skip)
+		goto skip_online_pages;
+
+	pfn += nr_vmemmap_pages;
+	if (PageReserved(pfn_to_page(pfn)))
+		onlined_pages += online_pages_blocks(pfn, nr_pages - nr_vmemmap_pages);
+
+skip_online_pages:
 	online_mem_sections(start_pfn, start_pfn + nr_pages);
-	*(unsigned long *)arg = onlined_pages;
+	*(unsigned long *)arg = onlined_pages + nr_vmemmap_pages;
 
 	return 0;
 }
@@ -1040,6 +1103,19 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 	return device_online(&mem->dev);
 }
 
+static unsigned long mhp_check_flags(unsigned long flags)
+{
+	if (!flags)
+		return 0;
+
+	if (flags != MHP_MEMMAP_ON_MEMORY) {
+		WARN(1, "Wrong flags value (%lx). Ignoring flags.\n", flags);
+		return 0;
+	}
+
+	return flags;
+}
+
 /*
 * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
 * and online/offline operations (triggered e.g. by sysfs).
@@ -1075,6 +1151,8 @@ int __ref add_memory_resource(int nid, struct resource *res, unsigned long flags)
 		goto error;
 	new_node = ret;
 
+	restrictions.flags = mhp_check_flags(flags);
+
 	/* call arch's memory hotadd */
 	ret = arch_add_memory(nid, start, size, &restrictions);
 	if (ret < 0)
@@ -1502,12 +1580,14 @@ static int __ref __offline_pages(unsigned long start_pfn,
 {
 	unsigned long pfn, nr_pages;
 	unsigned long offlined_pages = 0;
+	unsigned long nr_vmemmap_pages = 0;
 	int ret, node, nr_isolate_pageblock;
 	unsigned long flags;
 	unsigned long valid_start, valid_end;
 	struct zone *zone;
 	struct memory_notify arg;
 	char *reason;
+	bool skip = false;
 
 	mem_hotplug_begin();
 
@@ -1524,8 +1604,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	node = zone_to_nid(zone);
 	nr_pages = end_pfn - start_pfn;
 
+	skip = vmemmap_skip_block(start_pfn, nr_pages, &nr_vmemmap_pages);
+
 	/* set above range as isolated */
-	ret = start_isolate_page_range(start_pfn, end_pfn,
+	ret = start_isolate_page_range(start_pfn + nr_vmemmap_pages, end_pfn,
 				       MIGRATE_MOVABLE,
 				       SKIP_HWPOISON | REPORT_FAILURE);
 	if (ret < 0) {
@@ -1545,6 +1627,9 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		goto failed_removal_isolated;
 	}
 
+	if (skip)
+		goto skip_migration;
+
 	do {
 		for (pfn = start_pfn; pfn;) {
 			if (signal_pending(current)) {
@@ -1581,6 +1666,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 					    NULL, check_pages_isolated_cb);
 	} while (ret);
 
+skip_migration:
 	/* Ok, all of our target is isolated.
 	   We cannot do rollback at this point. */
 	walk_system_ram_range(start_pfn, end_pfn - start_pfn,
@@ -1596,7 +1682,9 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	/* removal success */
-	adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
+	if (offlined_pages)
+		adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
+	offlined_pages += nr_vmemmap_pages;
 	zone->present_pages -= offlined_pages;
 
 	pgdat_resize_lock(zone->zone_pgdat, &flags);
@@ -1739,11 +1827,41 @@ static void __release_memory_resource(resource_size_t start,
 	}
 }
 
+static int check_hotplug_granularity(u64 start, u64 size)
+{
+	unsigned long pfn = PHYS_PFN(start);
+
+	/*
+	 * Sanity check in case the range used MHP_MEMMAP_ON_MEMORY.
+	 */
+	if (vmemmap_section(__pfn_to_section(pfn))) {
+		struct page *page = pfn_to_page(pfn);
+		unsigned long nr_pages = size >> PAGE_SHIFT;
+		unsigned long sections;
+
+		/*
+		 * The start of the memory range is not correct.
+		 */
+		if (!PageVmemmap(page) || (vmemmap_head(page) != page))
+			return -EINVAL;
+
+		sections = vmemmap_nr_sections(page);
+		if (sections * PAGES_PER_SECTION != nr_pages)
+			/*
+			 * Check that granularity is the same.
+			 */
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int __ref try_remove_memory(int nid, u64 start, u64 size)
 {
 	int rc = 0;
 
 	BUG_ON(check_hotplug_memory_range(start, size));
+	BUG_ON(check_hotplug_granularity(start, size));
 
 	mem_hotplug_begin();
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d3bb601c461b..7c7d7130b627 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1340,14 +1340,21 @@ static void free_one_page(struct zone *zone,
 static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 				unsigned long zone, int nid)
 {
+	if (PageVmemmap(page))
+		/*
+		 * Vmemmap pages need to preserve their state.
+		 */
+		goto preserve_state;
+
 	mm_zero_struct_page(page);
-	set_page_links(page, zone, nid, pfn);
-	init_page_count(page);
 	page_mapcount_reset(page);
+	INIT_LIST_HEAD(&page->lru);
+preserve_state:
+	init_page_count(page);
+	set_page_links(page, zone, nid, pfn);
 	page_cpupid_reset_last(page);
 	page_kasan_tag_reset(page);
-	INIT_LIST_HEAD(&page->lru);
 #ifdef WANT_PAGE_VIRTUAL
 	/* The shift won't overflow because ZONE_NORMAL is below 4G. */
 	if (!is_highmem_idx(zone))
@@ -8184,6 +8191,14 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 
 		page = pfn_to_page(check);
 
+		/*
+		 * Vmemmap pages are not needed to be moved around.
+		 */
+		if (PageVmemmap(page)) {
+			iter += vmemmap_nr_pages(page) - 1;
+			continue;
+		}
+
 		if (PageReserved(page))
 			goto unmovable;
 
@@ -8551,6 +8566,11 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 			continue;
 		}
 		page = pfn_to_page(pfn);
+
+		if (PageVmemmap(page)) {
+			pfn += vmemmap_nr_pages(page);
+			continue;
+		}
 		/*
 		 * The HWPoisoned page may be not in buddy system, and
 		 * page_count() is not 0.
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 89c19c0feadb..ee26ea41c9eb 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -146,7 +146,7 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 static inline struct page *
 __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 {
-	int i;
+	unsigned long i;
 
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page;
@@ -154,6 +154,10 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 		page = pfn_to_online_page(pfn + i);
 		if (!page)
 			continue;
+		if (PageVmemmap(page)) {
+			i += vmemmap_nr_pages(page) - 1;
+			continue;
+		}
 		return page;
 	}
 	return NULL;
@@ -267,6 +271,14 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 			continue;
 		}
 		page = pfn_to_page(pfn);
+		/*
+		 * Vmemmap pages are not isolated. Skip them.
+		 */
+		if (PageVmemmap(page)) {
+			pfn += vmemmap_nr_pages(page);
+			continue;
+		}
+
 		if (PageBuddy(page))
 			/*
 			 * If the page is on a free list, it has to be on
diff --git a/mm/sparse.c b/mm/sparse.c
index 09cac39e39d9..2cc2e5af1986 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -645,18 +645,125 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 #endif
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
+static void vmemmap_init_page(struct page *page, struct page *head)
+{
+	page_mapcount_reset(page);
+	SetPageVmemmap(page);
+	page->vmemmap_head = (unsigned long)head;
+}
+
+static void vmemmap_init_head(struct page *page, unsigned long nr_sections,
+			      unsigned long nr_pages)
+{
+	page->vmemmap_sections = nr_sections;
+	page->vmemmap_pages = nr_pages;
+}
+
+void mhp_mark_vmemmap_pages(struct vmem_altmap *self)
+{
+	unsigned long pfn = self->base_pfn + self->reserve;
+	unsigned long nr_pages = self->alloc;
+	unsigned long nr_sects = self->free / PAGES_PER_SECTION;
+	unsigned long i;
+	struct page *head;
+
+	if (!nr_pages)
+		return;
+
+	/*
+	 * All allocations for the memory hotplug are the same sized so align
+	 * should be 0.
+	 */
+	WARN_ON(self->align);
+
+	memset(pfn_to_page(pfn), 0, sizeof(struct page) * nr_pages);
+
+	/*
+	 * Mark pages as Vmemmap pages.
+	 * Layout:
+	 * Head:
+	 *	head->vmemmap_pages	: nr of vmemmap pages
+	 *	head->mhp_flags		: MHP_flags
+	 *	head->vmemmap_sections	: nr of sections used by this altmap
+	 * Tail:
+	 *	tail->vmemmap_head	: head
+	 * All:
+	 *	page->type		: Vmemmap
+	 */
+	head = pfn_to_page(pfn);
+	for (i = 0; i < nr_pages; i++) {
+		struct page *page = head + i;
+
+		vmemmap_init_page(page, head);
+	}
+	vmemmap_init_head(head, nr_sects, nr_pages);
+}
+
+/*
+ * If the range we are trying to remove was hot-added with vmemmap pages
+ * using MHP_MEMMAP_*, we need to keep track of it in order to know how
+ * much of the freeing we have to defer.
+ * Since sections are removed sequentially in __remove_pages()->
+ * __remove_section(), we just wait until we hit the last section.
+ * Once that happens, we can trigger vmemmap_free_deferred_range() to
+ * actually free the whole memory range.
+ */
+static struct page *__vmemmap_head = NULL;
+
 static struct page *populate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
 {
 	return __populate_section_memmap(pfn, nr_pages, nid, altmap);
 }
 
+static void vmemmap_free_deferred_range(unsigned long start,
+					unsigned long end)
+{
+	unsigned long nr_pages = end - start;
+	unsigned long first_section;
+
+	first_section = (unsigned long)__vmemmap_head;
+	while (start >= first_section) {
+		vmemmap_free(start, end, NULL);
+		end = start;
+		start -= nr_pages;
+	}
+	__vmemmap_head = NULL;
+}
+
+static inline bool vmemmap_dec_and_test(void)
+{
+	__vmemmap_head->vmemmap_sections--;
+	return !__vmemmap_head->vmemmap_sections;
+}
+
+static void vmemmap_defer_free(unsigned long start, unsigned long end)
+{
+	if (vmemmap_dec_and_test())
+		vmemmap_free_deferred_range(start, end);
+}
+
+static inline bool should_defer_freeing(unsigned long start)
+{
+	if (PageVmemmap((struct page *)start) || __vmemmap_head) {
+		if (!__vmemmap_head)
+			__vmemmap_head = (struct page *)start;
+		return true;
+	}
+	return false;
+}
+
 static void depopulate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap)
 {
 	unsigned long start = (unsigned long) pfn_to_page(pfn);
 	unsigned long end = start + nr_pages * sizeof(struct page);
 
+	if (should_defer_freeing(start)) {
+		vmemmap_defer_free(start, end);
+		return;
+	}
+
 	vmemmap_free(start, end, altmap);
 }
 
 static void free_map_bootmem(struct page *memmap)
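The deferral above is essentially a countdown over the sections of one
hot-added range: the first depopulated section records the head, every
subsequent section decrements head->vmemmap_sections, and the real
vmemmap_free() work happens only once, when the counter reaches zero.
The following minimal userspace C model sketches that countdown; it is
not kernel code, and struct range_head, depopulate_section() and
free_whole_range() are invented stand-ins for __vmemmap_head,
depopulate_section_memmap() and vmemmap_free().

/*
 * Toy userspace model of the deferred vmemmap freeing above.
 */
#include <stdio.h>

#define NR_SECTIONS 4	/* pretend the hot-added range spans 4 sections */

struct range_head {
	int sections_left;	/* models head->vmemmap_sections */
};

static struct range_head *deferred_head;	/* models __vmemmap_head */

/* Models vmemmap_free(): the real work, done once for the whole range. */
static void free_whole_range(struct range_head *head)
{
	printf("freeing whole range backed by head %p\n", (void *)head);
}

/* Models should_defer_freeing() + vmemmap_defer_free(). */
static void depopulate_section(struct range_head *head)
{
	if (!deferred_head)
		deferred_head = head;	/* first section seen: remember the head */

	if (--deferred_head->sections_left == 0) {	/* vmemmap_dec_and_test() */
		free_whole_range(deferred_head);
		deferred_head = NULL;
	} else {
		printf("deferring free, %d section(s) left\n",
		       deferred_head->sections_left);
	}
}

int main(void)
{
	struct range_head head = { .sections_left = NR_SECTIONS };
	int i;

	/* __remove_pages() removes sections one by one; mimic that. */
	for (i = 0; i < NR_SECTIONS; i++)
		depopulate_section(&head);

	return 0;
}

Running the model prints three "deferring free" lines followed by a single
"freeing whole range" line, which is the observable behaviour the patch
aims for: one bulk free at the end instead of one free per section.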
From patchwork Thu Jul 25 16:02:07 2019
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 11059181
From: Oscar Salvador
To: akpm@linux-foundation.org
Cc: dan.j.williams@intel.com, david@redhat.com, pasha.tatashin@soleen.com,
 mhocko@suse.com, anshuman.khandual@arm.com, Jonathan.Cameron@huawei.com,
 vbabka@suse.cz, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Oscar Salvador
Subject: [PATCH v3 5/5] mm,memory_hotplug: Allow userspace to enable/disable vmemmap
Date: Thu, 25 Jul 2019 18:02:07 +0200
Message-Id: <20190725160207.19579-6-osalvador@suse.de>
In-Reply-To: <20190725160207.19579-1-osalvador@suse.de>
References: <20190725160207.19579-1-osalvador@suse.de>

There are users who want to expose all hotpluggable memory to userspace,
so this patch implements a toggling mechanism for those who want to
disable allocating the memmap from hot-added memory.
By default, the vmemmap-pages mechanism is enabled.

Signed-off-by: Oscar Salvador
---
 drivers/base/memory.c          | 33 +++++++++++++++++++++++++++++++++
 include/linux/memory_hotplug.h |  3 +++
 mm/memory_hotplug.c            |  7 +++++++
 3 files changed, 43 insertions(+)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index d30d0f6c8ad0..5ec6b80de9dd 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -578,6 +578,35 @@ static DEVICE_ATTR_WO(soft_offline_page);
 static DEVICE_ATTR_WO(hard_offline_page);
 #endif
 
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+static ssize_t vmemmap_hotplug_show(struct device *dev,
+				    struct device_attribute *attr, char *buf)
+{
+	if (vmemmap_enabled)
+		return sprintf(buf, "enabled\n");
+	else
+		return sprintf(buf, "disabled\n");
+}
+
+static ssize_t vmemmap_hotplug_store(struct device *dev,
+				     struct device_attribute *attr,
+				     const char *buf, size_t count)
+{
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (sysfs_streq(buf, "enable"))
+		vmemmap_enabled = true;
+	else if (sysfs_streq(buf, "disable"))
+		vmemmap_enabled = false;
+	else
+		return -EINVAL;
+
+	return count;
+}
+static DEVICE_ATTR_RW(vmemmap_hotplug);
+#endif
+
 /*
  * Note that phys_device is optional.
 * It is here to allow for
 * differentiation between which *physical* devices each
@@ -794,6 +823,10 @@ static struct attribute *memory_root_attrs[] = {
 	&dev_attr_hard_offline_page.attr,
 #endif
 
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+	&dev_attr_vmemmap_hotplug.attr,
+#endif
+
 	&dev_attr_block_size_bytes.attr,
 	&dev_attr_auto_online_blocks.attr,
 	NULL
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index e1e8abf22a80..03d227d13301 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -134,6 +134,9 @@ extern int arch_add_memory(int nid, u64 start, u64 size,
 			struct mhp_restrictions *restrictions);
 extern u64 max_mem_size;
 
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+extern bool vmemmap_enabled;
+#endif
 extern bool memhp_auto_online;
 /* If movable_node boot option specified */
 extern bool movable_node_enabled;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 09d41339cd11..5ffe5375b87c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -68,6 +68,10 @@ void put_online_mems(void)
 
 bool movable_node_enabled = false;
 
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+bool vmemmap_enabled __read_mostly = true;
+#endif
+
 #ifndef CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE
 bool memhp_auto_online;
 #else
@@ -1108,6 +1112,9 @@ static unsigned long mhp_check_flags(unsigned long flags)
 	if (!flags)
 		return 0;
 
+	if (!vmemmap_enabled)
+		return 0;
+
 	if (flags != MHP_MEMMAP_ON_MEMORY) {
 		WARN(1, "Wrong flags value (%lx). Ignoring flags.\n", flags);
 		return 0;
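For completeness, here is a minimal sketch of how the new knob could be
driven from userspace once this patch is applied. The path
/sys/devices/system/memory/vmemmap_hotplug is an assumption based on the
attribute being added to memory_root_attrs (the root "memory" subsystem
directory), not something stated in the patch; the accepted values
("enable"/"disable"), the reported values ("enabled"/"disabled") and the
CAP_SYS_ADMIN requirement come from the handlers above.

/*
 * Sketch of toggling the (assumed) vmemmap_hotplug sysfs knob.
 * Must run as root, since the store handler requires CAP_SYS_ADMIN.
 */
#include <stdio.h>

#define KNOB "/sys/devices/system/memory/vmemmap_hotplug"

int main(void)
{
	char state[16] = "";
	FILE *f;

	/* The store handler accepts "enable" or "disable". */
	f = fopen(KNOB, "w");
	if (!f || fputs("disable\n", f) == EOF || fclose(f) == EOF) {
		perror(KNOB);
		return 1;
	}

	/* The show handler reports "enabled" or "disabled". */
	f = fopen(KNOB, "r");
	if (!f || !fgets(state, sizeof(state), f)) {
		perror(KNOB);
		return 1;
	}
	fclose(f);

	printf("vmemmap_hotplug is now: %s", state);
	return 0;
}

Note that, per mhp_check_flags() above, disabling the knob does not fail
subsequent hotplug requests that pass MHP_MEMMAP_ON_MEMORY; it silently
clears the flag, so memory added while disabled gets its memmap allocated
the usual way.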