From patchwork Fri Sep 28 15:03:57 2018
X-Patchwork-Submitter: David Hildenbrand <david@redhat.com>
X-Patchwork-Id: 10620107
From: David Hildenbrand <david@redhat.com>
To: linux-mm@kvack.org
Cc: xen-devel@lists.xenproject.org, devel@linuxdriverproject.org,
    linux-acpi@vger.kernel.org, linux-sh@vger.kernel.org,
    linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
    linux-kernel@vger.kernel.org, linux-ia64@vger.kernel.org
Shutemov" , Nicholas Piggin , =?utf-8?q?Jonathan_Neusch=C3=A4fer?= , Joe Perches , Michael Neuling , Mauricio Faria de Oliveira , Balbir Singh , Rashmica Gupta , Pavel Tatashin , Rob Herring , Philippe Ombredanne , Kate Stewart , "mike.travis@hpe.com" , Joonsoo Kim , Oscar Salvador , Mathieu Malaterre Subject: [PATCH RFC] mm/memory_hotplug: Introduce memory block types Date: Fri, 28 Sep 2018 17:03:57 +0200 Message-Id: <20180928150357.12942-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.44]); Fri, 28 Sep 2018 15:04:18 +0000 (UTC) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP How to/when to online hotplugged memory is hard to manage for distributions because different memory types are to be treated differently. Right now, we need complicated udev rules that e.g. check if we are running on s390x, on a physical system or on a virtualized system. But there is also sometimes the demand to really online memory immediately while adding in the kernel and not to wait for user space to make a decision. And on virtualized systems there might be different requirements, depending on "how" the memory was added (and if it will eventually get unplugged again - DIMM vs. paravirtualized mechanisms). On the one hand, we have physical systems where we sometimes want to be able to unplug memory again - e.g. a DIMM - so we have to online it to the MOVABLE zone optionally. That decision is usually made in user space. On the other hand, we have memory that should never be onlined automatically, only when asked for by an administrator. Such memory only applies to virtualized environments like s390x, where the concept of "standby" memory exists. Memory is detected and added during boot, so it can be onlined when requested by the admininistrator or some tooling. Only when onlining, memory will be allocated in the hypervisor. But then, we also have paravirtualized devices (namely xen and hyper-v balloons), that hotplug memory that will never ever be removed from a system right now using offline_pages/remove_memory. If at all, this memory is logically unplugged and handed back to the hypervisor via ballooning. For paravirtualized devices it is relevant that memory is onlined as quickly as possible after adding - and that it is added to the NORMAL zone. Otherwise, it could happen that too much memory in a row is added (but not onlined), resulting in out-of-memory conditions due to the additional memory for "struct pages" and friends. MOVABLE zone as well as delays might be very problematic and lead to crashes (e.g. zone imbalance). Therefore, introduce memory block types and online memory depending on it when adding the memory. Expose the memory type to user space, so user space handlers can start to process only "normal" memory. Other memory block types can be ignored. One thing less to worry about in user space. Cc: Tony Luck Cc: Fenghua Yu Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: Yoshinori Sato Cc: Rich Felker Cc: Dave Hansen Cc: Andy Lutomirski Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: "H. Peter Anvin" Cc: "Rafael J. Wysocki" Cc: Len Brown Cc: Greg Kroah-Hartman Cc: "K. Y. 
Srinivasan" Cc: Haiyang Zhang Cc: Stephen Hemminger Cc: Boris Ostrovsky Cc: Juergen Gross Cc: "Jérôme Glisse" Cc: Andrew Morton Cc: Mike Rapoport Cc: Dan Williams Cc: Stephen Rothwell Cc: Michal Hocko Cc: "Kirill A. Shutemov" Cc: David Hildenbrand Cc: Nicholas Piggin Cc: "Jonathan Neuschäfer" Cc: Joe Perches Cc: Michael Neuling Cc: Mauricio Faria de Oliveira Cc: Balbir Singh Cc: Rashmica Gupta Cc: Pavel Tatashin Cc: Rob Herring Cc: Philippe Ombredanne Cc: Kate Stewart Cc: "mike.travis@hpe.com" Cc: Joonsoo Kim Cc: Oscar Salvador Cc: Mathieu Malaterre Signed-off-by: David Hildenbrand --- This patch is based on the current mm-tree, where some related patches from me are currently residing that touched the add_memory() functions. arch/ia64/mm/init.c | 4 +- arch/powerpc/mm/mem.c | 4 +- arch/powerpc/platforms/powernv/memtrace.c | 3 +- arch/s390/mm/init.c | 4 +- arch/sh/mm/init.c | 4 +- arch/x86/mm/init_32.c | 4 +- arch/x86/mm/init_64.c | 8 +-- drivers/acpi/acpi_memhotplug.c | 3 +- drivers/base/memory.c | 63 ++++++++++++++++++++--- drivers/hv/hv_balloon.c | 33 ++---------- drivers/s390/char/sclp_cmd.c | 3 +- drivers/xen/balloon.c | 2 +- include/linux/memory.h | 28 +++++++++- include/linux/memory_hotplug.h | 17 +++--- mm/hmm.c | 6 ++- mm/memory_hotplug.c | 31 ++++++----- 16 files changed, 139 insertions(+), 78 deletions(-) diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c index d5e12ff1d73c..813d1d86bf95 100644 --- a/arch/ia64/mm/init.c +++ b/arch/ia64/mm/init.c @@ -646,13 +646,13 @@ mem_init (void) #ifdef CONFIG_MEMORY_HOTPLUG int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap, - bool want_memblock) + int memory_block_type) { unsigned long start_pfn = start >> PAGE_SHIFT; unsigned long nr_pages = size >> PAGE_SHIFT; int ret; - ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock); + ret = __add_pages(nid, start_pfn, nr_pages, altmap, memory_block_type); if (ret) printk("%s: Problem encountered in __add_pages() as ret=%d\n", __func__, ret); diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index 5551f5870dcc..dd32fcc9099c 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -118,7 +118,7 @@ int __weak remove_section_mapping(unsigned long start, unsigned long end) } int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap, - bool want_memblock) + int memory_block_type) { unsigned long start_pfn = start >> PAGE_SHIFT; unsigned long nr_pages = size >> PAGE_SHIFT; @@ -135,7 +135,7 @@ int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap * } flush_inval_dcache_range(start, start + size); - return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock); + return __add_pages(nid, start_pfn, nr_pages, altmap, memory_block_type); } #ifdef CONFIG_MEMORY_HOTREMOVE diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c index 84d038ed3882..57d6b3d46382 100644 --- a/arch/powerpc/platforms/powernv/memtrace.c +++ b/arch/powerpc/platforms/powernv/memtrace.c @@ -232,7 +232,8 @@ static int memtrace_online(void) ent->mem = 0; } - if (add_memory(ent->nid, ent->start, ent->size)) { + if (add_memory(ent->nid, ent->start, ent->size, + MEMORY_BLOCK_NORMAL)) { pr_err("Failed to add trace memory to node %d\n", ent->nid); ret += 1; diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c index e472cd763eb3..b5324527c7f6 100644 --- a/arch/s390/mm/init.c +++ b/arch/s390/mm/init.c @@ -222,7 +222,7 @@ device_initcall(s390_cma_mem_init); #endif /* CONFIG_CMA */ int 
 arch/ia64/mm/init.c                       |  4 +-
 arch/powerpc/mm/mem.c                     |  4 +-
 arch/powerpc/platforms/powernv/memtrace.c |  3 +-
 arch/s390/mm/init.c                       |  4 +-
 arch/sh/mm/init.c                         |  4 +-
 arch/x86/mm/init_32.c                     |  4 +-
 arch/x86/mm/init_64.c                     |  8 +--
 drivers/acpi/acpi_memhotplug.c            |  3 +-
 drivers/base/memory.c                     | 63 ++++++++++++++++++++---
 drivers/hv/hv_balloon.c                   | 33 ++----
 drivers/s390/char/sclp_cmd.c              |  3 +-
 drivers/xen/balloon.c                     |  2 +-
 include/linux/memory.h                    | 28 +++++++++-
 include/linux/memory_hotplug.h            | 17 +++---
 mm/hmm.c                                  |  6 ++-
 mm/memory_hotplug.c                       | 31 ++++++-----
 16 files changed, 139 insertions(+), 78 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index d5e12ff1d73c..813d1d86bf95 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -646,13 +646,13 @@ mem_init (void)
 #ifdef CONFIG_MEMORY_HOTPLUG
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		int memory_block_type)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int ret;
 
-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, memory_block_type);
 	if (ret)
 		printk("%s: Problem encountered in __add_pages() as ret=%d\n",
 		       __func__, ret);
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 5551f5870dcc..dd32fcc9099c 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -118,7 +118,7 @@ int __weak remove_section_mapping(unsigned long start, unsigned long end)
 }
 
 int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		int memory_block_type)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -135,7 +135,7 @@ int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *
 	}
 	flush_inval_dcache_range(start, start + size);
 
-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, altmap, memory_block_type);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 84d038ed3882..57d6b3d46382 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -232,7 +232,8 @@ static int memtrace_online(void)
 			ent->mem = 0;
 		}
 
-		if (add_memory(ent->nid, ent->start, ent->size)) {
+		if (add_memory(ent->nid, ent->start, ent->size,
+			       MEMORY_BLOCK_NORMAL)) {
 			pr_err("Failed to add trace memory to node %d\n",
 				ent->nid);
 			ret += 1;
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index e472cd763eb3..b5324527c7f6 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -222,7 +222,7 @@ device_initcall(s390_cma_mem_init);
 #endif /* CONFIG_CMA */
 
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		int memory_block_type)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long size_pages = PFN_DOWN(size);
@@ -232,7 +232,7 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
 	if (rc)
 		return rc;
 
-	rc = __add_pages(nid, start_pfn, size_pages, altmap, want_memblock);
+	rc = __add_pages(nid, start_pfn, size_pages, altmap, memory_block_type);
 	if (rc)
 		vmem_remove_mapping(start, size);
 	return rc;
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index c8c13c777162..6b876000731a 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -419,14 +419,14 @@ void free_initrd_mem(unsigned long start, unsigned long end)
 #ifdef CONFIG_MEMORY_HOTPLUG
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		int memory_block_type)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int ret;
 
 	/* We only have ZONE_NORMAL, so this is easy.. */
-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, memory_block_type);
 	if (unlikely(ret))
 		printk("%s: Failed, __add_pages() == %d\n", __func__, ret);
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index f2837e4c40b3..4f50cd4467a9 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -851,12 +851,12 @@ void __init mem_init(void)
 #ifdef CONFIG_MEMORY_HOTPLUG
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		int memory_block_type)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, altmap, memory_block_type);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 5fab264948c2..fc3df573f0f3 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -783,11 +783,11 @@ static void update_end_of_memory_vars(u64 start, u64 size)
 }
 
 int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
-	      struct vmem_altmap *altmap, bool want_memblock)
+	      struct vmem_altmap *altmap, int memory_block_type)
 {
 	int ret;
 
-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, memory_block_type);
 	WARN_ON_ONCE(ret);
 
 	/* update max_pfn, max_low_pfn and high_memory */
@@ -798,14 +798,14 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
 }
 
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		int memory_block_type)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
 	init_memory_mapping(start, start + size);
 
-	return add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return add_pages(nid, start_pfn, nr_pages, altmap, memory_block_type);
 }
 
 #define PAGE_INUSE 0xFD
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 8fe0960ea572..c5f646b4e97e 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -228,7 +228,8 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (node < 0)
 			node = memory_add_physaddr_to_nid(info->start_addr);
 
-		result = __add_memory(node, info->start_addr, info->length);
+		result = __add_memory(node, info->start_addr, info->length,
+				      MEMORY_BLOCK_NORMAL);
 
 		/*
 		 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 0e5985682642..2686101e41b5 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -381,6 +381,32 @@ static ssize_t show_phys_device(struct device *dev,
 	return sprintf(buf, "%d\n", mem->phys_device);
 }
 
+static ssize_t type_show(struct device *dev, struct device_attribute *attr,
+			 char *buf)
+{
+	struct memory_block *mem = to_memory_block(dev);
+	ssize_t len = 0;
+
+	switch (mem->type) {
+	case MEMORY_BLOCK_NORMAL:
+		len = sprintf(buf, "normal\n");
+		break;
+	case MEMORY_BLOCK_STANDBY:
+		len = sprintf(buf, "standby\n");
+		break;
+	case MEMORY_BLOCK_PARAVIRT:
+		len = sprintf(buf, "paravirt\n");
+		break;
+	default:
+		len = sprintf(buf, "ERROR-UNKNOWN-%d\n",
+			      mem->type);
+		WARN_ON(1);
+		break;
+	}
+
+	return len;
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
 			       unsigned long nr_pages, int online_type,
@@ -442,6 +468,7 @@ static DEVICE_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
 static DEVICE_ATTR(state, 0644, show_mem_state, store_mem_state);
 static DEVICE_ATTR(phys_device, 0444, show_phys_device, NULL);
 static DEVICE_ATTR(removable, 0444, show_mem_removable, NULL);
+static DEVICE_ATTR_RO(type);
 
 /*
  * Block size attribute stuff
@@ -514,7 +541,8 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
 	nid = memory_add_physaddr_to_nid(phys_addr);
 	ret = __add_memory(nid, phys_addr,
-			   MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+			   MIN_MEMORY_BLOCK_SIZE * sections_per_block,
+			   MEMORY_BLOCK_NORMAL);
 
 	if (ret)
 		goto out;
@@ -620,6 +648,7 @@ static struct attribute *memory_memblk_attrs[] = {
 	&dev_attr_state.attr,
 	&dev_attr_phys_device.attr,
 	&dev_attr_removable.attr,
+	&dev_attr_type.attr,
 #ifdef CONFIG_MEMORY_HOTREMOVE
 	&dev_attr_valid_zones.attr,
 #endif
@@ -657,13 +686,17 @@ int register_memory(struct memory_block *memory)
 }
 
 static int init_memory_block(struct memory_block **memory,
-			     struct mem_section *section, unsigned long state)
+			     struct mem_section *section, unsigned long state,
+			     int memory_block_type)
 {
 	struct memory_block *mem;
 	unsigned long start_pfn;
 	int scn_nr;
 	int ret = 0;
 
+	if (memory_block_type == MEMORY_BLOCK_NONE)
+		return -EINVAL;
+
 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
 	if (!mem)
 		return -ENOMEM;
@@ -675,6 +708,7 @@ static int init_memory_block(struct memory_block **memory,
 	mem->state = state;
 	start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	mem->phys_device = arch_get_memory_phys_device(start_pfn);
+	mem->type = memory_block_type;
 
 	ret = register_memory(mem);
 
@@ -699,7 +733,8 @@ static int add_memory_block(int base_section_nr)
 	if (section_count == 0)
 		return 0;
-	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
+	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE,
+				MEMORY_BLOCK_NORMAL);
 	if (ret)
 		return ret;
 	mem->section_count = section_count;
@@ -710,19 +745,35 @@ static int add_memory_block(int base_section_nr)
  * need an interface for the VM to add new memory regions,
 * but without onlining it.
  */
-int hotplug_memory_register(int nid, struct mem_section *section)
+int hotplug_memory_register(int nid, struct mem_section *section,
+			    int memory_block_type)
 {
 	int ret = 0;
 	struct memory_block *mem;
 
 	mutex_lock(&mem_sysfs_mutex);
 
+	/* make sure there is no memblock if we don't want one */
+	if (memory_block_type == MEMORY_BLOCK_NONE) {
+		mem = find_memory_block(section);
+		if (mem) {
+			put_device(&mem->dev);
+			ret = -EINVAL;
+		}
+		goto out;
+	}
+
 	mem = find_memory_block(section);
 	if (mem) {
-		mem->section_count++;
+		/* make sure the type matches */
+		if (mem->type == memory_block_type)
+			mem->section_count++;
+		else
+			ret = -EINVAL;
 		put_device(&mem->dev);
 	} else {
-		ret = init_memory_block(&mem, section, MEM_OFFLINE);
+		ret = init_memory_block(&mem, section, MEM_OFFLINE,
+					memory_block_type);
 		if (ret)
 			goto out;
 		mem->section_count++;
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index b1b788082793..5a8d18c4d699 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -537,11 +537,6 @@ struct hv_dynmem_device {
 	 */
 	bool host_specified_ha_region;
 
-	/*
-	 * State to synchronize hot-add.
-	 */
-	struct completion ol_waitevent;
-	bool ha_waiting;
 	/*
 	 * This thread handles hot-add
 	 * requests from the host as well as notifying
@@ -640,14 +635,6 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
 	unsigned long flags, pfn_count;
 
 	switch (val) {
-	case MEM_ONLINE:
-	case MEM_CANCEL_ONLINE:
-		if (dm_device.ha_waiting) {
-			dm_device.ha_waiting = false;
-			complete(&dm_device.ol_waitevent);
-		}
-		break;
-
 	case MEM_OFFLINE:
 		spin_lock_irqsave(&dm_device.ha_lock, flags);
 		pfn_count = hv_page_offline_check(mem->start_pfn,
@@ -665,9 +652,7 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
 		}
 		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 		break;
-	case MEM_GOING_ONLINE:
-	case MEM_GOING_OFFLINE:
-	case MEM_CANCEL_OFFLINE:
+	default:
 		break;
 	}
 	return NOTIFY_OK;
@@ -731,12 +716,10 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		has->covered_end_pfn += processed_pfn;
 		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 
-		init_completion(&dm_device.ol_waitevent);
-		dm_device.ha_waiting = !memhp_auto_online;
-
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
-				(HA_CHUNK << PAGE_SHIFT));
+				(HA_CHUNK << PAGE_SHIFT),
+				MEMORY_BLOCK_PARAVIRT);
 
 		if (ret) {
 			pr_err("hot_add memory failed error is %d\n",
 				ret);
@@ -757,16 +740,6 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 			break;
 		}
 
-		/*
-		 * Wait for the memory block to be onlined when memory onlining
-		 * is done outside of kernel (memhp_auto_online). Since the hot
-		 * add has succeeded, it is ok to proceed even if the pages in
-		 * the hot added region have not been "onlined" within the
-		 * allowed time.
-		 */
-		if (dm_device.ha_waiting)
-			wait_for_completion_timeout(&dm_device.ol_waitevent,
-						    5*HZ);
 		post_status(&dm_device);
 	}
 }
diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index d7686a68c093..1928a2411456 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -406,7 +406,8 @@ static void __init add_memory_merged(u16 rn)
 	if (!size)
 		goto skip_add;
 	for (addr = start; addr < start + size; addr += block_size)
-		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size);
+		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size,
+			   MEMORY_BLOCK_STANDBY);
 skip_add:
 	first_rn = rn;
 	num = 1;
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index fdfc64f5acea..291a8aac6af3 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -397,7 +397,7 @@ static enum bp_state reserve_additional_memory(void)
 	mutex_unlock(&balloon_mutex);
 	/* add_memory_resource() requires the device_hotplug lock */
 	lock_device_hotplug();
-	rc = add_memory_resource(nid, resource, memhp_auto_online);
+	rc = add_memory_resource(nid, resource, MEMORY_BLOCK_PARAVIRT);
 	unlock_device_hotplug();
 	mutex_lock(&balloon_mutex);
diff --git a/include/linux/memory.h b/include/linux/memory.h
index a6ddefc60517..3dc2a0b12653 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -23,6 +23,30 @@
 
 #define MIN_MEMORY_BLOCK_SIZE     (1UL << SECTION_SIZE_BITS)
 
+/*
+ * NONE:     No memory block is to be created (e.g. device memory).
+ * NORMAL:   Memory block that represents normal (boot or hotplugged) memory
+ *           (e.g. ACPI DIMMs) that should be onlined either automatically
+ *           (memhp_auto_online) or manually by user space to select a
+ *           specific zone.
+ *           Applicable to memhp_auto_online.
+ * STANDBY:  Memory block that represents standby memory that should only
+ *           be onlined on demand by user space (e.g. standby memory on
+ *           s390x), but never automatically by the kernel.
+ *           Not applicable to memhp_auto_online.
+ * PARAVIRT: Memory block that represents memory added by
+ *           paravirtualized mechanisms (e.g. hyper-v, xen) that will
+ *           always automatically get onlined. Memory will be unplugged
+ *           using ballooning, not by relying on the MOVABLE ZONE.
+ *           Not applicable to memhp_auto_online.
+ */
+enum {
+	MEMORY_BLOCK_NONE,
+	MEMORY_BLOCK_NORMAL,
+	MEMORY_BLOCK_STANDBY,
+	MEMORY_BLOCK_PARAVIRT,
+};
+
 struct memory_block {
 	unsigned long start_section_nr;
 	unsigned long end_section_nr;
@@ -34,6 +58,7 @@ struct memory_block {
 	int (*phys_callback)(struct memory_block *);
 	struct device dev;
 	int nid;			/* NID for this memory block */
+	int type;			/* type of this memory block */
 };
 
 int arch_get_memory_phys_device(unsigned long start_pfn);
@@ -111,7 +136,8 @@ extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
 extern int register_memory_isolate_notifier(struct notifier_block *nb);
 extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
-int hotplug_memory_register(int nid, struct mem_section *section);
+int hotplug_memory_register(int nid, struct mem_section *section,
+			    int memory_block_type);
 #ifdef CONFIG_MEMORY_HOTREMOVE
 extern int unregister_memory_section(struct mem_section *);
 #endif
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index ffd9cd10fcf3..b560a9ee0e8c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -115,18 +115,18 @@ extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
 
 /* reasonably generic interface to expand the physical pages */
 extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap, bool want_memblock);
+		struct vmem_altmap *altmap, int memory_block_type);
 
 #ifndef CONFIG_ARCH_HAS_ADD_PAGES
 static inline int add_pages(int nid, unsigned long start_pfn,
 			    unsigned long nr_pages, struct vmem_altmap *altmap,
-			    bool want_memblock)
+			    int memory_block_type)
 {
-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, altmap, memory_block_type);
 }
 #else /* ARCH_HAS_ADD_PAGES */
 int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
-	      struct vmem_altmap *altmap, bool want_memblock);
+	      struct vmem_altmap *altmap, int memory_block_type);
 #endif /* ARCH_HAS_ADD_PAGES */
 
 #ifdef CONFIG_NUMA
@@ -324,11 +324,12 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
 extern void __ref free_area_init_core_hotplug(int nid);
 extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
 		void *arg, int (*func)(struct memory_block *, void *));
-extern int __add_memory(int nid, u64 start, u64 size);
-extern int add_memory(int nid, u64 start, u64 size);
-extern int add_memory_resource(int nid, struct resource *resource, bool online);
+extern int __add_memory(int nid, u64 start, u64 size, int memory_block_type);
+extern int add_memory(int nid, u64 start, u64 size, int memory_block_type);
+extern int add_memory_resource(int nid, struct resource *resource,
+			       int memory_block_type);
 extern int arch_add_memory(int nid, u64 start, u64 size,
-		struct vmem_altmap *altmap, bool want_memblock);
+		struct vmem_altmap *altmap, int memory_block_type);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
diff --git a/mm/hmm.c b/mm/hmm.c
index c968e49f7a0c..2350f6f6ab42 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -32,6 +32,7 @@
 #include
 #include
 #include
+#include <linux/memory.h>
 
 #define PA_SECTION_SIZE (1UL << PA_SECTION_SHIFT)
@@ -1096,10 +1097,11 @@ static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
 	 */
 	if (devmem->pagemap.type == MEMORY_DEVICE_PUBLIC)
 		ret = arch_add_memory(nid, align_start, align_size, NULL,
-				false);
+				MEMORY_BLOCK_NONE);
 	else
 		ret = add_pages(nid, align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, NULL, false);
+				align_size >> PAGE_SHIFT, NULL,
+				MEMORY_BLOCK_NONE);
 	if (ret) {
 		mem_hotplug_done();
 		goto error_add_memory;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d4c7e42e46f3..bce6c41d721c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -246,7 +246,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
 #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
 
 static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
-		struct vmem_altmap *altmap, bool want_memblock)
+		struct vmem_altmap *altmap, int memory_block_type)
 {
 	int ret;
 
@@ -257,10 +257,11 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
 	if (ret < 0)
 		return ret;
 
-	if (!want_memblock)
+	if (memory_block_type == MEMORY_BLOCK_NONE)
 		return 0;
 
-	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
+	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn),
+				       memory_block_type);
 }
 
 /*
@@ -271,7 +272,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
  */
 int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap,
-		bool want_memblock)
+		int memory_block_type)
 {
 	unsigned long i;
 	int err = 0;
@@ -296,7 +297,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 
 	for (i = start_sec; i <= end_sec; i++) {
 		err = __add_section(nid, section_nr_to_pfn(i), altmap,
-				want_memblock);
+				memory_block_type);
 
 		/*
 		 * EEXIST is finally dealt with by ioresource collision
@@ -1099,7 +1100,8 @@ static int online_memory_block(struct memory_block *mem, void *arg)
 *
 * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
 */
-int __ref add_memory_resource(int nid, struct resource *res, bool online)
+int __ref add_memory_resource(int nid, struct resource *res,
+			      int memory_block_type)
 {
 	u64 start, size;
 	bool new_node = false;
@@ -1108,6 +1110,9 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 	start = res->start;
 	size = resource_size(res);
 
+	if (memory_block_type == MEMORY_BLOCK_NONE)
+		return -EINVAL;
+
 	ret = check_hotplug_memory_range(start, size);
 	if (ret)
 		return ret;
@@ -1128,7 +1133,7 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 	new_node = ret;
 
 	/* call arch's memory hotadd */
-	ret = arch_add_memory(nid, start, size, NULL, true);
+	ret = arch_add_memory(nid, start, size, NULL, memory_block_type);
 	if (ret < 0)
 		goto error;
 
@@ -1153,8 +1158,8 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 	/* device_online() will take the lock when calling online_pages() */
 	mem_hotplug_done();
 
-	/* online pages if requested */
-	if (online)
+	if (memory_block_type == MEMORY_BLOCK_PARAVIRT ||
+	    (memory_block_type == MEMORY_BLOCK_NORMAL && memhp_auto_online))
 		walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1),
 				  NULL, online_memory_block);
 
@@ -1169,7 +1174,7 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 }
 
 /* requires device_hotplug_lock, see add_memory_resource() */
-int __ref __add_memory(int nid, u64 start, u64 size)
+int __ref __add_memory(int nid, u64 start, u64 size, int memory_block_type)
 {
 	struct resource *res;
 	int ret;
@@ -1178,18 +1183,18 @@ int __ref __add_memory(int nid, u64 start, u64 size)
 	if (IS_ERR(res))
 		return PTR_ERR(res);
 
-	ret = add_memory_resource(nid, res, memhp_auto_online);
+	ret = add_memory_resource(nid, res, memory_block_type);
 	if (ret < 0)
 		release_memory_resource(res);
 	return ret;
 }
 
-int add_memory(int nid, u64 start, u64 size)
+int add_memory(int nid, u64 start, u64 size, int memory_block_type)
 {
 	int rc;
 
 	lock_device_hotplug();
-	rc = __add_memory(nid, start, size);
+	rc = __add_memory(nid, start, size, memory_block_type);
 	unlock_device_hotplug();
 
 	return rc;