From patchwork Fri Oct 16 02:45:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 11840643 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C66D415E6 for ; Fri, 16 Oct 2020 02:45:26 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 706C1208E4 for ; Fri, 16 Oct 2020 02:45:26 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="MuznLLZ/" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 706C1208E4 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8C11D94002B; Thu, 15 Oct 2020 22:45:25 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 86FD0940007; Thu, 15 Oct 2020 22:45:25 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 75EC394002B; Thu, 15 Oct 2020 22:45:25 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0033.hostedemail.com [216.40.44.33]) by kanga.kvack.org (Postfix) with ESMTP id 492D1940007 for ; Thu, 15 Oct 2020 22:45:25 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id AB81E181AEF00 for ; Fri, 16 Oct 2020 02:45:24 +0000 (UTC) X-FDA: 77376247368.22.grape53_2208a5027219 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin22.hostedemail.com (Postfix) with ESMTP id 817E718038E60 for ; Fri, 16 Oct 2020 02:45:24 +0000 (UTC) X-Spam-Summary: 1,0,0,f40e04b122d2dc7f,d41d8cd98f00b204,akpm@linux-foundation.org,,RULES_HIT:1:2:41:69:355:379:800:960:966:967:968:973:981:988:989:1260:1345:1431:1437:1605:1730:1747:1777:1792:2196:2199:2393:2525:2559:2565:2570:2682:2685:2693:2703:2859:2902:2933:2937:2939:2942:2945:2947:2951:2954:3022:3167:3865:3866:3867:3868:3870:3871:3872:3874:3934:3936:3938:3941:3944:3947:3950:3953:3956:3959:4052:4250:4321:4385:4605:5007:6261:6630:7514:7576:7688:7875:7903:8603:8957:9010:9025:9545:10004:11658:12048,0,RBL:198.145.29.99:@linux-foundation.org:.lbl8.mailshell.net-64.100.201.201 62.2.0.100;04ygaqj4s88bi4oeptj4zxqdorwuhop3ifpnsnw3g9pogscwfwx36bbr97eyqyz.ioej75xfo5uyzqk9hhcrmwfm37m9bqy56tope54gbnm61o59x7yej1m6tb6ieyc.s-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: grape53_2208a5027219 X-Filterd-Recvd-Size: 11408 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf32.hostedemail.com (Postfix) with ESMTP for ; Fri, 16 Oct 2020 02:45:23 +0000 (UTC) Received: from localhost.localdomain (c-73-231-172-41.hsd1.ca.comcast.net [73.231.172.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 2A3F320878; Fri, 16 Oct 2020 02:45:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1602816322; bh=6p3odBtEO/kPAfAlb2nZGcApF1Fira7pnCiac+FlSCE=; h=Date:From:To:Subject:In-Reply-To:From; b=MuznLLZ/xxC4htXDY3QHi6pB1akdtucuj2FTQyL/zgRTR0cZQ1YyjgBW5CCrD54Iv 6bTKZ6adwmlM8qf4Idx/ckMenJcmZ8ya8REQeLWyMNeiXP/T3+PYtvFIAbiETR4jh/ 5dhk2SKQLVTIBihEFyY5uyOfMu2KFZ2IElkj3nkE= Date: Thu, 15 Oct 2020 19:45:19 -0700 From: Andrew Morton To: akpm@linux-foundation.org, anton@ozlabs.org, ardb@kernel.org, benh@kernel.crashing.org, bhe@redhat.com, boris.ostrovsky@oracle.com, borntraeger@de.ibm.com, dan.j.williams@intel.com, dave.jiang@intel.com, david@redhat.com, ebiederm@xmission.com, gor@linux.ibm.com, gregkh@linuxfoundation.org, haiyangz@microsoft.com, hca@linux.ibm.com, jasowang@redhat.com, jgg@ziepe.ca, jgross@suse.com, julien@xen.org, keescook@chromium.org, kernelfans@gmail.com, kys@microsoft.com, lenb@kernel.org, leobras.c@gmail.com, linux-mm@kvack.org, lpechacek@suse.cz, mhocko@suse.com, mm-commits@vger.kernel.org, mpe@ellerman.id.au, mst@redhat.com, natechancellor@gmail.com, nathanl@linux.ibm.com, oohall@gmail.com, pankaj.gupta.linux@gmail.com, paulus@samba.org, richardw.yang@linux.intel.com, rjw@rjwysocki.net, roger.pau@citrix.com, sstabellini@kernel.org, sthemmin@microsoft.com, tglx@linutronix.de, torvalds@linux-foundation.org, vishal.l.verma@intel.com, wei.liu@kernel.org Subject: [patch 069/156] kernel/resource: make release_mem_region_adjustable() never fail Message-ID: <20201016024519.Va_XHnJuz%akpm@linux-foundation.org> In-Reply-To: <20201015192732.f448da14e9854c7cb7299956@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: David Hildenbrand Subject: kernel/resource: make release_mem_region_adjustable() never fail Patch series "selective merging of system ram resources", v4. Some add_memory*() users add memory in small, contiguous memory blocks. Examples include virtio-mem, hyper-v balloon, and the XEN balloon. This can quickly result in a lot of memory resources, whereby the actual resource boundaries are not of interest (e.g., it might be relevant for DIMMs, exposed via /proc/iomem to user space). We really want to merge added resources in this scenario where possible. Resources are effectively stored in a list-based tree. Having a lot of resources not only wastes memory, it also makes traversing that tree more expensive, and makes /proc/iomem explode in size (e.g., requiring kexec-tools to manually merge resources when creating a kdump header. The current kexec-tools resource count limit does not allow for more than ~100GB of memory with a memory block size of 128MB on x86-64). Let's allow to selectively merge system ram resources by specifying a new flag for add_memory*(). Patch #5 contains a /proc/iomem example. Only tested with virtio-mem. This patch (of 8): Let's make sure splitting a resource on memory hotunplug will never fail. This will become more relevant once we merge selected System RAM resources - then, we'll trigger that case more often on memory hotunplug. In general, this function is already unlikely to fail. When we remove memory, we free up quite a lot of metadata (memmap, page tables, memory block device, etc.). The only reason it could really fail would be when injecting allocation errors. All other error cases inside release_mem_region_adjustable() seem to be sanity checks if the function would be abused in different context - let's add WARN_ON_ONCE() in these cases so we can catch them. [natechancellor@gmail.com: fix use of ternary condition in release_mem_region_adjustable] Link: https://lkml.kernel.org/r/20200922060748.2452056-1-natechancellor@gmail.com Link: https://github.com/ClangBuiltLinux/linux/issues/1159 Link: https://lkml.kernel.org/r/20200911103459.10306-2-david@redhat.com Signed-off-by: David Hildenbrand Signed-off-by: Nathan Chancellor Cc: Michal Hocko Cc: Dan Williams Cc: Jason Gunthorpe Cc: Kees Cook Cc: Ard Biesheuvel Cc: Pankaj Gupta Cc: Baoquan He Cc: Wei Yang Cc: Anton Blanchard Cc: Benjamin Herrenschmidt Cc: Boris Ostrovsky Cc: Christian Borntraeger Cc: Dave Jiang Cc: Eric Biederman Cc: Greg Kroah-Hartman Cc: Haiyang Zhang Cc: Heiko Carstens Cc: Jason Wang Cc: Juergen Gross Cc: Julien Grall Cc: "K. Y. Srinivasan" Cc: Len Brown Cc: Leonardo Bras Cc: Libor Pechacek Cc: Michael Ellerman Cc: "Michael S. Tsirkin" Cc: Nathan Lynch Cc: "Oliver O'Halloran" Cc: Paul Mackerras Cc: Pingfan Liu Cc: "Rafael J. Wysocki" Cc: Roger Pau Monn Cc: Stefano Stabellini Cc: Stephen Hemminger Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vishal Verma Cc: Wei Liu Signed-off-by: Andrew Morton --- include/linux/ioport.h | 4 +-- kernel/resource.c | 49 ++++++++++++++++++++++----------------- mm/memory_hotplug.c | 22 ----------------- 3 files changed, 31 insertions(+), 44 deletions(-) --- a/include/linux/ioport.h~kernel-resource-make-release_mem_region_adjustable-never-fail +++ a/include/linux/ioport.h @@ -248,8 +248,8 @@ extern struct resource * __request_regio extern void __release_region(struct resource *, resource_size_t, resource_size_t); #ifdef CONFIG_MEMORY_HOTREMOVE -extern int release_mem_region_adjustable(struct resource *, resource_size_t, - resource_size_t); +extern void release_mem_region_adjustable(struct resource *, resource_size_t, + resource_size_t); #endif /* Wrappers for managed devices */ --- a/kernel/resource.c~kernel-resource-make-release_mem_region_adjustable-never-fail +++ a/kernel/resource.c @@ -1258,21 +1258,28 @@ EXPORT_SYMBOL(__release_region); * assumes that all children remain in the lower address entry for * simplicity. Enhance this logic when necessary. */ -int release_mem_region_adjustable(struct resource *parent, - resource_size_t start, resource_size_t size) +void release_mem_region_adjustable(struct resource *parent, + resource_size_t start, resource_size_t size) { + struct resource *new_res = NULL; + bool alloc_nofail = false; struct resource **p; struct resource *res; - struct resource *new_res; resource_size_t end; - int ret = -EINVAL; end = start + size - 1; - if ((start < parent->start) || (end > parent->end)) - return ret; + if (WARN_ON_ONCE((start < parent->start) || (end > parent->end))) + return; - /* The alloc_resource() result gets checked later */ - new_res = alloc_resource(GFP_KERNEL); + /* + * We free up quite a lot of memory on memory hotunplug (esp., memap), + * just before releasing the region. This is highly unlikely to + * fail - let's play save and make it never fail as the caller cannot + * perform any error handling (e.g., trying to re-add memory will fail + * similarly). + */ +retry: + new_res = alloc_resource(GFP_KERNEL | (alloc_nofail ? __GFP_NOFAIL : 0)); p = &parent->child; write_lock(&resource_lock); @@ -1298,7 +1305,6 @@ int release_mem_region_adjustable(struct * so if we are dealing with them, let us just back off here. */ if (!(res->flags & IORESOURCE_SYSRAM)) { - ret = 0; break; } @@ -1315,20 +1321,23 @@ int release_mem_region_adjustable(struct /* free the whole entry */ *p = res->sibling; free_resource(res); - ret = 0; } else if (res->start == start && res->end != end) { /* adjust the start */ - ret = __adjust_resource(res, end + 1, - res->end - end); + WARN_ON_ONCE(__adjust_resource(res, end + 1, + res->end - end)); } else if (res->start != start && res->end == end) { /* adjust the end */ - ret = __adjust_resource(res, res->start, - start - res->start); + WARN_ON_ONCE(__adjust_resource(res, res->start, + start - res->start)); } else { - /* split into two entries */ + /* split into two entries - we need a new resource */ if (!new_res) { - ret = -ENOMEM; - break; + new_res = alloc_resource(GFP_ATOMIC); + if (!new_res) { + alloc_nofail = true; + write_unlock(&resource_lock); + goto retry; + } } new_res->name = res->name; new_res->start = end + 1; @@ -1339,9 +1348,8 @@ int release_mem_region_adjustable(struct new_res->sibling = res->sibling; new_res->child = NULL; - ret = __adjust_resource(res, res->start, - start - res->start); - if (ret) + if (WARN_ON_ONCE(__adjust_resource(res, res->start, + start - res->start))) break; res->sibling = new_res; new_res = NULL; @@ -1352,7 +1360,6 @@ int release_mem_region_adjustable(struct write_unlock(&resource_lock); free_resource(new_res); - return ret; } #endif /* CONFIG_MEMORY_HOTREMOVE */ --- a/mm/memory_hotplug.c~kernel-resource-make-release_mem_region_adjustable-never-fail +++ a/mm/memory_hotplug.c @@ -1727,26 +1727,6 @@ void try_offline_node(int nid) } EXPORT_SYMBOL(try_offline_node); -static void __release_memory_resource(resource_size_t start, - resource_size_t size) -{ - int ret; - - /* - * When removing memory in the same granularity as it was added, - * this function never fails. It might only fail if resources - * have to be adjusted or split. We'll ignore the error, as - * removing of memory cannot fail. - */ - ret = release_mem_region_adjustable(&iomem_resource, start, size); - if (ret) { - resource_size_t endres = start + size - 1; - - pr_warn("Unable to release resource <%pa-%pa> (%d)\n", - &start, &endres, ret); - } -} - static int __ref try_remove_memory(int nid, u64 start, u64 size) { int rc = 0; @@ -1780,7 +1760,7 @@ static int __ref try_remove_memory(int n memblock_remove(start, size); } - __release_memory_resource(start, size); + release_mem_region_adjustable(&iomem_resource, start, size); try_offline_node(nid);