From patchwork Thu Mar 28 13:43:17 2019
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 10875019
From: Oscar Salvador
To: akpm@linux-foundation.org
Cc: mhocko@suse.com, david@redhat.com, dan.j.williams@intel.com, Jonathan.Cameron@huawei.com, anshuman.khandual@arm.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Oscar Salvador
Subject: [PATCH 1/4] mm, memory_hotplug: cleanup memory offline path
Date: Thu, 28 Mar 2019 14:43:17 +0100
Message-Id: <20190328134320.13232-2-osalvador@suse.de>
In-Reply-To: <20190328134320.13232-1-osalvador@suse.de>
References: <20190328134320.13232-1-osalvador@suse.de>

From: Michal Hocko

check_pages_isolated_cb() currently accounts the whole pfn range as offlined if test_pages_isolated() succeeds on the range. This is based on the assumption that all pages in the range are freed, which is currently true in most cases, but it will no longer hold with later changes, because pages marked as vmemmap will not be isolated.

Move the offlined-pages counting to offline_isolated_pages_cb() and rely on __offline_isolated_pages() to return the correct value. check_pages_isolated_cb() will still do its primary job and check the pfn range.

While we are at it, remove check_pages_isolated() and offline_isolated_pages() and use walk_system_ram_range() directly, as is done in online_pages().
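With this change the offline path boils down to two walk_system_ram_range() calls: one that merely verifies isolation, and one that offlines the ranges and accumulates the count returned by __offline_isolated_pages(). A condensed sketch of the resulting pattern (distilled from the hunks below; the migration retry loop is elided):

	static int offline_isolated_pages_cb(unsigned long start, unsigned long nr_pages,
					     void *data)
	{
		/* __offline_isolated_pages() now reports how many pages it offlined */
		*(unsigned long *)data += __offline_isolated_pages(start, start + nr_pages);
		return 0;
	}

	static int check_pages_isolated_cb(unsigned long start_pfn, unsigned long nr_pages,
					   void *data)
	{
		/* only verify isolation; no accounting here anymore */
		return test_pages_isolated(start_pfn, start_pfn + nr_pages, true);
	}

	/* in __offline_pages(): re-check isolation, then offline and count in one pass */
	unsigned long offlined_pages = 0;

	ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn, NULL,
				    check_pages_isolated_cb);
	walk_system_ram_range(start_pfn, end_pfn - start_pfn, &offlined_pages,
			      offline_isolated_pages_cb);
	pr_info("Offlined Pages %ld\n", offlined_pages);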
Signed-off-by: Michal Hocko Signed-off-by: Oscar Salvador --- include/linux/memory_hotplug.h | 2 +- mm/memory_hotplug.c | 45 +++++++++++------------------------------- mm/page_alloc.c | 11 +++++++++-- 3 files changed, 21 insertions(+), 37 deletions(-) diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 8ade08c50d26..42ba7199f701 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -87,7 +87,7 @@ extern int add_one_highpage(struct page *page, int pfn, int bad_ppro); extern int online_pages(unsigned long, unsigned long, int); extern int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn, unsigned long *valid_start, unsigned long *valid_end); -extern void __offline_isolated_pages(unsigned long, unsigned long); +extern unsigned long __offline_isolated_pages(unsigned long, unsigned long); typedef void (*online_page_callback_t)(struct page *page, unsigned int order); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 0082d699be94..5139b3bfd8b0 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1453,17 +1453,12 @@ static int offline_isolated_pages_cb(unsigned long start, unsigned long nr_pages, void *data) { - __offline_isolated_pages(start, start + nr_pages); + unsigned long offlined_pages; + offlined_pages = __offline_isolated_pages(start, start + nr_pages); + *(unsigned long *)data += offlined_pages; return 0; } -static void -offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) -{ - walk_system_ram_range(start_pfn, end_pfn - start_pfn, NULL, - offline_isolated_pages_cb); -} - /* * Check all pages in range, recoreded as memory resource, are isolated. */ @@ -1471,26 +1466,7 @@ static int check_pages_isolated_cb(unsigned long start_pfn, unsigned long nr_pages, void *data) { - int ret; - long offlined = *(long *)data; - ret = test_pages_isolated(start_pfn, start_pfn + nr_pages, true); - offlined = nr_pages; - if (!ret) - *(long *)data += offlined; - return ret; -} - -static long -check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn) -{ - long offlined = 0; - int ret; - - ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn, &offlined, - check_pages_isolated_cb); - if (ret < 0) - offlined = (long)ret; - return offlined; + return test_pages_isolated(start_pfn, start_pfn + nr_pages, true); } static int __init cmdline_parse_movable_node(char *p) @@ -1575,7 +1551,7 @@ static int __ref __offline_pages(unsigned long start_pfn, unsigned long end_pfn) { unsigned long pfn, nr_pages; - long offlined_pages; + unsigned long offlined_pages = 0; int ret, node, nr_isolate_pageblock; unsigned long flags; unsigned long valid_start, valid_end; @@ -1651,14 +1627,15 @@ static int __ref __offline_pages(unsigned long start_pfn, goto failed_removal_isolated; } /* check again */ - offlined_pages = check_pages_isolated(start_pfn, end_pfn); - } while (offlined_pages < 0); + ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn, NULL, + check_pages_isolated_cb); + } while (ret); - pr_info("Offlined Pages %ld\n", offlined_pages); /* Ok, all of our target is isolated. We cannot do rollback at this point. 
*/ - offline_isolated_pages(start_pfn, end_pfn); - + walk_system_ram_range(start_pfn, end_pfn - start_pfn, &offlined_pages, + offline_isolated_pages_cb); + pr_info("Offlined Pages %ld\n", offlined_pages); /* * Onlining will reset pagetype flags and makes migrate type * MOVABLE, so just need to decrease the number of isolated diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d96ca5bc555b..d128f53888b8 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8374,7 +8374,7 @@ void zone_pcp_reset(struct zone *zone) * All pages in the range must be in a single zone and isolated * before calling this. */ -void +unsigned long __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) { struct page *page; @@ -8382,12 +8382,15 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) unsigned int order, i; unsigned long pfn; unsigned long flags; + unsigned long offlined_pages = 0; + /* find the first valid pfn */ for (pfn = start_pfn; pfn < end_pfn; pfn++) if (pfn_valid(pfn)) break; if (pfn == end_pfn) - return; + return offlined_pages; + offline_mem_sections(pfn, end_pfn); zone = page_zone(pfn_to_page(pfn)); spin_lock_irqsave(&zone->lock, flags); @@ -8405,12 +8408,14 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) if (unlikely(!PageBuddy(page) && PageHWPoison(page))) { pfn++; SetPageReserved(page); + offlined_pages++; continue; } BUG_ON(page_count(page)); BUG_ON(!PageBuddy(page)); order = page_order(page); + offlined_pages += 1 << order; #ifdef CONFIG_DEBUG_VM pr_info("remove from free list %lx %d %lx\n", pfn, 1 << order, end_pfn); @@ -8423,6 +8428,8 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) pfn += (1 << order); } spin_unlock_irqrestore(&zone->lock, flags); + + return offlined_pages; } #endif
From patchwork Thu Mar 28 13:43:18 2019
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 10875021
From: Oscar Salvador
To: akpm@linux-foundation.org
Cc: mhocko@suse.com, david@redhat.com, dan.j.williams@intel.com, Jonathan.Cameron@huawei.com, anshuman.khandual@arm.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Oscar Salvador
Subject: [PATCH 2/4] mm, memory_hotplug: provide more generic restrictions for memory hotplug
Date: Thu, 28 Mar 2019 14:43:18 +0100
Message-Id: <20190328134320.13232-3-osalvador@suse.de>
In-Reply-To: <20190328134320.13232-1-osalvador@suse.de>
References: <20190328134320.13232-1-osalvador@suse.de>

From: Michal Hocko

arch_add_memory() and __add_pages() take a want_memblock argument which controls whether the newly added memory should get the sysfs memblock user API (e.g. ZONE_DEVICE users do not want/need this interface). Some callers also want to control where the memmap is allocated from, by configuring an altmap.

Add a more generic hotplug context for arch_add_memory() and __add_pages(). struct mhp_restrictions carries flags for additional features to be enabled by memory hotplug (currently MHP_MEMBLOCK_API) and an altmap for an alternative memmap allocator.

Please note that the complete altmap propagation down to the vmemmap code is not done in this patch; it will be done in a follow-up to reduce the churn here. This patch shouldn't introduce any functional change.
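For reference, the two call-site patterns after this change look as follows (condensed from the hunks below; variable names are the ones used at the respective call sites):

	/* add_memory_resource(): regular hotplug, wants the sysfs memblock API */
	struct mhp_restrictions restrictions = {};

	restrictions.flags = MHP_MEMBLOCK_API;
	ret = arch_add_memory(nid, start, size, &restrictions);

	/* devm_memremap_pages(): ZONE_DEVICE, no memblock devices, the memmap
	 * comes from the altmap allocator instead */
	struct mhp_restrictions restrictions = {};

	restrictions.altmap = altmap;
	error = arch_add_memory(nid, align_start, align_size, &restrictions);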
Signed-off-by: Michal Hocko Signed-off-by: Oscar Salvador --- arch/arm64/mm/mmu.c | 5 ++--- arch/ia64/mm/init.c | 5 ++--- arch/powerpc/mm/mem.c | 6 +++--- arch/s390/mm/init.c | 6 +++--- arch/sh/mm/init.c | 6 +++--- arch/x86/mm/init_32.c | 6 +++--- arch/x86/mm/init_64.c | 10 +++++----- include/linux/memory_hotplug.h | 29 +++++++++++++++++++++++------ kernel/memremap.c | 9 ++++++--- mm/memory_hotplug.c | 10 ++++++---- 10 files changed, 56 insertions(+), 36 deletions(-) diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index e97f018ff740..8c0d5484b38c 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1046,8 +1046,7 @@ int p4d_free_pud_page(p4d_t *p4d, unsigned long addr) } #ifdef CONFIG_MEMORY_HOTPLUG -int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap, - bool want_memblock) +int arch_add_memory(int nid, u64 start, u64 size, struct mhp_restrictions *restrictions) { int flags = 0; @@ -1058,6 +1057,6 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap, size, PAGE_KERNEL, pgd_pgtable_alloc, flags); return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT, - altmap, want_memblock); + restrictions); } #endif diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c index e49200e31750..7af16f5d5ca6 100644 --- a/arch/ia64/mm/init.c +++ b/arch/ia64/mm/init.c @@ -666,14 +666,13 @@ mem_init (void) } #ifdef CONFIG_MEMORY_HOTPLUG -int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap, - bool want_memblock) +int arch_add_memory(int nid, u64 start, u64 size, struct mhp_restrictions *restrictions) { unsigned long start_pfn = start >> PAGE_SHIFT; unsigned long nr_pages = size >> PAGE_SHIFT; int ret; - ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock); + ret = __add_pages(nid, start_pfn, nr_pages, restrictions); if (ret) printk("%s: Problem encountered in __add_pages() as ret=%d\n", __func__, ret); diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index f6787f90e158..76bcc29fa3e1 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -109,8 +109,8 @@ int __weak remove_section_mapping(unsigned long start, unsigned long end) return -ENODEV; } -int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap, - bool want_memblock) +int __meminit arch_add_memory(int nid, u64 start, u64 size, + struct mhp_restrictions *restrictions) { unsigned long start_pfn = start >> PAGE_SHIFT; unsigned long nr_pages = size >> PAGE_SHIFT; @@ -127,7 +127,7 @@ int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap * } flush_inval_dcache_range(start, start + size); - return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock); + return __add_pages(nid, start_pfn, nr_pages, restrictions); } #ifdef CONFIG_MEMORY_HOTREMOVE diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c index 3e82f66d5c61..9ae71a82e9e1 100644 --- a/arch/s390/mm/init.c +++ b/arch/s390/mm/init.c @@ -224,8 +224,8 @@ device_initcall(s390_cma_mem_init); #endif /* CONFIG_CMA */ -int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap, - bool want_memblock) +int arch_add_memory(int nid, u64 start, u64 size, + struct mhp_restrictions *restrictions) { unsigned long start_pfn = PFN_DOWN(start); unsigned long size_pages = PFN_DOWN(size); @@ -235,7 +235,7 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap, if (rc) return rc; - rc = __add_pages(nid, start_pfn, size_pages, altmap, want_memblock); + rc = __add_pages(nid, start_pfn, 
size_pages, restrictions); if (rc) vmem_remove_mapping(start, size); return rc; diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c index 70621324db41..32798bd4c32f 100644 --- a/arch/sh/mm/init.c +++ b/arch/sh/mm/init.c @@ -416,15 +416,15 @@ void free_initrd_mem(unsigned long start, unsigned long end) #endif #ifdef CONFIG_MEMORY_HOTPLUG -int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap, - bool want_memblock) +int arch_add_memory(int nid, u64 start, u64 size, + struct mhp_restrictions *restrictions) { unsigned long start_pfn = PFN_DOWN(start); unsigned long nr_pages = size >> PAGE_SHIFT; int ret; /* We only have ZONE_NORMAL, so this is easy.. */ - ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock); + ret = __add_pages(nid, start_pfn, nr_pages, restrictions); if (unlikely(ret)) printk("%s: Failed, __add_pages() == %d\n", __func__, ret); diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c index 85c94f9a87f8..755dbed85531 100644 --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -850,13 +850,13 @@ void __init mem_init(void) } #ifdef CONFIG_MEMORY_HOTPLUG -int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap, - bool want_memblock) +int arch_add_memory(int nid, u64 start, u64 size, + struct mhp_restrictions *restrictions) { unsigned long start_pfn = start >> PAGE_SHIFT; unsigned long nr_pages = size >> PAGE_SHIFT; - return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock); + return __add_pages(nid, start_pfn, nr_pages, restrictions); } #ifdef CONFIG_MEMORY_HOTREMOVE diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index bccff68e3267..db42c11b48fb 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -777,11 +777,11 @@ static void update_end_of_memory_vars(u64 start, u64 size) } int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages, - struct vmem_altmap *altmap, bool want_memblock) + struct mhp_restrictions *restrictions) { int ret; - ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock); + ret = __add_pages(nid, start_pfn, nr_pages, restrictions); WARN_ON_ONCE(ret); /* update max_pfn, max_low_pfn and high_memory */ @@ -791,15 +791,15 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages, return ret; } -int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap, - bool want_memblock) +int arch_add_memory(int nid, u64 start, u64 size, + struct mhp_restrictions *restrictions) { unsigned long start_pfn = start >> PAGE_SHIFT; unsigned long nr_pages = size >> PAGE_SHIFT; init_memory_mapping(start, start + size); - return add_pages(nid, start_pfn, nr_pages, altmap, want_memblock); + return add_pages(nid, start_pfn, nr_pages, restrictions); } #define PAGE_INUSE 0xFD diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 42ba7199f701..119a012d43b8 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -117,20 +117,37 @@ extern int __remove_pages(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages, struct vmem_altmap *altmap); #endif /* CONFIG_MEMORY_HOTREMOVE */ +/* + * Do we want sysfs memblock files created. This will allow userspace to online + * and offline memory explicitly. Lack of this bit means that the caller has to + * call move_pfn_range_to_zone to finish the initialization. 
+ */ + +#define MHP_MEMBLOCK_API 1<<0 + +/* + * Restrictions for the memory hotplug: + * flags: MHP_ flags + * altmap: alternative allocator for memmap array + */ +struct mhp_restrictions { + unsigned long flags; + struct vmem_altmap *altmap; +}; + /* reasonably generic interface to expand the physical pages */ extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages, - struct vmem_altmap *altmap, bool want_memblock); + struct mhp_restrictions *restrictions); #ifndef CONFIG_ARCH_HAS_ADD_PAGES static inline int add_pages(int nid, unsigned long start_pfn, - unsigned long nr_pages, struct vmem_altmap *altmap, - bool want_memblock) + unsigned long nr_pages, struct mhp_restrictions *restrictions) { - return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock); + return __add_pages(nid, start_pfn, nr_pages, restrictions); } #else /* ARCH_HAS_ADD_PAGES */ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages, - struct vmem_altmap *altmap, bool want_memblock); + struct mhp_restrictions *restrictions); #endif /* ARCH_HAS_ADD_PAGES */ #ifdef CONFIG_NUMA @@ -332,7 +349,7 @@ extern int __add_memory(int nid, u64 start, u64 size); extern int add_memory(int nid, u64 start, u64 size); extern int add_memory_resource(int nid, struct resource *resource); extern int arch_add_memory(int nid, u64 start, u64 size, - struct vmem_altmap *altmap, bool want_memblock); + struct mhp_restrictions *restrictions); extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages, struct vmem_altmap *altmap); extern bool is_memblock_offlined(struct memory_block *mem); diff --git a/kernel/memremap.c b/kernel/memremap.c index a856cb5ff192..d42f11673979 100644 --- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -149,6 +149,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) struct resource *res = &pgmap->res; struct dev_pagemap *conflict_pgmap; pgprot_t pgprot = PAGE_KERNEL; + struct mhp_restrictions restrictions = {}; int error, nid, is_ram; if (!pgmap->ref || !pgmap->kill) @@ -199,6 +200,9 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) if (error) goto err_pfn_remap; + /* We do not want any optional features only our own memmap */ + restrictions.altmap = altmap; + mem_hotplug_begin(); /* @@ -214,7 +218,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) */ if (pgmap->type == MEMORY_DEVICE_PRIVATE) { error = add_pages(nid, align_start >> PAGE_SHIFT, - align_size >> PAGE_SHIFT, NULL, false); + align_size >> PAGE_SHIFT, &restrictions); } else { error = kasan_add_zero_shadow(__va(align_start), align_size); if (error) { @@ -222,8 +226,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) goto err_kasan; } - error = arch_add_memory(nid, align_start, align_size, altmap, - false); + error = arch_add_memory(nid, align_start, align_size, &restrictions); } if (!error) { diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 5139b3bfd8b0..836cb026ed7b 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -273,12 +273,12 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn, * add the new pages. 
*/ int __ref __add_pages(int nid, unsigned long phys_start_pfn, - unsigned long nr_pages, struct vmem_altmap *altmap, - bool want_memblock) + unsigned long nr_pages, struct mhp_restrictions *restrictions) { unsigned long i; int err = 0; int start_sec, end_sec; + struct vmem_altmap *altmap = restrictions->altmap; /* during initialize mem_map, align hot-added range to section */ start_sec = pfn_to_section_nr(phys_start_pfn); @@ -299,7 +299,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn, for (i = start_sec; i <= end_sec; i++) { err = __add_section(nid, section_nr_to_pfn(i), altmap, - want_memblock); + restrictions->flags & MHP_MEMBLOCK_API); /* * EEXIST is finally dealt with by ioresource collision @@ -1099,6 +1099,7 @@ int __ref add_memory_resource(int nid, struct resource *res) u64 start, size; bool new_node = false; int ret; + struct mhp_restrictions restrictions = {}; start = res->start; size = resource_size(res); @@ -1123,7 +1124,8 @@ int __ref add_memory_resource(int nid, struct resource *res) new_node = ret; /* call arch's memory hotadd */ - ret = arch_add_memory(nid, start, size, NULL, true); + restrictions.flags = MHP_MEMBLOCK_API; + ret = arch_add_memory(nid, start, size, &restrictions); if (ret < 0) goto error;
From patchwork Thu Mar 28 13:43:19 2019
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 10875025
From: Oscar Salvador
To: akpm@linux-foundation.org
Cc: mhocko@suse.com, david@redhat.com, dan.j.williams@intel.com, Jonathan.Cameron@huawei.com, anshuman.khandual@arm.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Oscar Salvador
Subject: [PATCH 3/4] mm, memory_hotplug: allocate memmap from the added memory range for sparse-vmemmap
Date: Thu, 28 Mar 2019 14:43:19 +0100
Message-Id: <20190328134320.13232-4-osalvador@suse.de>
In-Reply-To: <20190328134320.13232-1-osalvador@suse.de>
References: <20190328134320.13232-1-osalvador@suse.de>

Physical memory hotadd has to allocate a memmap (struct page array) for the newly added memory section. Currently, alloc_pages_node() is used for those allocations. This has some disadvantages:

a) existing memory is consumed for that purpose (~2MB per 128MB memory section on x86_64)
b) if the whole node is movable then we have off-node struct pages, which has performance drawbacks.

a) has turned out to be a problem for memory-hotplug-based ballooning, because userspace might not react in time to online memory while the memory consumed during physical hotadd is enough to push the system to OOM. Commit 31bc3858ea3e ("memory-hotplug: add automatic onlining policy for the newly added memory") was added to work around that problem.

I have also seen hot-add operations failing on powerpc because we try to use order-8 pages. If the base page size is 64KB, this gives us 16MB, and if we run out of those, we simply fail. One could argue that we can fall back to base pages as we do on x86_64, but we can do much better when CONFIG_SPARSEMEM_VMEMMAP=y, because vmemmap page tables can map arbitrary memory. That means we can simply use the beginning of each memory section and map struct pages there. The struct pages which back the allocated space then just need to be treated carefully.

Add {_Set,_Clear}PageVmemmap helpers to distinguish those pages in pfn walkers. We do not have any spare page flag for this purpose, so use the combination of the PageReserved bit, which already tells that the page should be ignored by the core mm code, and store VMEMMAP_PAGE (which sets all bits but PAGE_MAPPING_FLAGS) into page->mapping.

There is one case where we cannot check for PageReserved: when poisoning is enabled, VM_BUG_ON_PGFLAGS is on, and the page is not yet initialized. This happens in __init_single_page(), where we have to preserve the state of PageVmemmap pages, so we cannot zero the page.
I added __PageVmemmap for that purpose; it only checks the page->mapping field, which should be enough since these pages are not yet onlined.

On the memory hotplug front, add a new MHP_MEMMAP_FROM_RANGE restriction flag. The user is supposed to set the flag if the memmap should be allocated from the hotadded range. Right now this is passed to add_memory(), __add_memory() and add_memory_resource(). Unfortunately we do not have a single entry point, as Hyper-V, ACPI and Xen use those three functions, so all those users have to specify whether they want the memmap array allocated from the hot-added range. For the time being, only ACPI enables it.

Implementation-wise we reuse the vmem_altmap infrastructure to override the default allocator used by __vmemmap_populate. Once the memmap is allocated we need a way to mark the altmap pfns used for the allocation. If MHP_MEMMAP_FROM_RANGE was passed, we set up the layout of the altmap structure at the beginning of __add_pages(), and then we call mark_vmemmap_pages() after the memory has been added.

mark_vmemmap_pages() marks the pages as vmemmap and sets some metadata. The current layout of the vmemmap pages is:

- The head vmemmap page (first page) has the following fields set:
  * page->_refcount: number of sections that used this altmap
  * page->private: total number of vmemmap pages
- The remaining vmemmap pages have:
  * page->freelist: pointer to the head vmemmap page

This is done to ease the computation we need in some places. E.g., say we hot-add 9GB on x86_64:

  head->_refcount = 72 sections
  head->private = 36864 vmemmap pages
  tail->freelist = head

We keep a _refcount of the used sections to know how long we have to defer the call to vmemmap_free(). The thing is that the first pages of the hot-added range are used to create the memmap mapping, so we cannot remove those first, otherwise we would blow up. Since sections are removed sequentially when we hot-remove a memory range, we wait until we hit the last section and then free the whole range with vmemmap_free(), backwards. We know it is the last section because on every pass we decrease head->_refcount, and when it reaches 0 we have reached our last section.

We also have to be careful about those pages during online and offline operations. They are simply skipped: online keeps them reserved, and so unusable for any other purpose, and offline ignores them so they do not block the offline operation.
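As a quick sanity check of the 9GB example above, assuming the x86_64 defaults (128MB sections, 4KB base pages, 64-byte struct page); the helper below is made up purely for illustration:

	/*
	 * 9GB / 128MB               = 72 sections
	 * memmap per section        = 32768 pages * 64 bytes = 2MB
	 * vmemmap pages per section = 2MB / 4KB = 512
	 * total vmemmap pages       = 72 * 512 = 36864
	 * which matches head->_refcount = 72 and head->private = 36864 above.
	 */
	static unsigned long example_nr_vmemmap_pages(unsigned long size_bytes)
	{
		unsigned long section_size = 128UL << 20;		/* x86_64 default */
		unsigned long memmap_per_section = 32768UL * 64;	/* PAGES_PER_SECTION * sizeof(struct page) */

		return (size_bytes / section_size) * (memmap_per_section >> 12);
	}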
Signed-off-by: Oscar Salvador --- arch/arm64/mm/mmu.c | 5 +- arch/powerpc/mm/init_64.c | 7 ++ arch/powerpc/platforms/powernv/memtrace.c | 2 +- arch/powerpc/platforms/pseries/hotplug-memory.c | 2 +- arch/s390/mm/init.c | 6 ++ arch/x86/mm/init_64.c | 10 +++ drivers/acpi/acpi_memhotplug.c | 2 +- drivers/base/memory.c | 2 +- drivers/dax/kmem.c | 2 +- drivers/hv/hv_balloon.c | 2 +- drivers/s390/char/sclp_cmd.c | 2 +- drivers/xen/balloon.c | 2 +- include/linux/memory_hotplug.h | 22 ++++- include/linux/memremap.h | 2 +- include/linux/page-flags.h | 34 +++++++ mm/compaction.c | 6 ++ mm/memory_hotplug.c | 115 ++++++++++++++++++++---- mm/page_alloc.c | 19 +++- mm/page_isolation.c | 11 +++ mm/sparse.c | 88 ++++++++++++++++++ mm/util.c | 2 + 21 files changed, 309 insertions(+), 34 deletions(-) diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 8c0d5484b38c..9607d4a3fc6b 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -746,7 +746,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, if (pmd_none(READ_ONCE(*pmdp))) { void *p = NULL; - p = vmemmap_alloc_block_buf(PMD_SIZE, node); + if (altmap) + p = altmap_alloc_block_buf(PMD_SIZE, altmap); + else + p = vmemmap_alloc_block_buf(PMD_SIZE, node); if (!p) return -ENOMEM; diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c index a4c155af1597..94bf60c1b388 100644 --- a/arch/powerpc/mm/init_64.c +++ b/arch/powerpc/mm/init_64.c @@ -294,6 +294,13 @@ void __ref vmemmap_free(unsigned long start, unsigned long end, if (base_pfn >= alt_start && base_pfn < alt_end) { vmem_altmap_free(altmap, nr_pages); + } else if (PageVmemmap(page)) { + /* + * runtime vmemmap pages are residing inside the memory + * section so they do not have to be freed anywhere. + */ + while (PageVmemmap(page)) + __ClearPageVmemmap(page++); } else if (PageReserved(page)) { /* allocated from bootmem */ if (page_size < PAGE_SIZE) { diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c index 248a38ad25c7..6aa07ef19849 100644 --- a/arch/powerpc/platforms/powernv/memtrace.c +++ b/arch/powerpc/platforms/powernv/memtrace.c @@ -233,7 +233,7 @@ static int memtrace_online(void) ent->mem = 0; } - if (add_memory(ent->nid, ent->start, ent->size)) { + if (add_memory(ent->nid, ent->start, ent->size, true)) { pr_err("Failed to add trace memory to node %d\n", ent->nid); ret += 1; diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c index d291b618a559..ffe02414248d 100644 --- a/arch/powerpc/platforms/pseries/hotplug-memory.c +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c @@ -670,7 +670,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb) nid = memory_add_physaddr_to_nid(lmb->base_addr); /* Add the memory */ - rc = __add_memory(nid, lmb->base_addr, block_sz); + rc = __add_memory(nid, lmb->base_addr, block_sz, true); if (rc) { invalidate_lmb_associativity_index(lmb); return rc; diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c index 9ae71a82e9e1..75e96860a9ac 100644 --- a/arch/s390/mm/init.c +++ b/arch/s390/mm/init.c @@ -231,6 +231,12 @@ int arch_add_memory(int nid, u64 start, u64 size, unsigned long size_pages = PFN_DOWN(size); int rc; + /* + * Physical memory is added only later during the memory online so we + * cannot use the added range at this stage unfortunately. 
+ */ + restrictions->flags &= ~MHP_MEMMAP_FROM_RANGE; + rc = vmem_add_mapping(start, size); if (rc) return rc; diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index db42c11b48fb..2e40c9e637b9 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -809,6 +809,16 @@ static void __meminit free_pagetable(struct page *page, int order) unsigned long magic; unsigned int nr_pages = 1 << order; + /* + * runtime vmemmap pages are residing inside the memory section so + * they do not have to be freed anywhere. + */ + if (PageVmemmap(page)) { + while (nr_pages--) + __ClearPageVmemmap(page++); + return; + } + /* bootmem page has reserved flag */ if (PageReserved(page)) { __ClearPageReserved(page); diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c index 8fe0960ea572..ff9d78def208 100644 --- a/drivers/acpi/acpi_memhotplug.c +++ b/drivers/acpi/acpi_memhotplug.c @@ -228,7 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device) if (node < 0) node = memory_add_physaddr_to_nid(info->start_addr); - result = __add_memory(node, info->start_addr, info->length); + result = __add_memory(node, info->start_addr, info->length, true); /* * If the memory block has been used by the kernel, add_memory() diff --git a/drivers/base/memory.c b/drivers/base/memory.c index cb8347500ce2..28fd2a5cc0c9 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -510,7 +510,7 @@ static ssize_t probe_store(struct device *dev, struct device_attribute *attr, nid = memory_add_physaddr_to_nid(phys_addr); ret = __add_memory(nid, phys_addr, - MIN_MEMORY_BLOCK_SIZE * sections_per_block); + MIN_MEMORY_BLOCK_SIZE * sections_per_block, true); if (ret) goto out; diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c index a02318c6d28a..904371834390 100644 --- a/drivers/dax/kmem.c +++ b/drivers/dax/kmem.c @@ -65,7 +65,7 @@ int dev_dax_kmem_probe(struct device *dev) new_res->flags = IORESOURCE_SYSTEM_RAM; new_res->name = dev_name(dev); - rc = add_memory(numa_node, new_res->start, resource_size(new_res)); + rc = add_memory(numa_node, new_res->start, resource_size(new_res), false); if (rc) return rc; diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c index dd475f3bcc8a..c88f32964b5f 100644 --- a/drivers/hv/hv_balloon.c +++ b/drivers/hv/hv_balloon.c @@ -741,7 +741,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size, nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn)); ret = add_memory(nid, PFN_PHYS((start_pfn)), - (HA_CHUNK << PAGE_SHIFT)); + (HA_CHUNK << PAGE_SHIFT), false); if (ret) { pr_err("hot_add memory failed error is %d\n", ret); diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c index 37d42de06079..b021c96cb5c7 100644 --- a/drivers/s390/char/sclp_cmd.c +++ b/drivers/s390/char/sclp_cmd.c @@ -406,7 +406,7 @@ static void __init add_memory_merged(u16 rn) if (!size) goto skip_add; for (addr = start; addr < start + size; addr += block_size) - add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size); + add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size, false); skip_add: first_rn = rn; num = 1; diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index d37dd5bb7a8f..59f25f3a91d0 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -352,7 +352,7 @@ static enum bp_state reserve_additional_memory(void) mutex_unlock(&balloon_mutex); /* add_memory_resource() requires the device_hotplug lock */ lock_device_hotplug(); - rc = add_memory_resource(nid, resource); + rc = 
add_memory_resource(nid, resource, false); unlock_device_hotplug(); mutex_lock(&balloon_mutex); diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 119a012d43b8..c304c2f529da 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -126,6 +126,14 @@ extern int __remove_pages(struct zone *zone, unsigned long start_pfn, #define MHP_MEMBLOCK_API 1<<0 /* + * Do we want memmap (struct page array) allocated from the hotadded range. + * Please note that only SPARSE_VMEMMAP implements this feature and some + * architectures might not support it even for that memory model (e.g. s390) + */ + +#define MHP_MEMMAP_FROM_RANGE 1<<1 + +/* * Restrictions for the memory hotplug: * flags: MHP_ flags * altmap: alternative allocator for memmap array @@ -345,9 +353,9 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {} extern void __ref free_area_init_core_hotplug(int nid); extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn, void *arg, int (*func)(struct memory_block *, void *)); -extern int __add_memory(int nid, u64 start, u64 size); -extern int add_memory(int nid, u64 start, u64 size); -extern int add_memory_resource(int nid, struct resource *resource); +extern int __add_memory(int nid, u64 start, u64 size, bool use_vmemmap); +extern int add_memory(int nid, u64 start, u64 size, bool use_vmemmap); +extern int add_memory_resource(int nid, struct resource *resource, bool use_vmemmap); extern int arch_add_memory(int nid, u64 start, u64 size, struct mhp_restrictions *restrictions); extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, @@ -363,4 +371,12 @@ extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_ int online_type); extern struct zone *zone_for_pfn_range(int online_type, int nid, unsigned start_pfn, unsigned long nr_pages); + +#ifdef CONFIG_SPARSEMEM_VMEMMAP +extern void mark_vmemmap_pages(struct vmem_altmap *self, + struct mhp_restrictions *r); +#else +static inline void mark_vmemmap_pages(struct vmem_altmap *self, + struct mhp_restrictions *r) {} +#endif #endif /* __LINUX_MEMORY_HOTPLUG_H */ diff --git a/include/linux/memremap.h b/include/linux/memremap.h index f0628660d541..cfde1c1febb7 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -16,7 +16,7 @@ struct device; * @alloc: track pages consumed, private to vmemmap_populate() */ struct vmem_altmap { - const unsigned long base_pfn; + unsigned long base_pfn; const unsigned long reserve; unsigned long free; unsigned long align; diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 9f8712a4b1a5..6718f6f04676 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -466,6 +466,40 @@ static __always_inline int __PageMovable(struct page *page) PAGE_MAPPING_MOVABLE; } +#define VMEMMAP_PAGE ~PAGE_MAPPING_FLAGS +static __always_inline int PageVmemmap(struct page *page) +{ + return PageReserved(page) && (unsigned long)page->mapping == VMEMMAP_PAGE; +} + +static __always_inline int __PageVmemmap(struct page *page) +{ + return (unsigned long)page->mapping == VMEMMAP_PAGE; +} + +static __always_inline void __ClearPageVmemmap(struct page *page) +{ + __ClearPageReserved(page); + page->mapping = NULL; +} + +static __always_inline void __SetPageVmemmap(struct page *page) +{ + __SetPageReserved(page); + page->mapping = (void *)VMEMMAP_PAGE; +} + +static __always_inline struct page *vmemmap_get_head(struct page *page) +{ + return (struct page *)page->freelist; 
+} + +static __always_inline unsigned long get_nr_vmemmap_pages(struct page *page) +{ + struct page *head = vmemmap_get_head(page); + return head->private - (page - head); +} + #ifdef CONFIG_KSM /* * A KSM page is one of those write-protected "shared pages" or "merged pages" diff --git a/mm/compaction.c b/mm/compaction.c index f171a83707ce..926b1b424de8 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -850,6 +850,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, page = pfn_to_page(low_pfn); /* + * Vmemmap pages cannot be migrated. + */ + if (PageVmemmap(page)) + goto isolate_fail; + + /* * Check if the pageblock has already been marked skipped. * Only the aligned PFN is checked as the caller isolates * COMPACT_CLUSTER_MAX at a time so the second call must diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 836cb026ed7b..0eb60c80b8bc 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -96,6 +96,11 @@ void mem_hotplug_done(void) cpus_read_unlock(); } +static inline bool can_use_vmemmap(void) +{ + return IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP); +} + u64 max_mem_size = U64_MAX; /* add this memory to iomem resource */ @@ -278,7 +283,16 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn, unsigned long i; int err = 0; int start_sec, end_sec; - struct vmem_altmap *altmap = restrictions->altmap; + struct vmem_altmap *altmap; + struct vmem_altmap __memblk_altmap = {}; + + if (restrictions->flags & MHP_MEMMAP_FROM_RANGE) { + __memblk_altmap.base_pfn = phys_start_pfn; + __memblk_altmap.free = nr_pages; + restrictions->altmap = &__memblk_altmap; + } + + altmap = restrictions->altmap; /* during initialize mem_map, align hot-added range to section */ start_sec = pfn_to_section_nr(phys_start_pfn); @@ -312,6 +326,10 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn, cond_resched(); } vmemmap_populate_print_last(); + + if (restrictions->flags & MHP_MEMMAP_FROM_RANGE) + mark_vmemmap_pages(altmap, restrictions); + out: return err; } @@ -677,6 +695,13 @@ static int online_pages_blocks(unsigned long start, unsigned long nr_pages) while (start < end) { order = min(MAX_ORDER - 1, get_order(PFN_PHYS(end) - PFN_PHYS(start))); + /* + * Check if the pfn is aligned to its order. + * If not, we decrement the order until it is, + * otherwise __free_one_page will bug us. + */ + while (start & ((1 << order) - 1)) + order--; (*online_page_callback)(pfn_to_page(start), order); onlined_pages += (1UL << order); @@ -689,13 +714,33 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages, void *arg) { unsigned long onlined_pages = *(unsigned long *)arg; + unsigned long pfn = start_pfn; + unsigned long nr_vmemmap_pages = 0; + bool skip_online = false; + + if (PageVmemmap(pfn_to_page(start_pfn))) { + /* + * We do not want to send Vmemmap pages to the buddy allocator. + * Skip them. + */ + nr_vmemmap_pages = get_nr_vmemmap_pages(pfn_to_page(start_pfn)); + nr_vmemmap_pages = min(nr_vmemmap_pages, nr_pages); + pfn += nr_vmemmap_pages; + if (nr_vmemmap_pages == nr_pages) + /* + * If the entire memblock contains only vmemmap pages, + * we do not have pages to free to the buddy allocator. 
+ */ + skip_online = true; + } - if (PageReserved(pfn_to_page(start_pfn))) - onlined_pages += online_pages_blocks(start_pfn, nr_pages); + if (!skip_online && PageReserved(pfn_to_page(pfn))) + onlined_pages += online_pages_blocks(pfn, + nr_pages - nr_vmemmap_pages); online_mem_sections(start_pfn, start_pfn + nr_pages); - *(unsigned long *)arg = onlined_pages; + *(unsigned long *)arg = onlined_pages + nr_vmemmap_pages; return 0; } @@ -1094,7 +1139,7 @@ static int online_memory_block(struct memory_block *mem, void *arg) * * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */ -int __ref add_memory_resource(int nid, struct resource *res) +int __ref add_memory_resource(int nid, struct resource *res, bool use_vmemmap) { u64 start, size; bool new_node = false; @@ -1125,6 +1170,9 @@ int __ref add_memory_resource(int nid, struct resource *res) /* call arch's memory hotadd */ restrictions.flags = MHP_MEMBLOCK_API; + if (can_use_vmemmap() && use_vmemmap) + restrictions.flags |= MHP_MEMMAP_FROM_RANGE; + ret = arch_add_memory(nid, start, size, &restrictions); if (ret < 0) goto error; @@ -1166,7 +1214,7 @@ int __ref add_memory_resource(int nid, struct resource *res) } /* requires device_hotplug_lock, see add_memory_resource() */ -int __ref __add_memory(int nid, u64 start, u64 size) +int __ref __add_memory(int nid, u64 start, u64 size, bool use_vmemmap) { struct resource *res; int ret; @@ -1175,18 +1223,18 @@ int __ref __add_memory(int nid, u64 start, u64 size) if (IS_ERR(res)) return PTR_ERR(res); - ret = add_memory_resource(nid, res); + ret = add_memory_resource(nid, res, use_vmemmap); if (ret < 0) release_memory_resource(res); return ret; } -int add_memory(int nid, u64 start, u64 size) +int add_memory(int nid, u64 start, u64 size, bool use_vmemmap) { int rc; lock_device_hotplug(); - rc = __add_memory(nid, start, size); + rc = __add_memory(nid, start, size, use_vmemmap); unlock_device_hotplug(); return rc; @@ -1554,15 +1602,31 @@ static int __ref __offline_pages(unsigned long start_pfn, { unsigned long pfn, nr_pages; unsigned long offlined_pages = 0; - int ret, node, nr_isolate_pageblock; + int ret, node, nr_isolate_pageblock = 0; unsigned long flags; unsigned long valid_start, valid_end; struct zone *zone; struct memory_notify arg; char *reason; + unsigned long nr_vmemmap_pages = 0; + bool skip_block = false; mem_hotplug_begin(); + if (PageVmemmap(pfn_to_page(start_pfn))) { + nr_vmemmap_pages = get_nr_vmemmap_pages(pfn_to_page(start_pfn)); + if (start_pfn + nr_vmemmap_pages >= end_pfn) { + /* + * Depending on how large was the hot-added range, + * an entire memblock can only contain vmemmap pages. + * Should be that the case, just skip isolation and + * migration. + */ + nr_vmemmap_pages = end_pfn - start_pfn; + skip_block = true; + } + } + /* This makes hotplug much easier...and readable. we assume this for now. 
.*/ if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, @@ -1576,15 +1640,17 @@ static int __ref __offline_pages(unsigned long start_pfn, node = zone_to_nid(zone); nr_pages = end_pfn - start_pfn; - /* set above range as isolated */ - ret = start_isolate_page_range(start_pfn, end_pfn, - MIGRATE_MOVABLE, - SKIP_HWPOISON | REPORT_FAILURE); - if (ret < 0) { - reason = "failure to isolate range"; - goto failed_removal; + if (!skip_block) { + /* set above range as isolated */ + ret = start_isolate_page_range(start_pfn, end_pfn, + MIGRATE_MOVABLE, + SKIP_HWPOISON | REPORT_FAILURE); + if (ret < 0) { + reason = "failure to isolate range"; + goto failed_removal; + } + nr_isolate_pageblock = ret; } - nr_isolate_pageblock = ret; arg.start_pfn = start_pfn; arg.nr_pages = nr_pages; @@ -1597,6 +1663,9 @@ static int __ref __offline_pages(unsigned long start_pfn, goto failed_removal_isolated; } + if (skip_block) + goto no_migration; + do { for (pfn = start_pfn; pfn;) { if (signal_pending(current)) { @@ -1633,6 +1702,7 @@ static int __ref __offline_pages(unsigned long start_pfn, check_pages_isolated_cb); } while (ret); +no_migration: /* Ok, all of our target is isolated. We cannot do rollback at this point. */ walk_system_ram_range(start_pfn, end_pfn - start_pfn, &offlined_pages, @@ -1649,6 +1719,12 @@ static int __ref __offline_pages(unsigned long start_pfn, /* removal success */ adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages); + + /* + * Vmemmap pages are not being accounted to managed_pages but to + * present_pages. + */ + offlined_pages += nr_vmemmap_pages; zone->present_pages -= offlined_pages; pgdat_resize_lock(zone->zone_pgdat, &flags); @@ -1677,7 +1753,8 @@ static int __ref __offline_pages(unsigned long start_pfn, return 0; failed_removal_isolated: - undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); + if (!skip_block) + undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); memory_notify(MEM_CANCEL_OFFLINE, &arg); failed_removal: pr_debug("memory offlining [mem %#010llx-%#010llx] failed due to %s\n", diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d128f53888b8..6c026690a0b2 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1273,9 +1273,12 @@ static void free_one_page(struct zone *zone, static void __meminit __init_single_page(struct page *page, unsigned long pfn, unsigned long zone, int nid) { - mm_zero_struct_page(page); + if (!__PageVmemmap(page)) { + mm_zero_struct_page(page); + init_page_count(page); + } + set_page_links(page, zone, nid, pfn); - init_page_count(page); page_mapcount_reset(page); page_cpupid_reset_last(page); page_kasan_tag_reset(page); @@ -8033,6 +8036,14 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count, page = pfn_to_page(check); + if (PageVmemmap(page)) { + /* + * Vmemmap pages are marked reserved, so skip them here. + */ + iter += get_nr_vmemmap_pages(page) - 1; + continue; + } + if (PageReserved(page)) goto unmovable; @@ -8401,6 +8412,10 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) continue; } page = pfn_to_page(pfn); + if (PageVmemmap(page)) { + pfn += get_nr_vmemmap_pages(page); + continue; + } /* * The HWPoisoned page may be not in buddy system, and * page_count() is not 0. 
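For illustration, the arithmetic behind these PageVmemmap skips can be modelled in isolation. The sketch below is not part of the series; nr_vmemmap_left() is an invented name that mirrors get_nr_vmemmap_pages(), i.e. head->private minus the offset of the current page from the head, so a pfn walker can jump straight past the vmemmap block no matter where inside it it landed.

#include <assert.h>

/*
 * Userspace model of the skip logic (invented names, illustration only):
 * head_pfn is the pfn of the head vmemmap page, head_private the number of
 * vmemmap pages recorded in head->private, and pfn the page the walker is
 * currently looking at.
 */
static unsigned long nr_vmemmap_left(unsigned long head_pfn,
				     unsigned long head_private,
				     unsigned long pfn)
{
	/* mirrors: head->private - (page - head) */
	return head_private - (pfn - head_pfn);
}

int main(void)
{
	unsigned long head_pfn = 0x100000, nr_vmemmap = 512;

	/* Landing on the head skips the whole block... */
	assert(head_pfn + nr_vmemmap_left(head_pfn, nr_vmemmap, head_pfn) ==
	       head_pfn + nr_vmemmap);
	/* ...landing mid-block skips only what is left of it. */
	assert(head_pfn + 100 +
	       nr_vmemmap_left(head_pfn, nr_vmemmap, head_pfn + 100) ==
	       head_pfn + nr_vmemmap);
	return 0;
}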
diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 019280712e1b..0081ab74c1ba 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -156,6 +156,10 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages) page = pfn_to_online_page(pfn + i); if (!page) continue; + if (PageVmemmap(page)) { + i += get_nr_vmemmap_pages(page) - 1; + continue; + } return page; } return NULL; @@ -270,6 +274,13 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn, continue; } page = pfn_to_page(pfn); + if (PageVmemmap(page)) { + /* + * Vmemmap pages are not isolated. Skip them. + */ + pfn += get_nr_vmemmap_pages(page); + continue; + } if (PageBuddy(page)) /* * If the page is on a free list, it has to be on diff --git a/mm/sparse.c b/mm/sparse.c index 56e057c432f9..82c7b119eb1d 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -590,6 +590,89 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn) #endif #ifdef CONFIG_SPARSEMEM_VMEMMAP +void mark_vmemmap_pages(struct vmem_altmap *self, struct mhp_restrictions *r) +{ + unsigned long pfn = self->base_pfn + self->reserve; + unsigned long nr_pages = self->alloc; + unsigned long nr_sects = self->free / PAGES_PER_SECTION; + unsigned long i; + struct page *head; + + if (!nr_pages) + return; + + /* + * All allocations for memory hotplug are the same size, so align + * should be 0. + */ + WARN_ON(self->align); + + /* + * Mark these pages as Vmemmap pages. + * Layout: + * [Head->refcount] : Nr sections used by this altmap + * [Head->private] : Nr of vmemmap pages + * [Tail->freelist] : Pointer to the head page + */ + + /* + * Head, first vmemmap page + */ + head = pfn_to_page(pfn); + for (i = 0; i < nr_pages; i++, pfn++) { + struct page *page = pfn_to_page(pfn); + + mm_zero_struct_page(page); + __SetPageVmemmap(page); + page->freelist = head; + init_page_count(page); + } + set_page_count(head, (int)nr_sects); + set_page_private(head, nr_pages); +} + +/* + * If the range we are trying to remove was hot-added with vmemmap pages, + * we need to keep track of it so we know how long to defer the freeing. + * Since sections are removed sequentially in __remove_pages()-> + * __remove_section(), we just wait until we hit the last section. + * Once that happens, we can trigger free_deferred_vmemmap_range to actually + * free the whole memory range.
+ */ +static struct page *head_vmemmap_page; +static bool in_vmemmap_range; + +static inline bool vmemmap_dec_and_test(void) +{ + return page_ref_dec_and_test(head_vmemmap_page); +} + +static void free_deferred_vmemmap_range(unsigned long start, + unsigned long end) +{ + unsigned long nr_pages = end - start; + unsigned long first_section = (unsigned long)head_vmemmap_page; + + while (start >= first_section) { + vmemmap_free(start, end, NULL); + end = start; + start -= nr_pages; + } + head_vmemmap_page = NULL; + in_vmemmap_range = false; +} + +static void deferred_vmemmap_free(unsigned long start, unsigned long end) +{ + if (!in_vmemmap_range) { + in_vmemmap_range = true; + head_vmemmap_page = (struct page *)start; + } + + if (vmemmap_dec_and_test()) + free_deferred_vmemmap_range(start, end); +} + static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid, struct vmem_altmap *altmap) { @@ -602,6 +685,11 @@ static void __kfree_section_memmap(struct page *memmap, unsigned long start = (unsigned long)memmap; unsigned long end = (unsigned long)(memmap + PAGES_PER_SECTION); + if (PageVmemmap(memmap) || in_vmemmap_range) { + deferred_vmemmap_free(start, end); + return; + } + vmemmap_free(start, end, altmap); } #ifdef CONFIG_MEMORY_HOTREMOVE diff --git a/mm/util.c b/mm/util.c index d559bde497a9..5521d0a6c9e3 100644 --- a/mm/util.c +++ b/mm/util.c @@ -531,6 +531,8 @@ struct address_space *page_mapping(struct page *page) mapping = page->mapping; if ((unsigned long)mapping & PAGE_MAPPING_ANON) return NULL; + if ((unsigned long)mapping == VMEMMAP_PAGE) + return NULL; return (void *)((unsigned long)mapping & ~PAGE_MAPPING_FLAGS); }
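The bookkeeping that mark_vmemmap_pages() and deferred_vmemmap_free() above rely on can be summarised with a small standalone model. This is a sketch under assumptions, not kernel code: fake_page, mark_block() and remove_section() are invented names. The point is only that the head page carries the section count in its refcount and the vmemmap size in its private field, tail pages point back to the head, and the memmap is freed only once the last section using it has been removed.

#include <stdio.h>

struct fake_page {
	int refcount;              /* head: number of sections using this block */
	unsigned long private;     /* head: number of vmemmap pages */
	struct fake_page *head;    /* tail: pointer back to the head page */
};

/* Model of mark_vmemmap_pages(): record the layout in the head page. */
static void mark_block(struct fake_page *pages, unsigned long nr_pages,
		       unsigned long nr_sects)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++)
		pages[i].head = &pages[0];
	pages[0].refcount = nr_sects;
	pages[0].private = nr_pages;
}

/* Returns 1 when the last section was removed and the block can be freed. */
static int remove_section(struct fake_page *head)
{
	return --head->refcount == 0;
}

int main(void)
{
	struct fake_page vmemmap[512] = { 0 };
	int s;

	mark_block(vmemmap, 512, 4);	/* e.g. four sections share this block */
	for (s = 0; s < 4; s++) {
		if (remove_section(&vmemmap[0]))
			printf("last section removed: free %lu vmemmap pages\n",
			       vmemmap[0].private);
	}
	return 0;
}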
From patchwork Thu Mar 28 13:43:20 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oscar Salvador X-Patchwork-Id: 10875023 From: Oscar Salvador To: akpm@linux-foundation.org Cc: mhocko@suse.com, david@redhat.com, dan.j.williams@intel.com, Jonathan.Cameron@huawei.com, anshuman.khandual@arm.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Oscar Salvador Subject: [PATCH 4/4] mm, sparse: rename kmalloc_section_memmap, __kfree_section_memmap Date: Thu, 28 Mar 2019 14:43:20 +0100 Message-Id: <20190328134320.13232-5-osalvador@suse.de> X-Mailer: git-send-email 2.13.7 In-Reply-To: <20190328134320.13232-1-osalvador@suse.de> References: <20190328134320.13232-1-osalvador@suse.de> From: Michal Hocko The "kmalloc" prefix is misleading. Rename the functions to alloc_section_memmap/free_section_memmap, which better reflects their functionality. Signed-off-by: Michal Hocko Signed-off-by: Oscar Salvador --- mm/sparse.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/mm/sparse.c b/mm/sparse.c index 82c7b119eb1d..63c1d0bd4755 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -673,13 +673,13 @@ static void deferred_vmemmap_free(unsigned long start, unsigned long end) free_deferred_vmemmap_range(start, end); } -static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid, +static inline struct page *alloc_section_memmap(unsigned long pnum, int nid, struct vmem_altmap *altmap) { /* This will make the necessary allocations eventually.
*/ return sparse_mem_map_populate(pnum, nid, altmap); } -static void __kfree_section_memmap(struct page *memmap, +static void free_section_memmap(struct page *memmap, struct vmem_altmap *altmap) { unsigned long start = (unsigned long)memmap; @@ -723,13 +723,13 @@ static struct page *__kmalloc_section_memmap(void) return ret; } -static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid, +static inline struct page *alloc_section_memmap(unsigned long pnum, int nid, struct vmem_altmap *altmap) { return __kmalloc_section_memmap(); } -static void __kfree_section_memmap(struct page *memmap, +static void free_section_memmap(struct page *memmap, struct vmem_altmap *altmap) { if (is_vmalloc_addr(memmap)) @@ -794,12 +794,12 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn, if (ret < 0 && ret != -EEXIST) return ret; ret = 0; - memmap = kmalloc_section_memmap(section_nr, nid, altmap); + memmap = alloc_section_memmap(section_nr, nid, altmap); if (!memmap) return -ENOMEM; usemap = __kmalloc_section_usemap(); if (!usemap) { - __kfree_section_memmap(memmap, altmap); + free_section_memmap(memmap, altmap); return -ENOMEM; } @@ -821,7 +821,7 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn, out: if (ret < 0) { kfree(usemap); - __kfree_section_memmap(memmap, altmap); + free_section_memmap(memmap, altmap); } return ret; } @@ -872,7 +872,7 @@ static void free_section_usemap(struct page *memmap, unsigned long *usemap, if (PageSlab(usemap_page) || PageCompound(usemap_page)) { kfree(usemap); if (memmap) - __kfree_section_memmap(memmap, altmap); + free_section_memmap(memmap, altmap); return; }
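Finally, for context on how the reworked interface from earlier in the series would be consumed, the sketch below shows a hypothetical hotplug caller asking for the memmap to be built from the hot-added range itself. example_hotplug_add() is invented for illustration; the only interface assumed is the add_memory(nid, start, size, use_vmemmap) signature introduced by this series, and whether a given caller should pass true depends on whether it can tolerate part of the range being consumed by vmemmap pages.

/*
 * Hypothetical caller (illustration only, not part of the series).
 * Passing use_vmemmap == true asks the hotplug core to allocate the
 * memmap (vmemmap pages) from [start, start + size) itself, so no
 * extra memory outside the range is needed for the struct pages;
 * passing false keeps the old behaviour of allocating the memmap
 * elsewhere.
 */
static int example_hotplug_add(int nid, u64 start, u64 size)
{
	return add_memory(nid, start, size, true);
}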