From patchwork Wed May 23 12:55:54 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michal Hocko
X-Patchwork-Id: 10421213
From: Michal Hocko
To: Andrew Morton
Cc: Oscar Salvador, Vlastimil Babka, Pavel Tatashin, Reza Arbab,
    Igor Mammedov, Vitaly Kuznetsov, LKML, Michal Hocko
Subject: [PATCH 1/2] mm, memory_hotplug: make has_unmovable_pages more robust
Date: Wed, 23 May 2018 14:55:54 +0200
Message-Id: <20180523125555.30039-2-mhocko@kernel.org>
X-Mailer: git-send-email 2.17.0
In-Reply-To: <20180523125555.30039-1-mhocko@kernel.org>
References: <20180523125555.30039-1-mhocko@kernel.org>

From: Michal Hocko

Oscar has reported:
: Due to an unfortunate setting with movablecore, memblocks containing bootmem
: memory (pages marked by get_page_bootmem()) ended up marked in zone_movable.
: So while trying to remove that memory, the system failed in do_migrate_range
: and __offline_pages never returned.
:
: This can be reproduced by running
: qemu-system-x86_64 -m 6G,slots=8,maxmem=8G -numa node,mem=4096M -numa node,mem=2048M
: and movablecore=4G kernel command line
:
: linux kernel: BIOS-provided physical RAM map:
: linux kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
: linux kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
: linux kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
: linux kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
: linux kernel: BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
: linux kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
: linux kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
: linux kernel: BIOS-e820: [mem 0x0000000100000000-0x00000001bfffffff] usable
: linux kernel: NX (Execute Disable) protection: active
: linux kernel: SMBIOS 2.8 present.
: linux kernel: DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org
: linux kernel: Hypervisor detected: KVM
: linux kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
: linux kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
: linux kernel: last_pfn = 0x1c0000 max_arch_pfn = 0x400000000
:
: linux kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0
: linux kernel: SRAT: PXM 1 -> APIC 0x01 -> Node 1
: linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
: linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
: linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x13fffffff]
: linux kernel: ACPI: SRAT: Node 1 PXM 1 [mem 0x140000000-0x1bfffffff]
: linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x1c0000000-0x43fffffff] hotplug
: linux kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x0
: linux kernel: NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0
: linux kernel: NODE_DATA(0) allocated [mem 0x13ffd6000-0x13fffffff]
: linux kernel: NODE_DATA(1) allocated [mem 0x1bffd3000-0x1bfffcfff]
:
: zoneinfo shows that the zone movable is placed into both numa nodes:
: Node 0, zone  Movable
:   pages free     160140
:         min      1823
:         low      2278
:         high     2733
:         spanned  262144
:         present  262144
:         managed  245670
: Node 1, zone  Movable
:   pages free     448427
:         min      3827
:         low      4783
:         high     5739
:         spanned  524288
:         present  524288
:         managed  515766

Note how only Node 0 has a hotpluggable memory region, which would rule it
out from the early memblock allocations (most likely memmap). Node 1
will surely contain memmaps on the same node and those would prevent
offlining from succeeding. So this is arguably a configuration issue.
One could argue that we should be more clever and rule out early
allocations from the zone movable. This would be correct but probably
not worth the effort considering what a hack movablecore is.

Anyway, we could do better for those cases. We rely on
start_isolate_page_range resp. has_unmovable_pages to do their job.
The first one isolates the whole range to be offlined so that we do not
allocate from it anymore, and the latter makes sure we are not stumbling
over non-migratable pages.

has_unmovable_pages is overly optimistic, however. It doesn't check all
the pages if we are within zone_movable because we rely on those pages
always being migratable. As it turns out we are still not perfect there.
While bootmem pages in zone_movable sound like a clear bug which should
be fixed, let's remove the optimization for now and warn if we encounter
unmovable pages in zone_movable in the meantime. That should help for now
at least.

Btw. this wasn't a real problem until 72b39cfc4d75 ("mm, memory_hotplug:
do not fail offlining too early") because we used to have a small number
of retries and then failed. This turned out to be too fragile though.

Reported-by: Oscar Salvador
Tested-by: Oscar Salvador
Signed-off-by: Michal Hocko
Reviewed-by: Pavel Tatashin
---
 mm/page_alloc.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3c6f4008ea55..b9a45753244d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7629,11 +7629,12 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 	unsigned long pfn, iter, found;
 
 	/*
-	 * For avoiding noise data, lru_add_drain_all() should be called
-	 * If ZONE_MOVABLE, the zone never contains unmovable pages
+	 * TODO we could make this much more efficient by not checking every
+	 * page in the range if we know all of them are in MOVABLE_ZONE and
+	 * that the movable zone guarantees that pages are migratable but
+	 * the later is not the case right now unfortunatelly. E.g. movablecore
+	 * can still lead to having bootmem allocations in zone_movable.
 	 */
-	if (zone_idx(zone) == ZONE_MOVABLE)
-		return false;
 
 	/*
 	 * CMA allocations (alloc_contig_range) really need to mark isolate
@@ -7654,7 +7655,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		page = pfn_to_page(check);
 
 		if (PageReserved(page))
-			return true;
+			goto unmovable;
 
 		/*
 		 * Hugepages are not in LRU lists, but they're movable.
@@ -7704,9 +7705,12 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		 * page at boot.
 		 */
 		if (found > count)
-			return true;
+			goto unmovable;
 	}
 	return false;
+unmovable:
+	WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
+	return true;
 }
 
 #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
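
For readers who want to experiment with the control flow outside the
kernel, here is a minimal user-space sketch of the idea behind the diff.
It is not the kernel code. The names page_model, zone_type_model,
warn_once_if(), warn_count and has_unmovable_pages_model() are
hypothetical stand-ins invented for illustration, and the scan only
models the PageReserved() check. The point it shows is that every page
in the range is now inspected regardless of the zone, and that finding an
unmovable page in the movable zone triggers a one-shot warning before
the function reports the range as unmovable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's zone index; illustration only. */
enum zone_type_model { ZONE_NORMAL_MODEL, ZONE_MOVABLE_MODEL };

/* Hypothetical stand-in for struct page: only the "reserved" state. */
struct page_model {
	bool reserved;	/* models PageReserved(), e.g. a bootmem page */
};

/* Counts how often the one-shot warning actually fired. */
static int warn_count;

/* Rough model of WARN_ON_ONCE(): report the first hit, stay quiet after. */
static void warn_once_if(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: unmovable page in movable zone\n");
	}
}

/*
 * Scan every page in the range. There is no early "movable zone, so
 * nothing to check" exit; the zone is only consulted when an unmovable
 * page has been found, to decide whether to warn.
 */
static bool has_unmovable_pages_model(enum zone_type_model zone,
				      const struct page_model *pages,
				      size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (pages[i].reserved)
			goto unmovable;
	}
	return false;
unmovable:
	warn_once_if(zone == ZONE_MOVABLE_MODEL);
	return true;
}
```

Calling has_unmovable_pages_model() on a range with one reserved page
returns true for either zone, but only the movable-zone case bumps
warn_count, and only on the first such call.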