From patchwork Fri Jun 7 09:09:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13689521 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id F1B89C27C55 for ; Fri, 7 Jun 2024 09:10:06 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7FFB96B00B5; Fri, 7 Jun 2024 05:10:06 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7AF706B00B6; Fri, 7 Jun 2024 05:10:06 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 651516B00B7; Fri, 7 Jun 2024 05:10:06 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 3FD0A6B00B5 for ; Fri, 7 Jun 2024 05:10:06 -0400 (EDT) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id ECCB11A03DD for ; Fri, 7 Jun 2024 09:10:05 +0000 (UTC) X-FDA: 82203520770.16.A461182 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf11.hostedemail.com (Postfix) with ESMTP id 3D8B340009 for ; Fri, 7 Jun 2024 09:10:04 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=i3l+1xbW; spf=pass (imf11.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1717751404; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=mc4++sRx3ukGDekZxB1N0n7JzUxaxxhg3TlGVFiWWaQ=; b=v3XBwAqEZDK/jcW6OQ1P+Sqtm4+bniyplXA8GM0fryKtcWyMbSgYkMAztzbk5lW8xA4KFI gNB/AtMK2IUHJyeTylfk84by2KLibpwpDlrjeZRQ+4oghaGwb7pfbd0eqOIWb1LnFIB3mq ZLp1D8kSQxm/DjUNonStue6EJsvYiQI= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=i3l+1xbW; spf=pass (imf11.hostedemail.com: domain of david@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1717751404; a=rsa-sha256; cv=none; b=vyar/cwyVTwWqlrwo2irC6xI2wLJv/EJb/yk/sG+u/cCwPc8JbD0zD0WGwb/E9YDg4SAlD S+p1m3bctN/lk74qpydyQrwps4oCwErTMGZiqccfbBIVHZNI3xysLFYfPOTj5K/QTdpn6b 4CSQarxOyrzKv3QMKEf4ZmIYcm9r21M= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1717751403; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mc4++sRx3ukGDekZxB1N0n7JzUxaxxhg3TlGVFiWWaQ=; b=i3l+1xbWhuleJwBKwwSrDqP43KsQnfifvtyAnYuVufCcL0d0NFBBHXPiLH0MHGUKkDxvfj WWrtixlSiV3Fj/7e8bSJYMfx5LkRmpSLW0vEc4HIqqBnUlpQeOlJpHndGG82/ltRA9tDGN XsuZBbOBpONWjoqPTYqFHPyaYQlzwsg= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-606-eUzDhlzTMDqdmYuPAblQDA-1; Fri, 07 Jun 2024 05:09:56 -0400 X-MC-Unique: eUzDhlzTMDqdmYuPAblQDA-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 964D43C01221; Fri, 7 Jun 2024 09:09:55 +0000 (UTC) Received: from t14s.fritz.box (unknown [10.39.194.94]) by smtp.corp.redhat.com (Postfix) with ESMTP id 30BEC37E5; Fri, 7 Jun 2024 09:09:51 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, linux-hyperv@vger.kernel.org, virtualization@lists.linux.dev, xen-devel@lists.xenproject.org, kasan-dev@googlegroups.com, David Hildenbrand , Andrew Morton , Mike Rapoport , Oscar Salvador , "K. Y. Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , "Michael S. Tsirkin" , Jason Wang , Xuan Zhuo , =?utf-8?q?Eugenio_P=C3=A9rez?= , Juergen Gross , Stefano Stabellini , Oleksandr Tyshchenko , Alexander Potapenko , Marco Elver , Dmitry Vyukov Subject: [PATCH v1 2/3] mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline() instead of PageReserved() Date: Fri, 7 Jun 2024 11:09:37 +0200 Message-ID: <20240607090939.89524-3-david@redhat.com> In-Reply-To: <20240607090939.89524-1-david@redhat.com> References: <20240607090939.89524-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.1 X-Stat-Signature: jmb5f1of1b6j4z917nnesqm9c5efo18i X-Rspamd-Queue-Id: 3D8B340009 X-Rspam-User: X-Rspamd-Server: rspam10 X-HE-Tag: 1717751404-89988 X-HE-Meta: U2FsdGVkX1/gYPk9O600AIuO/vlJMgDibqUtnGzdkG4ewv2x90EB1Rlap4aAlQJ1M+itn5bqptWTtX/H2Bp0yaluvhBhciv9HL6whAS6A4jTXkliYQBb6/YZojvO4nvotCoYQgiGAdqnteCFnDriFaAwvRxfOcwy8cHF15U3pGrAkLWhC45YQ4vii50kda3OIv0Ot31KQcVbe7xPrB902QwmvQfhqZxotZukmRu6pmlzu3NBqQd+An3lJu406pBiznNcdShte4sOSL3TSp8w1t6H/QqU8L4U5Kx5Di3aOsZU6M8MO5Vt6CbkRYazwk/wwsVXguki3xcT3xEMK+VXu1BAI2VViXVUHapkW3/EELa7ek4DbT4tolhuLJ2MG1xEIF11u8Id7HGSWz4N6CCUzBa7hPPPKUMC6LFyDX3mrUv16ypKpnjexO1HgKGUB238jrelaNRs+b5e+/XjZ2J3BXpyZG7FEGfLEdiEyC2KyelpEgcpbAEfJg2EkBbkF1AYpVLYBrmFv8bVg+eVUp8trcRyHTtC7QexSL7dMuLTr8jngo8Vq5FT6AZvzu87qfpH+THvGRaxcnRaBUvURGR8Bb75hZyT1rwrV//LeHv8ArnnUKXqC3N+P5gW8x2zqvfKlUS8U3/BcYGw49iRhNfWhrNc+NDbHh+8uSB/G9ZAM0a7Bjw9oIgZKiLupfnDNex4rB4WWQ/IQB9QFkkHb62tFt14/8CoSr9wDEAMOYnEtaJJjoCe+jrwuL5z54eyecFgmeg002cswn5flivAB9nIxX2C76wDSq1mfIfl3/XOjZNnilw4o+O03eGlBy5O1GyhtES+dChvHFb4GrX43Nt2GXyHtR/SchuRsJfr7VGleO4y7nWT9NDRWYY22atrZ9TamAwlpkiDgA3QuXLzJHkw4tx4SVbfQIGF9XUUOKZCWM73ipP4kSA/4Z6d9fBR8+UlrIlsZRUqjaZ+tMP6dz0 WvewJ1Y7 Beo9IF1fsgO54OT8XyroqScGEfFJ1vDFm8e87kQ9GGAu50ciRi9ZZC0YwworykpCscOCYVphXjYT3efQ4Kb+CwnAuQxBZmhrr+vmGRsY6JybyLN+PQawAEjdlMrOPvaU4lya4feCRz8JL+XQOduR+2KIN1iqqXN9uCojkafcyphySR50QoEeXzWMgNVAR0TCWV7WtSGrxPP8zDquDtzPJmvnPfYffVJpaX/Nd2kefAgKjJp3sqz2eX9Rhe/Ip6QzC/MICZlLZB8pk770= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: We currently initialize the memmap such that PG_reserved is set and the refcount of the page is 1. In virtio-mem code, we have to manually clear that PG_reserved flag to make memory offlining with partially hotplugged memory blocks possible: has_unmovable_pages() would otherwise bail out on such pages. We want to avoid PG_reserved where possible and move to typed pages instead. Further, we want to further enlighten memory offlining code about PG_offline: offline pages in an online memory section. One example is handling managed page count adjustments in a cleaner way during memory offlining. So let's initialize the pages with PG_offline instead of PG_reserved. generic_online_page()->__free_pages_core() will now clear that flag before handing that memory to the buddy. Note that the page refcount is still 1 and would forbid offlining of such memory except when special care is take during GOING_OFFLINE as currently only implemented by virtio-mem. With this change, we can now get non-PageReserved() pages in the XEN balloon list. From what I can tell, that can already happen via decrease_reservation(), so that should be fine. HV-balloon should not really observe a change: partial online memory blocks still cannot get surprise-offlined, because the refcount of these PageOffline() pages is 1. Update virtio-mem, HV-balloon and XEN-balloon code to be aware that hotplugged pages are now PageOffline() instead of PageReserved() before they are handed over to the buddy. We'll leave the ZONE_DEVICE case alone for now. Signed-off-by: David Hildenbrand Acked-by: Oscar Salvador # for the generic --- drivers/hv/hv_balloon.c | 5 ++--- drivers/virtio/virtio_mem.c | 18 ++++++++++++------ drivers/xen/balloon.c | 9 +++++++-- include/linux/page-flags.h | 12 +++++------- mm/memory_hotplug.c | 16 ++++++++++------ mm/mm_init.c | 10 ++++++++-- mm/page_alloc.c | 32 +++++++++++++++++++++++--------- 7 files changed, 67 insertions(+), 35 deletions(-) diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c index e000fa3b9f978..c1be38edd8361 100644 --- a/drivers/hv/hv_balloon.c +++ b/drivers/hv/hv_balloon.c @@ -693,9 +693,8 @@ static void hv_page_online_one(struct hv_hotadd_state *has, struct page *pg) if (!PageOffline(pg)) __SetPageOffline(pg); return; - } - if (PageOffline(pg)) - __ClearPageOffline(pg); + } else if (!PageOffline(pg)) + return; /* This frame is currently backed; online the page. */ generic_online_page(pg, 0); diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index a3857bacc8446..b90df29621c81 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -1146,12 +1146,16 @@ static void virtio_mem_set_fake_offline(unsigned long pfn, for (; nr_pages--; pfn++) { struct page *page = pfn_to_page(pfn); - __SetPageOffline(page); - if (!onlined) { + if (!onlined) + /* + * Pages that have not been onlined yet were initialized + * to PageOffline(). Remember that we have to route them + * through generic_online_page(). + */ SetPageDirty(page); - /* FIXME: remove after cleanups */ - ClearPageReserved(page); - } + else + __SetPageOffline(page); + VM_WARN_ON_ONCE(!PageOffline(page)); } page_offline_end(); } @@ -1166,9 +1170,11 @@ static void virtio_mem_clear_fake_offline(unsigned long pfn, for (; nr_pages--; pfn++) { struct page *page = pfn_to_page(pfn); - __ClearPageOffline(page); if (!onlined) + /* generic_online_page() will clear PageOffline(). */ ClearPageDirty(page); + else + __ClearPageOffline(page); } } diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index aaf2514fcfa46..528395133b4f8 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -146,7 +146,8 @@ static DECLARE_WAIT_QUEUE_HEAD(balloon_wq); /* balloon_append: add the given page to the balloon. */ static void balloon_append(struct page *page) { - __SetPageOffline(page); + if (!PageOffline(page)) + __SetPageOffline(page); /* Lowmem is re-populated first, so highmem pages go at list tail. */ if (PageHighMem(page)) { @@ -412,7 +413,11 @@ static enum bp_state increase_reservation(unsigned long nr_pages) xenmem_reservation_va_mapping_update(1, &page, &frame_list[i]); - /* Relinquish the page back to the allocator. */ + /* + * Relinquish the page back to the allocator. Note that + * some pages, including ones added via xen_online_page(), might + * not be marked reserved; free_reserved_page() will handle that. + */ free_reserved_page(page); } diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index f04fea86324d9..e0362ce7fc109 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -30,16 +30,11 @@ * - Pages falling into physical memory gaps - not IORESOURCE_SYSRAM. Trying * to read/write these pages might end badly. Don't touch! * - The zero page(s) - * - Pages not added to the page allocator when onlining a section because - * they were excluded via the online_page_callback() or because they are - * PG_hwpoison. * - Pages allocated in the context of kexec/kdump (loaded kernel image, * control pages, vmcoreinfo) * - MMIO/DMA pages. Some architectures don't allow to ioremap pages that are * not marked PG_reserved (as they might be in use by somebody else who does * not respect the caching strategy). - * - Pages part of an offline section (struct pages of offline sections should - * not be trusted as they will be initialized when first onlined). * - MCA pages on ia64 * - Pages holding CPU notes for POWER Firmware Assisted Dump * - Device memory (e.g. PMEM, DAX, HMM) @@ -1021,6 +1016,10 @@ PAGE_TYPE_OPS(Buddy, buddy, buddy) * The content of these pages is effectively stale. Such pages should not * be touched (read/write/dump/save) except by their owner. * + * When a memory block gets onlined, all pages are initialized with a + * refcount of 1 and PageOffline(). generic_online_page() will + * take care of clearing PageOffline(). + * * If a driver wants to allow to offline unmovable PageOffline() pages without * putting them back to the buddy, it can do so via the memory notifier by * decrementing the reference count in MEM_GOING_OFFLINE and incrementing the @@ -1028,8 +1027,7 @@ PAGE_TYPE_OPS(Buddy, buddy, buddy) * pages (now with a reference count of zero) are treated like free pages, * allowing the containing memory block to get offlined. A driver that * relies on this feature is aware that re-onlining the memory block will - * require to re-set the pages PageOffline() and not giving them to the - * buddy via online_page_callback_t. + * require not giving them to the buddy via generic_online_page(). * * There are drivers that mark a page PageOffline() and expect there won't be * any further access to page content. PFN walkers that read content of random diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 27e3be75edcf7..0254059efcbe1 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -734,7 +734,7 @@ static inline void section_taint_zone_device(unsigned long pfn) /* * Associate the pfn range with the given zone, initializing the memmaps * and resizing the pgdat/zone data to span the added pages. After this - * call, all affected pages are PG_reserved. + * call, all affected pages are PageOffline(). * * All aligned pageblocks are initialized to the specified migratetype * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related @@ -1100,8 +1100,12 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE); - for (i = 0; i < nr_pages; i++) - SetPageVmemmapSelfHosted(pfn_to_page(pfn + i)); + for (i = 0; i < nr_pages; i++) { + struct page *page = pfn_to_page(pfn + i); + + __ClearPageOffline(page); + SetPageVmemmapSelfHosted(page); + } /* * It might be that the vmemmap_pages fully span sections. If that is @@ -1959,9 +1963,9 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages, * Don't allow to offline memory blocks that contain holes. * Consequently, memory blocks with holes can never get onlined * via the hotplug path - online_pages() - as hotplugged memory has - * no holes. This way, we e.g., don't have to worry about marking - * memory holes PG_reserved, don't need pfn_valid() checks, and can - * avoid using walk_system_ram_range() later. + * no holes. This way, we don't have to worry about memory holes, + * don't need pfn_valid() checks, and can avoid using + * walk_system_ram_range() later. */ walk_system_ram_range(start_pfn, nr_pages, &system_ram_pages, count_system_ram_pages_cb); diff --git a/mm/mm_init.c b/mm/mm_init.c index feb5b6e8c8875..c066c1c474837 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -892,8 +892,14 @@ void __meminit memmap_init_range(unsigned long size, int nid, unsigned long zone page = pfn_to_page(pfn); __init_single_page(page, pfn, zone, nid); - if (context == MEMINIT_HOTPLUG) - __SetPageReserved(page); + if (context == MEMINIT_HOTPLUG) { +#ifdef CONFIG_ZONE_DEVICE + if (zone == ZONE_DEVICE) + __SetPageReserved(page); + else +#endif + __SetPageOffline(page); + } /* * Usually, we want to mark the pageblock MIGRATE_MOVABLE, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e0c8a8354be36..039bc52cc9091 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1225,18 +1225,23 @@ void __free_pages_core(struct page *page, unsigned int order, * When initializing the memmap, __init_single_page() sets the refcount * of all pages to 1 ("allocated"/"not free"). We have to set the * refcount of all involved pages to 0. + * + * Note that hotplugged memory pages are initialized to PageOffline(). + * Pages freed from memblock might be marked as reserved. */ - prefetchw(p); - for (loop = 0; loop < (nr_pages - 1); loop++, p++) { - prefetchw(p + 1); - __ClearPageReserved(p); - set_page_count(p, 0); - } - __ClearPageReserved(p); - set_page_count(p, 0); - if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) && unlikely(context == MEMINIT_HOTPLUG)) { + prefetchw(p); + for (loop = 0; loop < (nr_pages - 1); loop++, p++) { + prefetchw(p + 1); + VM_WARN_ON_ONCE(PageReserved(p)); + __ClearPageOffline(p); + set_page_count(p, 0); + } + VM_WARN_ON_ONCE(PageReserved(p)); + __ClearPageOffline(p); + set_page_count(p, 0); + /* * Freeing the page with debug_pagealloc enabled will try to * unmap it; some archs don't like double-unmappings, so @@ -1245,6 +1250,15 @@ void __free_pages_core(struct page *page, unsigned int order, debug_pagealloc_map_pages(page, nr_pages); adjust_managed_page_count(page, nr_pages); } else { + prefetchw(p); + for (loop = 0; loop < (nr_pages - 1); loop++, p++) { + prefetchw(p + 1); + __ClearPageReserved(p); + set_page_count(p, 0); + } + __ClearPageReserved(p); + set_page_count(p, 0); + /* memblock adjusts totalram_pages() ahead of time. */ atomic_long_add(nr_pages, &page_zone(page)->managed_pages); }