From patchwork Tue May 21 12:57:18 2024
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13669414
Date: Tue, 21 May 2024 12:57:18 +0000
In-Reply-To: <20240521-mm-hotplug-sync-v1-0-6d53706c1ba8@google.com>
References: <20240521-mm-hotplug-sync-v1-0-6d53706c1ba8@google.com>
Message-ID: <20240521-mm-hotplug-sync-v1-1-6d53706c1ba8@google.com>
Subject: [PATCH 1/2] mm,memory_hotplug: Remove un-taken lock
From: Brendan Jackman
To: David Hildenbrand, Oscar Salvador, Andrew Morton, Mike Rapoport
Cc: Michal Hocko, Anshuman Khandual, Vlastimil Babka, Pavel Tatashin,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Brendan Jackman
It seems that [1] was acked, and a v2 was written[2] which improved upon
it, but got bogged down in discussion of other topics, so the
improvements were not included. Then [1] got merged as commit
27cacaad16c5 ("mm,memory_hotplug: drop unneeded locking") and we ended
up with locks that get taken for read but never for write.

So, let's remove the read locking.

Compared to Oscar's original v2[2], I have added a READ_ONCE in
page_outside_zone_boundaries; this is a substitute for the compiler
barrier that was implied by read_seqretry(). I believe this is
necessary to insure against UB, although the value being read here is
only used for a printk so the stakes seem very low (and this is all
debug code anyway). I believe a compiler barrier is also needed in
zone_spans_pfn, but I'll address that in a separate patch.

That read_seqretry() also implied a CPU-level memory barrier, which I
don't think needs replacing: page_outside_zone_boundaries() is used in
the alloc and free paths, but you can't allocate or free pages from
the span that is in the middle of being added/removed by hotplug.

In other words, page_outside_zone_boundaries() doesn't require a
strictly up-to-date view of spanned_pages, but I think it does require
a value that was once/will eventually be correct, hence READ_ONCE.
[1] https://lore.kernel.org/all/20210531093958.15021-1-osalvador@suse.de/T/#u
[2] https://lore.kernel.org/linux-mm/20210602091457.17772-3-osalvador@suse.de/#t

Cc: David Hildenbrand
Cc: Michal Hocko
Cc: Anshuman Khandual
Cc: Vlastimil Babka
Cc: Pavel Tatashin
Co-developed-by: Oscar Salvador
Signed-off-by: Oscar Salvador
Signed-off-by: Brendan Jackman
---
 include/linux/memory_hotplug.h | 35 -----------------------------------
 include/linux/mmzone.h         | 23 +++++------------------
 mm/mm_init.c                   |  1 -
 mm/page_alloc.c                | 10 +++-------
 4 files changed, 8 insertions(+), 61 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 7a9ff464608d..f9577e67e5ee 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -141,31 +141,7 @@ bool mhp_supports_memmap_on_memory(void);
 /*
  * Zone resizing functions
- *
- * Note: any attempt to resize a zone should has pgdat_resize_lock()
- * zone_span_writelock() both held. This ensure the size of a zone
- * can't be changed while pgdat_resize_lock() held.
  */
-static inline unsigned zone_span_seqbegin(struct zone *zone)
-{
-	return read_seqbegin(&zone->span_seqlock);
-}
-static inline int zone_span_seqretry(struct zone *zone, unsigned iv)
-{
-	return read_seqretry(&zone->span_seqlock, iv);
-}
-static inline void zone_span_writelock(struct zone *zone)
-{
-	write_seqlock(&zone->span_seqlock);
-}
-static inline void zone_span_writeunlock(struct zone *zone)
-{
-	write_sequnlock(&zone->span_seqlock);
-}
-static inline void zone_seqlock_init(struct zone *zone)
-{
-	seqlock_init(&zone->span_seqlock);
-}
 extern void adjust_present_page_count(struct page *page,
 				      struct memory_group *group,
 				      long nr_pages);
@@ -251,17 +227,6 @@ static inline void pgdat_kswapd_lock_init(pg_data_t *pgdat)
 	___page;				\
 })
-static inline unsigned zone_span_seqbegin(struct zone *zone)
-{
-	return 0;
-}
-static inline int zone_span_seqretry(struct zone *zone, unsigned iv)
-{
-	return 0;
-}
-static inline void zone_span_writelock(struct zone *zone) {}
-static inline void zone_span_writeunlock(struct zone *zone) {}
-static inline void zone_seqlock_init(struct zone *zone) {}
 static inline int try_online_node(int nid)
 {
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 8f9c9590a42c..194ef7fed9d6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -14,7 +14,6 @@
 #include
 #include
 #include
-#include <linux/seqlock.h>
 #include
 #include
 #include
@@ -896,18 +895,11 @@ struct zone {
	 *
	 * Locking rules:
	 *
-	 * zone_start_pfn and spanned_pages are protected by span_seqlock.
-	 * It is a seqlock because it has to be read outside of zone->lock,
-	 * and it is done in the main allocator path. But, it is written
-	 * quite infrequently.
-	 *
-	 * The span_seq lock is declared along with zone->lock because it is
-	 * frequently read in proximity to zone->lock. It's good to
-	 * give them a chance of being in the same cacheline.
-	 *
-	 * Write access to present_pages at runtime should be protected by
-	 * mem_hotplug_begin/done(). Any reader who can't tolerant drift of
-	 * present_pages should use get_online_mems() to get a stable value.
+	 * Besides system initialization functions, memory-hotplug is the only
+	 * user that can change zone's {spanned,present} pages at runtime, and
+	 * it does so by holding the mem_hotplug_lock lock. Any readers who
+	 * can't tolerate drift values should use {get,put}_online_mems to get
+	 * a stable value.
	 */
	atomic_long_t		managed_pages;
	unsigned long		spanned_pages;
@@ -930,11 +922,6 @@ struct zone {
	unsigned long		nr_isolate_pageblock;
 #endif
-#ifdef CONFIG_MEMORY_HOTPLUG
-	/* see spanned/present_pages for more description */
-	seqlock_t		span_seqlock;
-#endif
-
	int initialized;

	/* Write-intensive fields used from the page allocator */
diff --git a/mm/mm_init.c b/mm/mm_init.c
index f72b852bd5b8..c725618aeb58 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1383,7 +1383,6 @@ static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx,
	zone->name = zone_names[idx];
	zone->zone_pgdat = NODE_DATA(nid);
	spin_lock_init(&zone->lock);
-	zone_seqlock_init(zone);
	zone_pcp_init(zone);
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2e22ce5675ca..5116a2b9ea6e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -426,16 +426,12 @@ void set_pageblock_migratetype(struct page *page, int migratetype)
 static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
 {
	int ret;
-	unsigned seq;
	unsigned long pfn = page_to_pfn(page);
	unsigned long sp, start_pfn;

-	do {
-		seq = zone_span_seqbegin(zone);
-		start_pfn = zone->zone_start_pfn;
-		sp = zone->spanned_pages;
-		ret = !zone_spans_pfn(zone, pfn);
-	} while (zone_span_seqretry(zone, seq));
+	start_pfn = zone->zone_start_pfn;
+	sp = READ_ONCE(zone->spanned_pages);
+	ret = !zone_spans_pfn(zone, pfn);

	if (ret)
		pr_err("page 0x%lx outside node %d zone %s [ 0x%lx - 0x%lx ]\n",

From patchwork Tue May 21 12:57:19 2024
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13669415
Date: Tue, 21 May 2024 12:57:19 +0000
In-Reply-To: <20240521-mm-hotplug-sync-v1-0-6d53706c1ba8@google.com>
References: <20240521-mm-hotplug-sync-v1-0-6d53706c1ba8@google.com>
Message-ID: <20240521-mm-hotplug-sync-v1-2-6d53706c1ba8@google.com>
Subject: [PATCH 2/2] mm,memory_hotplug: {READ,WRITE}_ONCE unsynchronized zone data
From: Brendan Jackman
To: David Hildenbrand, Oscar Salvador, Andrew Morton, Mike Rapoport
Cc: Michal Hocko, Anshuman Khandual, Vlastimil Babka, Pavel Tatashin,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Brendan Jackman
These fields are written by memory hotplug under mem_hotplug_lock but
read without any lock. It seems like reader code is robust against the
value being stale or "from the future", but we also need to account
for:

1. Load/store tearing (according to Linus[1], this really happens,
   even when everything is aligned as you would hope).

2. Invented loads[2] - the compiler can spill and re-read these fields
   ([2] calls this "invented loads") and assume that they have not
   changed.

Note we don't need READ_ONCE in paths that have the mem_hotplug_lock
for write, but we still need WRITE_ONCE to prevent store-tearing.

[1] https://lore.kernel.org/all/CAHk-=wj2t+GK+DGQ7Xy6U7zMf72e7Jkxn4_-kGyfH3WFEoH+YQ@mail.gmail.com/T/#u
    As discovered via the original big-bad article[2]
[2] https://lwn.net/Articles/793253/

Signed-off-by: Brendan Jackman
---
 include/linux/mmzone.h | 14 ++++++++++----
 mm/compaction.c        |  2 +-
 mm/memory_hotplug.c    | 20 ++++++++++++--------
 mm/mm_init.c           |  2 +-
 mm/page_alloc.c        |  2 +-
 mm/show_mem.c          |  8 ++++----
 mm/vmstat.c            |  4 ++--
 7 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 194ef7fed9d6..bdb3be76d10c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1018,11 +1018,13 @@ static inline unsigned long zone_cma_pages(struct zone *zone)
 #endif
 }
+/* This is unstable unless you hold mem_hotplug_lock. */
 static inline unsigned long zone_end_pfn(const struct zone *zone)
 {
-	return zone->zone_start_pfn + zone->spanned_pages;
+	return zone->zone_start_pfn + READ_ONCE(zone->spanned_pages);
 }
+/* This is unstable unless you hold mem_hotplug_lock. */
 static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
 {
	return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
@@ -1033,9 +1035,10 @@ static inline bool zone_is_initialized(struct zone *zone)
	return zone->initialized;
 }
+/* This is unstable unless you hold mem_hotplug_lock. */
 static inline bool zone_is_empty(struct zone *zone)
 {
-	return zone->spanned_pages == 0;
+	return READ_ONCE(zone->spanned_pages) == 0;
 }
 #ifndef BUILD_VDSO32_64
@@ -1485,10 +1488,13 @@ static inline bool managed_zone(struct zone *zone)
	return zone_managed_pages(zone);
 }
-/* Returns true if a zone has memory */
+/*
+ * Returns true if a zone has memory.
+ * This is unstable unless you hold mem_hotplug_lock.
+ */
 static inline bool populated_zone(struct zone *zone)
 {
-	return zone->present_pages;
+	return READ_ONCE(zone->present_pages);
 }
 #ifdef CONFIG_NUMA
diff --git a/mm/compaction.c b/mm/compaction.c
index e731d45befc7..b8066d1fdcf5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2239,7 +2239,7 @@ static unsigned int fragmentation_score_zone_weighted(struct zone *zone)
 {
	unsigned long score;

-	score = zone->present_pages * fragmentation_score_zone(zone);
+	score = READ_ONCE(zone->present_pages) * fragmentation_score_zone(zone);
	return div64_ul(score, zone->zone_pgdat->node_present_pages + 1);
 }
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 431b1f6753c0..71b5e3d314a2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -463,6 +463,8 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
	int nid = zone_to_nid(zone);

	if (zone->zone_start_pfn == start_pfn) {
+		unsigned long old_end_pfn = zone_end_pfn(zone);
+
		/*
		 * If the section is smallest section in the zone, it need
		 * shrink zone->zone_start_pfn and zone->zone_spanned_pages.
@@ -470,13 +472,13 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
		 * for shrinking zone.
		 */
		pfn = find_smallest_section_pfn(nid, zone, end_pfn,
-						zone_end_pfn(zone));
+						old_end_pfn);
		if (pfn) {
-			zone->spanned_pages = zone_end_pfn(zone) - pfn;
+			WRITE_ONCE(zone->spanned_pages, old_end_pfn - pfn);
			zone->zone_start_pfn = pfn;
		} else {
			zone->zone_start_pfn = 0;
-			zone->spanned_pages = 0;
+			WRITE_ONCE(zone->spanned_pages, 0);
		}
	} else if (zone_end_pfn(zone) == end_pfn) {
		/*
@@ -488,10 +490,11 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
		pfn = find_biggest_section_pfn(nid, zone, zone->zone_start_pfn,
					       start_pfn);
		if (pfn)
-			zone->spanned_pages = pfn - zone->zone_start_pfn + 1;
+			WRITE_ONCE(zone->spanned_pages,
+				   pfn - zone->zone_start_pfn + 1);
		else {
			zone->zone_start_pfn = 0;
-			zone->spanned_pages = 0;
+			WRITE_ONCE(zone->spanned_pages, 0);
		}
	}
 }
@@ -710,7 +713,8 @@ static void __meminit resize_zone_range(struct zone *zone, unsigned long start_p
	if (zone_is_empty(zone) || start_pfn < zone->zone_start_pfn)
		zone->zone_start_pfn = start_pfn;

-	zone->spanned_pages = max(start_pfn + nr_pages, old_end_pfn) - zone->zone_start_pfn;
+	WRITE_ONCE(zone->spanned_pages,
+		   max(start_pfn + nr_pages, old_end_pfn) - zone->zone_start_pfn);
 }

 static void __meminit resize_pgdat_range(struct pglist_data *pgdat, unsigned long start_pfn,
@@ -795,7 +799,7 @@ static void auto_movable_stats_account_zone(struct auto_movable_stats *stats,
					    struct zone *zone)
 {
	if (zone_idx(zone) == ZONE_MOVABLE) {
-		stats->movable_pages += zone->present_pages;
+		stats->movable_pages += READ_ONCE(zone->present_pages);
	} else {
		stats->kernel_early_pages += zone->present_early_pages;
 #ifdef CONFIG_CMA
@@ -1077,7 +1081,7 @@ void adjust_present_page_count(struct page *page, struct memory_group *group,
	 */
	if (early_section(__pfn_to_section(page_to_pfn(page))))
		zone->present_early_pages += nr_pages;
-	zone->present_pages += nr_pages;
+	WRITE_ONCE(zone->present_pages, zone->present_pages + nr_pages);
	zone->zone_pgdat->node_present_pages += nr_pages;

	if (group && movable)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index c725618aeb58..ec66f2eadb95 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1540,7 +1540,7 @@ void __ref free_area_init_core_hotplug(struct pglist_data *pgdat)
	for (z = 0; z < MAX_NR_ZONES; z++) {
		struct zone *zone = pgdat->node_zones + z;

-		zone->present_pages = 0;
+		WRITE_ONCE(zone->present_pages, 0);
		zone_init_internals(zone, z, nid, 0);
	}
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5116a2b9ea6e..1eb9000ec7d7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5728,7 +5728,7 @@ __meminit void zone_pcp_init(struct zone *zone)
	if (populated_zone(zone))
		pr_debug("  %s zone: %lu pages, LIFO batch:%u\n", zone->name,
-			 zone->present_pages, zone_batchsize(zone));
+			 READ_ONCE(zone->present_pages), zone_batchsize(zone));
 }

 void adjust_managed_page_count(struct page *page, long count)
diff --git a/mm/show_mem.c b/mm/show_mem.c
index bdb439551eef..667680a6107b 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -337,7 +337,7 @@ static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_z
			K(zone_page_state(zone, NR_ZONE_INACTIVE_FILE)),
			K(zone_page_state(zone, NR_ZONE_UNEVICTABLE)),
			K(zone_page_state(zone, NR_ZONE_WRITE_PENDING)),
-			K(zone->present_pages),
+			K(READ_ONCE(zone->present_pages)),
			K(zone_managed_pages(zone)),
			K(zone_page_state(zone, NR_MLOCK)),
			K(zone_page_state(zone, NR_BOUNCE)),
@@ -407,11 +407,11 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)

	for_each_populated_zone(zone) {

-		total += zone->present_pages;
-		reserved += zone->present_pages - zone_managed_pages(zone);
+		total += READ_ONCE(zone->present_pages);
+		reserved += READ_ONCE(zone->present_pages) - zone_managed_pages(zone);

		if (is_highmem(zone))
-			highmem += zone->present_pages;
+			highmem += READ_ONCE(zone->present_pages);
	}

	printk("%lu pages RAM\n", total);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 8507c497218b..5a9c4b5768e5 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1708,8 +1708,8 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
		   min_wmark_pages(zone),
		   low_wmark_pages(zone),
		   high_wmark_pages(zone),
-		   zone->spanned_pages,
-		   zone->present_pages,
+		   READ_ONCE(zone->spanned_pages),
+		   READ_ONCE(zone->present_pages),
		   zone_managed_pages(zone),
		   zone_cma_pages(zone));