From patchwork Tue May 21 12:57:19 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13669415
Date: Tue, 21 May 2024 12:57:19 +0000
In-Reply-To: <20240521-mm-hotplug-sync-v1-0-6d53706c1ba8@google.com>
References: <20240521-mm-hotplug-sync-v1-0-6d53706c1ba8@google.com>
X-Mailer: b4 0.14-dev
Message-ID: <20240521-mm-hotplug-sync-v1-2-6d53706c1ba8@google.com>
Subject: [PATCH 2/2] mm,memory_hotplug: {READ,WRITE}_ONCE unsynchronized zone data
From: Brendan Jackman
To: David Hildenbrand, Oscar Salvador, Andrew Morton, Mike Rapoport
Cc: Michal Hocko, Anshuman Khandual, Vlastimil Babka, Pavel Tatashin,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Brendan Jackman

These fields are written by memory hotplug under mem_hotplug_lock but
read without any lock. It seems like reader code is robust against the
value being stale or "from the future", but we also need to account
for:

1. Load/store tearing (according to Linus[1], this really happens,
   even when everything is aligned as you would hope).

2. Invented loads[2] - the compiler can spill and re-read these fields
   and assume that they have not changed in the meantime.

Note we don't need READ_ONCE in paths that hold mem_hotplug_lock for
write, but we still need WRITE_ONCE to prevent store-tearing.

[1] https://lore.kernel.org/all/CAHk-=wj2t+GK+DGQ7Xy6U7zMf72e7Jkxn4_-kGyfH3WFEoH+YQ@mail.gmail.com/T/#u
    As discovered via the original big-bad article[2]
[2] https://lwn.net/Articles/793253/

Signed-off-by: Brendan Jackman
---
 include/linux/mmzone.h | 14 ++++++++++----
 mm/compaction.c        |  2 +-
 mm/memory_hotplug.c    | 20 ++++++++++++--------
 mm/mm_init.c           |  2 +-
 mm/page_alloc.c        |  2 +-
 mm/show_mem.c          |  8 ++++----
 mm/vmstat.c            |  4 ++--
 7 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 194ef7fed9d6..bdb3be76d10c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1018,11 +1018,13 @@ static inline unsigned long zone_cma_pages(struct zone *zone)
 #endif
 }
 
+/* This is unstable unless you hold mem_hotplug_lock. */
 static inline unsigned long zone_end_pfn(const struct zone *zone)
 {
-	return zone->zone_start_pfn + zone->spanned_pages;
+	return zone->zone_start_pfn + READ_ONCE(zone->spanned_pages);
 }
 
+/* This is unstable unless you hold mem_hotplug_lock. */
 static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
 {
 	return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
@@ -1033,9 +1035,10 @@ static inline bool zone_is_initialized(struct zone *zone)
 	return zone->initialized;
 }
 
+/* This is unstable unless you hold mem_hotplug_lock. */
 static inline bool zone_is_empty(struct zone *zone)
 {
-	return zone->spanned_pages == 0;
+	return READ_ONCE(zone->spanned_pages) == 0;
 }
 
 #ifndef BUILD_VDSO32_64
@@ -1485,10 +1488,13 @@ static inline bool managed_zone(struct zone *zone)
 	return zone_managed_pages(zone);
 }
 
-/* Returns true if a zone has memory */
+/*
+ * Returns true if a zone has memory.
+ * This is unstable unless you hold mem_hotplug_lock.
+ */
 static inline bool populated_zone(struct zone *zone)
 {
-	return zone->present_pages;
+	return READ_ONCE(zone->present_pages);
 }
 
 #ifdef CONFIG_NUMA
diff --git a/mm/compaction.c b/mm/compaction.c
index e731d45befc7..b8066d1fdcf5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2239,7 +2239,7 @@ static unsigned int fragmentation_score_zone_weighted(struct zone *zone)
 {
 	unsigned long score;
 
-	score = zone->present_pages * fragmentation_score_zone(zone);
+	score = READ_ONCE(zone->present_pages) * fragmentation_score_zone(zone);
 	return div64_ul(score, zone->zone_pgdat->node_present_pages + 1);
 }
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 431b1f6753c0..71b5e3d314a2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -463,6 +463,8 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 	int nid = zone_to_nid(zone);
 
 	if (zone->zone_start_pfn == start_pfn) {
+		unsigned long old_end_pfn = zone_end_pfn(zone);
+
 		/*
 		 * If the section is smallest section in the zone, it need
 		 * shrink zone->zone_start_pfn and zone->zone_spanned_pages.
@@ -470,13 +472,13 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 		 * for shrinking zone.
 		 */
 		pfn = find_smallest_section_pfn(nid, zone, end_pfn,
-						zone_end_pfn(zone));
+						old_end_pfn);
 		if (pfn) {
-			zone->spanned_pages = zone_end_pfn(zone) - pfn;
+			WRITE_ONCE(zone->spanned_pages, old_end_pfn - pfn);
 			zone->zone_start_pfn = pfn;
 		} else {
 			zone->zone_start_pfn = 0;
-			zone->spanned_pages = 0;
+			WRITE_ONCE(zone->spanned_pages, 0);
 		}
 	} else if (zone_end_pfn(zone) == end_pfn) {
 		/*
@@ -488,10 +490,11 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 		pfn = find_biggest_section_pfn(nid, zone, zone->zone_start_pfn,
 					       start_pfn);
 		if (pfn)
-			zone->spanned_pages = pfn - zone->zone_start_pfn + 1;
+			WRITE_ONCE(zone->spanned_pages,
+				   pfn - zone->zone_start_pfn + 1);
 		else {
 			zone->zone_start_pfn = 0;
-			zone->spanned_pages = 0;
+			WRITE_ONCE(zone->spanned_pages, 0);
 		}
 	}
 }
@@ -710,7 +713,8 @@ static void __meminit resize_zone_range(struct zone *zone, unsigned long start_p
 	if (zone_is_empty(zone) || start_pfn < zone->zone_start_pfn)
 		zone->zone_start_pfn = start_pfn;
 
-	zone->spanned_pages = max(start_pfn + nr_pages, old_end_pfn) - zone->zone_start_pfn;
+	WRITE_ONCE(zone->spanned_pages,
+		   max(start_pfn + nr_pages, old_end_pfn) - zone->zone_start_pfn);
 }
 
 static void __meminit resize_pgdat_range(struct pglist_data *pgdat, unsigned long start_pfn,
@@ -795,7 +799,7 @@ static void auto_movable_stats_account_zone(struct auto_movable_stats *stats,
 					    struct zone *zone)
 {
 	if (zone_idx(zone) == ZONE_MOVABLE) {
-		stats->movable_pages += zone->present_pages;
+		stats->movable_pages += READ_ONCE(zone->present_pages);
 	} else {
 		stats->kernel_early_pages += zone->present_early_pages;
 #ifdef CONFIG_CMA
@@ -1077,7 +1081,7 @@ void adjust_present_page_count(struct page *page, struct memory_group *group,
 	 */
 	if (early_section(__pfn_to_section(page_to_pfn(page))))
 		zone->present_early_pages += nr_pages;
-	zone->present_pages += nr_pages;
+	WRITE_ONCE(zone->present_pages, zone->present_pages + nr_pages);
 	zone->zone_pgdat->node_present_pages += nr_pages;
 
 	if (group && movable)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index c725618aeb58..ec66f2eadb95 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1540,7 +1540,7 @@ void __ref free_area_init_core_hotplug(struct pglist_data *pgdat)
 	for (z = 0; z < MAX_NR_ZONES; z++) {
 		struct zone *zone = pgdat->node_zones + z;
 
-		zone->present_pages = 0;
+		WRITE_ONCE(zone->present_pages, 0);
 		zone_init_internals(zone, z, nid, 0);
 	}
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5116a2b9ea6e..1eb9000ec7d7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5728,7 +5728,7 @@ __meminit void zone_pcp_init(struct zone *zone)
 
 	if (populated_zone(zone))
 		pr_debug("  %s zone: %lu pages, LIFO batch:%u\n", zone->name,
-			 zone->present_pages, zone_batchsize(zone));
+			 READ_ONCE(zone->present_pages), zone_batchsize(zone));
 }
 
 void adjust_managed_page_count(struct page *page, long count)
diff --git a/mm/show_mem.c b/mm/show_mem.c
index bdb439551eef..667680a6107b 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -337,7 +337,7 @@ static void show_free_areas(unsigned int filter, nodemask_t *nodemask, int max_z
 			K(zone_page_state(zone, NR_ZONE_INACTIVE_FILE)),
 			K(zone_page_state(zone, NR_ZONE_UNEVICTABLE)),
 			K(zone_page_state(zone, NR_ZONE_WRITE_PENDING)),
-			K(zone->present_pages),
+			K(READ_ONCE(zone->present_pages)),
 			K(zone_managed_pages(zone)),
 			K(zone_page_state(zone, NR_MLOCK)),
 			K(zone_page_state(zone, NR_BOUNCE)),
@@ -407,11 +407,11 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
 
 	for_each_populated_zone(zone) {
 
-		total += zone->present_pages;
-		reserved += zone->present_pages - zone_managed_pages(zone);
+		total += READ_ONCE(zone->present_pages);
+		reserved += READ_ONCE(zone->present_pages) - zone_managed_pages(zone);
 
 		if (is_highmem(zone))
-			highmem += zone->present_pages;
+			highmem += READ_ONCE(zone->present_pages);
 	}
 
 	printk("%lu pages RAM\n", total);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 8507c497218b..5a9c4b5768e5 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1708,8 +1708,8 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 		   min_wmark_pages(zone),
 		   low_wmark_pages(zone),
 		   high_wmark_pages(zone),
-		   zone->spanned_pages,
-		   zone->present_pages,
+		   READ_ONCE(zone->spanned_pages),
+		   READ_ONCE(zone->present_pages),
 		   zone_managed_pages(zone),
 		   zone_cma_pages(zone));
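
For readers unfamiliar with the idiom, below is a minimal userspace sketch
(not part of the patch) of the READ_ONCE/WRITE_ONCE pattern the series
applies to zone->present_pages and zone->spanned_pages. The volatile-cast
macros only approximate what the kernel helpers do for aligned scalars, and
the names fake_zone, hotplug_writer and reader are invented for illustration.
It builds with 'cc -pthread'.

/*
 * Sketch: one lockless reader, one writer. In the kernel the writer would
 * hold mem_hotplug_lock; WRITE_ONCE is still needed so the unlocked reader
 * can never observe a torn store, and READ_ONCE stops the compiler from
 * splitting, repeating or re-inventing the load.
 */
#include <pthread.h>
#include <stdio.h>

#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

struct fake_zone {
	unsigned long present_pages;
};

static struct fake_zone z;

/* Writer side: publish each new value with a single, untorn store. */
static void *hotplug_writer(void *arg)
{
	for (unsigned long i = 0; i < 1000000; i++)
		WRITE_ONCE(z.present_pages, i);
	return NULL;
}

/* Reader side: no lock, so each access must be a single plain-word load. */
static void *reader(void *arg)
{
	unsigned long last = 0;

	for (int i = 0; i < 1000000; i++)
		last = READ_ONCE(z.present_pages);
	printf("last observed value: %lu\n", last);
	return NULL;
}

int main(void)
{
	pthread_t w, r;

	pthread_create(&w, NULL, hotplug_writer, NULL);
	pthread_create(&r, NULL, reader, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return 0;
}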