From patchwork Tue Apr 15 02:45:28 2025
From: Muchun Song <songmuchun@bytedance.com>
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org,
 david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev,
 nphamcs@gmail.com, chengming.zhou@linux.dev
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
 linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com,
 apais@linux.microsoft.com, Muchun Song
Subject: [PATCH RFC 24/28] mm: memcontrol: prepare for reparenting LRU pages for lruvec lock
Date: Tue, 15 Apr 2025 10:45:28 +0800
Message-Id: <20250415024532.26632-25-songmuchun@bytedance.com>
In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com>
The following sketches illustrate how the folio lruvec lock is kept safe
while LRU folios undergo reparenting.

In the folio_lruvec_lock(folio) function:

```
rcu_read_lock();
retry:
lruvec = folio_lruvec(folio);
/* The folio may be reparented at this point. */
spin_lock(&lruvec->lru_lock);
if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
        /*
         * The wrong lruvec lock was acquired, and a retry is required,
         * because the folio now resides on the parent memcg's lruvec
         * list.
         */
        spin_unlock(&lruvec->lru_lock);
        goto retry;
}
/* Reaching here indicates that folio_memcg() is stable. */
```

In the memcg_reparent_objcgs(memcg) function:

```
spin_lock(&lruvec->lru_lock);
spin_lock(&lruvec_parent->lru_lock);
/* Transfer folios from the lruvec list to the parent's. */
spin_unlock(&lruvec_parent->lru_lock);
spin_unlock(&lruvec->lru_lock);
```

After acquiring the lruvec lock, we must check whether the folio has been
reparented; if it has, the lock of the new lruvec must be acquired by
retrying. Because the LRU folio reparenting process also takes the lruvec
lock (implemented in a subsequent patch), folio_memcg() cannot change while
the lruvec lock is held. Since lruvec_memcg(lruvec) is therefore always
equal to folio_memcg(folio) once the lruvec lock is acquired, the
lruvec_memcg_debug() check is redundant and is removed.

This patch is a preparation for reparenting LRU folios.
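The lock-and-revalidate pattern above can also be modeled outside the
kernel. Below is a minimal userspace sketch using pthreads; it is
illustrative only. Every name in it (object, group, object_group_lock(),
reparent()) is a hypothetical stand-in for the folio, the memcg/lruvec
pair, folio_lruvec_lock() and memcg_reparent_objcgs() respectively, and it
omits the RCU protection that keeps the lruvec alive in the kernel:

```
/*
 * Userspace model of the lock-and-revalidate retry protocol. All names
 * are hypothetical stand-ins, not kernel APIs.
 * Build: cc -pthread -o model model.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

struct group {
	pthread_mutex_t lock;		/* models lruvec->lru_lock */
};

struct object {
	_Atomic(struct group *) grp;	/* models the folio->memcg binding */
};

/* Models folio_lruvec_lock(): take the lock, then revalidate the binding. */
static struct group *object_group_lock(struct object *obj)
{
	struct group *grp;

retry:
	grp = atomic_load(&obj->grp);
	/* The object may be reparented to another group right here. */
	pthread_mutex_lock(&grp->lock);
	if (atomic_load(&obj->grp) != grp) {
		/* Wrong lock taken: the binding changed. Retry. */
		pthread_mutex_unlock(&grp->lock);
		goto retry;
	}
	/* The binding is stable for as long as grp->lock is held. */
	return grp;
}

/*
 * Models memcg_reparent_objcgs(): both locks are held across the move,
 * child first and then parent, matching the sketch above. A lock holder
 * therefore cannot observe the binding changing underneath it.
 */
static void reparent(struct object *obj, struct group *parent)
{
	struct group *grp = atomic_load(&obj->grp);

	pthread_mutex_lock(&grp->lock);
	pthread_mutex_lock(&parent->lock);
	atomic_store(&obj->grp, parent);	/* move to the parent's list */
	pthread_mutex_unlock(&parent->lock);
	pthread_mutex_unlock(&grp->lock);
}

int main(void)
{
	struct group child = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct group parent = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct object obj = { .grp = &child };
	struct group *locked;

	reparent(&obj, &parent);
	locked = object_group_lock(&obj);
	printf("locked the %s group\n", locked == &parent ? "parent" : "child");
	pthread_mutex_unlock(&locked->lock);
	return 0;
}
```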
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/memcontrol.h | 23 ++++++-----------
 mm/compaction.c            | 29 ++++++++++++++++-----
 mm/memcontrol.c            | 53 +++++++++++++++++++-------------------
 3 files changed, 58 insertions(+), 47 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 01239147eb11..27b23e464229 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -719,7 +719,11 @@ static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg,
  * folio_lruvec - return lruvec for isolating/putting an LRU folio
  * @folio: Pointer to the folio.
  *
- * This function relies on folio->mem_cgroup being stable.
+ * The user should hold an rcu read lock to protect lruvec associated with
+ * the folio from being released. But it does not prevent binding stability
+ * between the folio and the returned lruvec from being changed to its parent
+ * or ancestor (e.g. like folio_lruvec_lock() does that holds LRU lock to
+ * prevent the change).
  */
 static inline struct lruvec *folio_lruvec(struct folio *folio)
 {
@@ -742,15 +746,6 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *folio);
 struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio,
 		unsigned long *flags);
 
-#ifdef CONFIG_DEBUG_VM
-void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio);
-#else
-static inline
-void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
-{
-}
-#endif
-
 static inline
 struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){
 	return css ? container_of(css, struct mem_cgroup, css) : NULL;
@@ -1211,11 +1206,6 @@ static inline struct lruvec *folio_lruvec(struct folio *folio)
 	return &pgdat->__lruvec;
 }
 
-static inline
-void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
-{
-}
-
 static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
 {
 	return NULL;
@@ -1532,17 +1522,20 @@ static inline struct lruvec *parent_lruvec(struct lruvec *lruvec)
 static inline void lruvec_unlock(struct lruvec *lruvec)
 {
 	spin_unlock(&lruvec->lru_lock);
+	rcu_read_unlock();
 }
 
 static inline void lruvec_unlock_irq(struct lruvec *lruvec)
 {
 	spin_unlock_irq(&lruvec->lru_lock);
+	rcu_read_unlock();
 }
 
 static inline void lruvec_unlock_irqrestore(struct lruvec *lruvec,
 		unsigned long flags)
 {
 	spin_unlock_irqrestore(&lruvec->lru_lock, flags);
+	rcu_read_unlock();
 }
 
 /* Test requires a stable folio->memcg binding, see folio_memcg() */
diff --git a/mm/compaction.c b/mm/compaction.c
index ce45d633ddad..4abd1481d5de 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -551,6 +551,24 @@ static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags,
 	return true;
 }
 
+static struct lruvec *
+compact_folio_lruvec_lock_irqsave(struct folio *folio, unsigned long *flags,
+				  struct compact_control *cc)
+{
+	struct lruvec *lruvec;
+
+	rcu_read_lock();
+retry:
+	lruvec = folio_lruvec(folio);
+	compact_lock_irqsave(&lruvec->lru_lock, flags, cc);
+	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
+		spin_unlock_irqrestore(&lruvec->lru_lock, *flags);
+		goto retry;
+	}
+
+	return lruvec;
+}
+
 /*
  * Compaction requires the taking of some coarse locks that are potentially
  * very heavily contended. The lock should be periodically unlocked to avoid
@@ -872,7 +890,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 {
 	pg_data_t *pgdat = cc->zone->zone_pgdat;
 	unsigned long nr_scanned = 0, nr_isolated = 0;
-	struct lruvec *lruvec;
+	struct lruvec *lruvec = NULL;
 	unsigned long flags = 0;
 	struct lruvec *locked = NULL;
 	struct folio *folio = NULL;
@@ -1189,18 +1207,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		if (!folio_test_clear_lru(folio))
 			goto isolate_fail_put;
 
-		lruvec = folio_lruvec(folio);
+		if (locked)
+			lruvec = folio_lruvec(folio);
 
 		/* If we already hold the lock, we can skip some rechecking */
-		if (lruvec != locked) {
+		if (lruvec != locked || !locked) {
 			if (locked)
 				lruvec_unlock_irqrestore(locked, flags);
 
-			compact_lock_irqsave(&lruvec->lru_lock, &flags, cc);
+			lruvec = compact_folio_lruvec_lock_irqsave(folio, &flags, cc);
 			locked = lruvec;
 
-			lruvec_memcg_debug(lruvec, folio);
-
 			/*
 			 * Try get exclusive access under lock. If marked for
 			 * skip, the scan is aborted unless the current context
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 694f19017699..1f0c6e7b69cc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1196,23 +1196,6 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 	}
 }
 
-#ifdef CONFIG_DEBUG_VM
-void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
-{
-	struct mem_cgroup *memcg;
-
-	if (mem_cgroup_disabled())
-		return;
-
-	memcg = folio_memcg(folio);
-
-	if (!memcg)
-		VM_BUG_ON_FOLIO(!mem_cgroup_is_root(lruvec_memcg(lruvec)), folio);
-	else
-		VM_BUG_ON_FOLIO(lruvec_memcg(lruvec) != memcg, folio);
-}
-#endif
-
 /**
  * folio_lruvec_lock - Lock the lruvec for a folio.
  * @folio: Pointer to the folio.
@@ -1222,14 +1205,20 @@ void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
  * - folio_test_lru false
  * - folio frozen (refcount of 0)
  *
- * Return: The lruvec this folio is on with its lock held.
+ * Return: The lruvec this folio is on with its lock held and rcu read lock held.
  */
 struct lruvec *folio_lruvec_lock(struct folio *folio)
 {
-	struct lruvec *lruvec = folio_lruvec(folio);
+	struct lruvec *lruvec;
 
+	rcu_read_lock();
+retry:
+	lruvec = folio_lruvec(folio);
 	spin_lock(&lruvec->lru_lock);
-	lruvec_memcg_debug(lruvec, folio);
+	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
+		spin_unlock(&lruvec->lru_lock);
+		goto retry;
+	}
 
 	return lruvec;
 }
@@ -1244,14 +1233,20 @@ struct lruvec *folio_lruvec_lock(struct folio *folio)
  * - folio frozen (refcount of 0)
  *
  * Return: The lruvec this folio is on with its lock held and interrupts
- * disabled.
+ * disabled and rcu read lock held.
  */
 struct lruvec *folio_lruvec_lock_irq(struct folio *folio)
 {
-	struct lruvec *lruvec = folio_lruvec(folio);
+	struct lruvec *lruvec;
 
+	rcu_read_lock();
+retry:
+	lruvec = folio_lruvec(folio);
 	spin_lock_irq(&lruvec->lru_lock);
-	lruvec_memcg_debug(lruvec, folio);
+	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
+		spin_unlock_irq(&lruvec->lru_lock);
+		goto retry;
+	}
 
 	return lruvec;
 }
@@ -1267,15 +1262,21 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *folio)
  * - folio frozen (refcount of 0)
  *
  * Return: The lruvec this folio is on with its lock held and interrupts
- * disabled.
+ * disabled and rcu read lock held.
  */
 struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio,
 		unsigned long *flags)
 {
-	struct lruvec *lruvec = folio_lruvec(folio);
+	struct lruvec *lruvec;
 
+	rcu_read_lock();
+retry:
+	lruvec = folio_lruvec(folio);
 	spin_lock_irqsave(&lruvec->lru_lock, *flags);
-	lruvec_memcg_debug(lruvec, folio);
+	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
+		spin_unlock_irqrestore(&lruvec->lru_lock, *flags);
+		goto retry;
+	}
 
 	return lruvec;
 }
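
Taken together, the lock side (which now enters an RCU read-side critical
section before the lruvec lookup) and the unlock helpers (which now leave
it after dropping the spinlock) form one bracket. The userspace sketch
below shows just this pairing; it reuses the hypothetical names from the
earlier model, and a pthread_rwlock_t read lock is only a loose stand-in
for rcu_read_lock()/rcu_read_unlock():

```
/*
 * Sketch of the new lock/unlock pairing. Hypothetical names only; the
 * rwlock read side loosely models an RCU read-side critical section
 * that keeps the group alive until the unlock helper runs.
 * Build: cc -pthread -o pairing pairing.c
 */
#include <pthread.h>
#include <stdatomic.h>

struct group {
	pthread_mutex_t lock;		/* ~ lruvec->lru_lock */
};

struct object {
	_Atomic(struct group *) grp;	/* ~ the folio->memcg binding */
};

static pthread_rwlock_t lifetime = PTHREAD_RWLOCK_INITIALIZER;	/* ~ RCU */

/* ~ folio_lruvec_lock(): returns with both "RCU" and the group lock held. */
static struct group *object_group_lock(struct object *obj)
{
	struct group *grp;

	pthread_rwlock_rdlock(&lifetime);	/* ~ rcu_read_lock() */
retry:
	grp = atomic_load(&obj->grp);
	pthread_mutex_lock(&grp->lock);
	if (atomic_load(&obj->grp) != grp) {
		pthread_mutex_unlock(&grp->lock);
		goto retry;
	}
	return grp;
}

/* ~ lruvec_unlock(): drop the group lock first, then leave "RCU". */
static void object_group_unlock(struct group *grp)
{
	pthread_mutex_unlock(&grp->lock);
	pthread_rwlock_unlock(&lifetime);	/* ~ rcu_read_unlock() */
}

int main(void)
{
	struct group g = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct object obj = { .grp = &g };

	/* The helpers bracket the critical section as a matched pair. */
	object_group_unlock(object_group_lock(&obj));
	return 0;
}
```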