From patchwork Thu Oct 24 04:10:20 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13848295
Date: Wed, 23 Oct 2024 21:10:20 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
cc: Usama Arif, Yang Shi, Wei Yang, "Kirill A. Shutemov",
    Matthew Wilcox, David Hildenbrand, Johannes Weiner, Baolin Wang,
    Barry Song, Kefeng Wang, Ryan Roberts, Nhat Pham, Zi Yan, Chris Li,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH hotfix 1/2] mm/thp: fix deferred split queue not partially_mapped
Message-ID: <760237a3-69d6-9197-432d-0306d52c048a@google.com>

Recent changes are putting more pressure on THP deferred split queues:
under load, they reveal long-standing races, causing list_del corruptions,
"Bad page state"s and worse (I keep BUGs in both of those checks, so I
usually don't get to see how badly things end up without them).  The
relevant recent changes are 6.8's mTHP, 6.10's mTHP swapout, and 6.12's
mTHP swapin, improved swap allocation, and underused THP splitting.

The new unlocked list_del_init() in deferred_split_scan() is buggy.
I gave bad advice: it looks plausible, since that's a local on-stack
list; but the fact is that it can race with a third party freeing or
migrating the preceding folio (properly unqueueing it with refcount 0
while holding split_queue_lock), thereby corrupting the list linkage.

The obvious answer would be to take split_queue_lock there: but it has
a long history of contention, so I'm reluctant to add to that.  Instead,
make sure that there is always one safe (refcount-raised) folio before,
by delaying its folio_put().  (And of course I was wrong to suggest
updating split_queue_len without the lock: leave that until the splice.)

And remove two over-eager partially_mapped checks, restoring those tests
to how they were before: if uncharge_folio() or free_tail_page_prepare()
finds _deferred_list non-empty, it's in trouble whether or not that folio
is partially_mapped (and the flag was already cleared in the latter case).

Fixes: dafff3f4c850 ("mm: split underused THPs")
Signed-off-by: Hugh Dickins
Acked-by: Usama Arif
Reviewed-by: David Hildenbrand
Reviewed-by: Baolin Wang
---
 mm/huge_memory.c | 21 +++++++++++++++++----
 mm/memcontrol.c  |  3 +--
 mm/page_alloc.c  |  5 ++---
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2fb328880b50..a1d345f1680c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3718,8 +3718,8 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 	struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
 	unsigned long flags;
 	LIST_HEAD(list);
-	struct folio *folio, *next;
-	int split = 0;
+	struct folio *folio, *next, *prev = NULL;
+	int split = 0, removed = 0;
 
 #ifdef CONFIG_MEMCG
 	if (sc->memcg)
@@ -3775,15 +3775,28 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 		 */
 		if (!did_split && !folio_test_partially_mapped(folio)) {
 			list_del_init(&folio->_deferred_list);
-			ds_queue->split_queue_len--;
+			removed++;
+		} else {
+			/*
+			 * That unlocked list_del_init() above would be unsafe,
+			 * unless its folio is separated from any earlier folios
+			 * left on the list (which may be concurrently unqueued)
+			 * by one safe folio with refcount still raised.
+			 */
+			swap(folio, prev);
 		}
-		folio_put(folio);
+		if (folio)
+			folio_put(folio);
 	}
 
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	list_splice_tail(&list, &ds_queue->split_queue);
+	ds_queue->split_queue_len -= removed;
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 
+	if (prev)
+		folio_put(prev);
+
 	/*
 	 * Stop shrinker if we didn't split any page, but the queue is empty.
 	 * This can happen if pages were freed under us.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7845c64a2c57..2703227cce88 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4631,8 +4631,7 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
 	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
 	VM_BUG_ON_FOLIO(folio_order(folio) > 1 &&
 			!folio_test_hugetlb(folio) &&
-			!list_empty(&folio->_deferred_list) &&
-			folio_test_partially_mapped(folio), folio);
+			!list_empty(&folio->_deferred_list), folio);
 
 	/*
 	 * Nobody should be changing or seriously looking at
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8afab64814dc..4b21a368b4e2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -961,9 +961,8 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
 		break;
 	case 2:
 		/* the second tail page: deferred_list overlaps ->mapping */
-		if (unlikely(!list_empty(&folio->_deferred_list) &&
-			     folio_test_partially_mapped(folio))) {
-			bad_page(page, "partially mapped folio on deferred list");
+		if (unlikely(!list_empty(&folio->_deferred_list))) {
+			bad_page(page, "on deferred list");
 			goto out;
 		}
 		break;
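
As an aside, for anyone who finds the swap(folio, prev) dance above hard
to follow: below is a minimal single-threaded userspace model of the
pattern.  All names in it (struct node, scan(), node_put(), the remove
flag) are invented for illustration; it mimics only the shape of the fix,
pinning the most recent node left on the local list so that the unlocked
unlink of a later node always has a safe predecessor, and dropping that
pin only after the walk (the real code re-splices under split_queue_lock
first).  It does not model the concurrency that makes the pinning
necessary, nor the real folio lifecycle.

/*
 * Illustrative userspace model only: not kernel code.  Single-threaded,
 * so the refcounts here just demonstrate the order of operations; in
 * deferred_split_scan() the raised refcount is what keeps a concurrent
 * unqueue from touching the predecessor's linkage.
 */
#include <stdio.h>
#include <stdlib.h>

struct node {
	struct node *prev, *next;
	int refcount;
	int id;
	int remove;	/* stand-in for "!did_split && !partially_mapped" */
};

static void node_put(struct node *n)
{
	if (--n->refcount == 0)
		free(n);
}

static void node_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Walk a local list, unlinking some nodes without retaking any lock. */
static void scan(struct node *head)
{
	struct node *n, *next, *prev = NULL;

	for (n = head->next; n != head; n = next) {
		next = n->next;
		if (n->remove) {
			/*
			 * Like list_del_init(): safe only because "prev",
			 * if set, still holds a raised refcount, pinning
			 * this node's predecessor on the list.
			 */
			n->prev->next = n->next;
			n->next->prev = n->prev;
		} else {
			/*
			 * Like swap(folio, prev): pin this survivor, and
			 * arrange to drop the pin on the previous one.
			 */
			struct node *tmp = n;
			n = prev;
			prev = tmp;
		}
		if (n)
			node_put(n);	/* removed node, or previously pinned one */
	}
	/* the real code re-splices the survivors under the lock here */
	if (prev)
		node_put(prev);	/* drop the last pin only at the end */
}

int main(void)
{
	struct node head = { .prev = &head, .next = &head };
	struct node *all[4];

	for (int i = 0; i < 4; i++) {
		all[i] = calloc(1, sizeof(*all[i]));
		all[i]->id = i;
		all[i]->refcount = 2;	/* one owner ref + one scan ref */
		all[i]->remove = (i % 2 == 0);
		node_add_tail(all[i], &head);
	}

	scan(&head);

	for (struct node *n = head.next; n != &head; n = n->next)
		printf("still queued: node %d\n", n->id);	/* 1 and 3 */

	for (int i = 0; i < 4; i++)
		node_put(all[i]);	/* drop the owner references */
	return 0;
}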