From patchwork Thu Nov 14 06:59:52 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qi Zheng
X-Patchwork-Id: 13874610
From: Qi Zheng
To: david@redhat.com, jannh@google.com, hughd@google.com,
 willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org,
 akpm@linux-foundation.org, peterx@redhat.com
Cc: mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 x86@kernel.org, lorenzo.stoakes@oracle.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, zokeefe@google.com, rientjes@google.com,
 Qi Zheng
Subject: [PATCH v3 1/9] mm: khugepaged: recheck pmd state in
 retract_page_tables()
Date: Thu, 14 Nov 2024 14:59:52 +0800
X-Mailer: git-send-email 2.24.3 (Apple Git-128)

In retract_page_tables(), the lock of new_folio is still held, so we will
be blocked in the page fault path, which prevents the pte entries from
being set again. So even though the old empty PTE page may be concurrently
freed and a new PTE page filled into the pmd entry, that page is still
empty and can be removed.

So just refactor retract_page_tables() a little bit and recheck the pmd
state after holding the pmd lock.

Suggested-by: Jann Horn
Signed-off-by: Qi Zheng
---
 mm/khugepaged.c | 45 +++++++++++++++++++++++++++++++--------------
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6f8d46d107b4b..99dc995aac110 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -947,17 +947,10 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	return SCAN_SUCCEED;
 }
 
-static int find_pmd_or_thp_or_none(struct mm_struct *mm,
-				   unsigned long address,
-				   pmd_t **pmd)
+static inline int check_pmd_state(pmd_t *pmd)
 {
-	pmd_t pmde;
+	pmd_t pmde = pmdp_get_lockless(pmd);
 
-	*pmd = mm_find_pmd(mm, address);
-	if (!*pmd)
-		return SCAN_PMD_NULL;
-
-	pmde = pmdp_get_lockless(*pmd);
 	if (pmd_none(pmde))
 		return SCAN_PMD_NONE;
 	if (!pmd_present(pmde))
@@ -971,6 +964,17 @@ static int find_pmd_or_thp_or_none(struct mm_struct *mm,
 	return SCAN_SUCCEED;
 }
 
+static int find_pmd_or_thp_or_none(struct mm_struct *mm,
+				   unsigned long address,
+				   pmd_t **pmd)
+{
+	*pmd = mm_find_pmd(mm, address);
+	if (!*pmd)
+		return SCAN_PMD_NULL;
+
+	return check_pmd_state(*pmd);
+}
+
 static int check_pmd_still_valid(struct mm_struct *mm,
 				 unsigned long address,
 				 pmd_t *pmd)
@@ -1720,7 +1724,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 		pmd_t *pmd, pgt_pmd;
 		spinlock_t *pml;
 		spinlock_t *ptl;
-		bool skipped_uffd = false;
+		bool success = false;
 
 		/*
 		 * Check vma->anon_vma to exclude MAP_PRIVATE mappings that
@@ -1757,6 +1761,19 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 		mmu_notifier_invalidate_range_start(&range);
 
 		pml = pmd_lock(mm, pmd);
+		/*
+		 * The lock of new_folio is still held, we will be blocked in
+		 * the page fault path, which prevents the pte entries from
+		 * being set again. So even though the old empty PTE page may be
+		 * concurrently freed and a new PTE page is filled into the pmd
+		 * entry, it is still empty and can be removed.
+		 *
+		 * So here we only need to recheck if the state of pmd entry
+		 * still meets our requirements, rather than checking pmd_same()
+		 * like elsewhere.
+		 */
+		if (check_pmd_state(pmd) != SCAN_SUCCEED)
+			goto drop_pml;
 		ptl = pte_lockptr(mm, pmd);
 		if (ptl != pml)
 			spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
@@ -1770,20 +1787,20 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 		 * repeating the anon_vma check protects from one category,
 		 * and repeating the userfaultfd_wp() check from another.
 		 */
-		if (unlikely(vma->anon_vma || userfaultfd_wp(vma))) {
-			skipped_uffd = true;
-		} else {
+		if (likely(!vma->anon_vma && !userfaultfd_wp(vma))) {
 			pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
 			pmdp_get_lockless_sync();
+			success = true;
 		}
 
 		if (ptl != pml)
 			spin_unlock(ptl);
+drop_pml:
 		spin_unlock(pml);
 
 		mmu_notifier_invalidate_range_end(&range);
 
-		if (!skipped_uffd) {
+		if (success) {
 			mm_dec_nr_ptes(mm);
 			page_table_check_pte_clear_range(mm, addr, pgt_pmd);
 			pte_free_defer(mm, pmd_pgtable(pgt_pmd));
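
Note for readers (not part of the patch): the control flow that the
retract_page_tables() hunks introduce, namely take the lock, recheck only
the state of the entry rather than its identity, and bail out through a
single unlock label, can be sketched in userspace as below. This is only
an illustrative analogue under stated assumptions. struct slot,
check_slot_state(), retract_slot() and the SLOT_* values are hypothetical
names that do not exist in the kernel, and a C11 atomic_flag spinlock
stands in for the pmd lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SCAN_* results; not the kernel's enum. */
enum slot_state { SLOT_OK, SLOT_NONE, SLOT_HUGE };

/* Hypothetical analogue of a pmd entry guarded by its pmd lock. */
struct slot {
	atomic_flag lock;	/* plays the role of the pmd lock */
	void *table;		/* attached lower-level table, may be replaced */
	bool huge;		/* entry maps a huge page instead of a table */
};

static void slot_lock(struct slot *s)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		;
}

static void slot_unlock(struct slot *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Classify the entry by state only; which table is attached is ignored. */
static enum slot_state check_slot_state(const struct slot *s)
{
	if (!s->table)
		return SLOT_NONE;
	if (s->huge)
		return SLOT_HUGE;
	return SLOT_OK;
}

/*
 * Detach the table if the entry is still in an acceptable state.
 * Returns true and stores the detached table in *detached on success;
 * leaves *detached untouched otherwise.
 */
static bool retract_slot(struct slot *s, void **detached)
{
	bool success = false;

	slot_lock(s);
	/* Recheck under the lock: state must match, identity need not. */
	if (check_slot_state(s) != SLOT_OK)
		goto drop_lock;

	*detached = s->table;
	s->table = NULL;
	success = true;
drop_lock:
	slot_unlock(s);
	return success;
}
```

In this sketch a caller that looked at the slot without the lock, and then
saw the table pointer replaced by another one before calling
retract_slot(), still succeeds, because only the state is rechecked. That
mirrors the reasoning in the new comment for why a pmd_same() style
identity check is not needed here. The sketch does not model the folio
lock that keeps the replacement table empty.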