From patchwork Mon Nov 27 08:46:42 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 13469264
From: Muchun Song
To: mike.kravetz@oracle.com, muchun.song@linux.dev, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Muchun Song
Subject: [PATCH 1/4] mm: pagewalk: assert write mmap lock only for walking the user page tables
Date: Mon, 27 Nov 2023 16:46:42 +0800
Message-Id: <20231127084645.27017-2-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.39.3 (Apple Git-145)
In-Reply-To: <20231127084645.27017-1-songmuchun@bytedance.com>
References: <20231127084645.27017-1-songmuchun@bytedance.com>

Commit 8782fb61cc848 ("mm: pagewalk: Fix race between unmap and page
walker") introduced an assertion in walk_page_range_novma() to make sure
all users of the page table walker are safe. However, the race only
exists when walking the user page tables, and it is ridiculous to hold a
particular user mmap write lock against changes to the kernel page tables.
So when walking the kernel page tables, only assert that the mmap lock is
held at least for reading. Users matching this case can then downgrade to
an mmap read lock to relieve contention on the mmap lock of init_mm.
hugetlb will benefit from this in the next patch, where it only holds the
mmap read lock.

Signed-off-by: Muchun Song
Acked-by: Mike Kravetz
---
 mm/pagewalk.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index b7d7e4fcfad7a..f46c80b18ce4f 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -539,6 +539,11 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
  * not backed by VMAs. Because 'unusual' entries may be walked this function
  * will also not lock the PTEs for the pte_entry() callback. This is useful for
  * walking the kernel pages tables or page tables for firmware.
+ *
+ * Note: Be careful when walking the kernel page tables: the caller may need to
+ * take other effective approaches (mmap lock may be insufficient) to prevent
+ * the intermediate kernel page tables belonging to the specified address range
+ * from being freed (e.g. memory hot-remove).
  */
 int walk_page_range_novma(struct mm_struct *mm, unsigned long start,
 			  unsigned long end, const struct mm_walk_ops *ops,
@@ -556,7 +561,29 @@ int walk_page_range_novma(struct mm_struct *mm, unsigned long start,
 	if (start >= end || !walk.mm)
 		return -EINVAL;
 
-	mmap_assert_write_locked(walk.mm);
+	/*
+	 * 1) For walking the user virtual address space:
+	 *
+	 * The mmap lock protects the page walker from changes to the page
+	 * tables during the walk. However a read lock is insufficient to
+	 * protect those areas which don't have a VMA as munmap() detaches
+	 * the VMAs before downgrading to a read lock and actually tearing
+	 * down PTEs/page tables. In which case, the mmap write lock should
+	 * be held.
+	 *
+	 * 2) For walking the kernel virtual address space:
+	 *
+	 * The kernel intermediate page tables are usually not freed, so
+	 * the mmap read lock is sufficient. But there are some exceptions,
+	 * e.g. memory hot-remove, in which case the mmap lock is insufficient
+	 * to prevent the intermediate kernel page tables belonging to the
+	 * specified address range from being freed. The caller should take
+	 * other actions to prevent this race.
+	 */
+	if (mm == &init_mm)
+		mmap_assert_locked(walk.mm);
+	else
+		mmap_assert_write_locked(walk.mm);
 
 	return walk_pgd_range(start, end, &walk);
 }
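
For reviewers who want to see the resulting rule in isolation, here is a
minimal userspace sketch of the lock requirement this patch enforces. It
is not kernel code: struct mock_mm, enum mock_lock_state, mock_init_mm and
novma_walk_lock_ok() are hypothetical stand-ins invented for illustration,
and the real mmap_assert_locked() and mmap_assert_write_locked() helpers
assert rather than return a value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the mmap lock state; not a kernel type. */
enum mock_lock_state {
	MOCK_UNLOCKED,
	MOCK_READ_LOCKED,
	MOCK_WRITE_LOCKED,
};

/* Hypothetical stand-in for struct mm_struct. */
struct mock_mm {
	enum mock_lock_state mmap_lock;
};

/* Hypothetical stand-in for the kernel's init_mm. */
static struct mock_mm mock_init_mm;

/*
 * Models the check the patch adds to walk_page_range_novma():
 * - kernel page tables (init_mm): any held mmap lock is accepted,
 *   which is what mmap_assert_locked() checks in the kernel;
 * - user page tables: only the write lock is accepted, which is
 *   what mmap_assert_write_locked() checks in the kernel.
 */
static bool novma_walk_lock_ok(const struct mock_mm *mm)
{
	if (mm == &mock_init_mm)
		return mm->mmap_lock != MOCK_UNLOCKED;

	return mm->mmap_lock == MOCK_WRITE_LOCKED;
}
```

With this model, a user mm holding only the read lock is rejected, while
mock_init_mm holding the read lock is accepted. As the comment in the
patch notes, passing this check does not by itself protect against the
kernel page tables being freed (e.g. memory hot-remove).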