From patchwork Wed Apr 10 17:06:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13624844 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7166ECD11C2 for ; Wed, 10 Apr 2024 17:06:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DC6C96B0088; Wed, 10 Apr 2024 13:06:29 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D766E6B0089; Wed, 10 Apr 2024 13:06:29 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C3EA36B008C; Wed, 10 Apr 2024 13:06:29 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id A65F06B0088 for ; Wed, 10 Apr 2024 13:06:29 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 6CD34A0742 for ; Wed, 10 Apr 2024 17:06:29 +0000 (UTC) X-FDA: 81994250898.03.9B8E3D2 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf16.hostedemail.com (Postfix) with ESMTP id 39402180015 for ; Wed, 10 Apr 2024 17:06:27 +0000 (UTC) Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=ALa6+y2d; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf16.hostedemail.com: domain of peterx@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=peterx@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1712768787; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding:in-reply-to: references:dkim-signature; bh=WEqgvwhAAzls4ErH8/osuK7rTIzstVDebZ900Wgu7zY=; b=uKHh4ZAJgxl2csr4GgfxPE6FMapI54W/Zb03R95QjtfBP1j65fuRJmBiFX8vDfmqDTzRjE OBGtCvoMdyXr9utdIscCzqF0svg/KxFQJfy3lsll5CcpakeC4qbbDIidWGNC46Ue1jUfO1 aypcIJEWo5uLj8QWJn+gBALSyuelWSg= ARC-Authentication-Results: i=1; imf16.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=ALa6+y2d; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf16.hostedemail.com: domain of peterx@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=peterx@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1712768787; a=rsa-sha256; cv=none; b=a5sYV/c3KzPwItbOlyIjgFYl51EOhU6iZEa3M3OVdCTPZ91kNEg+zDPNf+Pe/qJ9KYQDrO yy+uQgS13DwPb9iLN/B+XuavWHVK9dDlDCVy5JLF1nZ4Mbdeg5qkrb1DbJO4op1MOtCVH0 cOXzv6W81KJSUflxudNvxjN7o2UkO5Q= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1712768786; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=WEqgvwhAAzls4ErH8/osuK7rTIzstVDebZ900Wgu7zY=; b=ALa6+y2dAKqL5DwT3+NSufS/wu1WazoUTRc5cmkCt6WKpWQMrmnSaNXxCyY2gvzBs3g0W1 yA85+8ZJi8hL3UHxpQORk5Qis5b+Fc10SCWacrIp2a9zCF0ocYyj6Pg3/RNHMJYtfcGLi7 tF0IIBKBQdqbFPDKZkrDhlHZzKw9H2I= Received: from mail-qv1-f69.google.com (mail-qv1-f69.google.com [209.85.219.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-137-apBRnJkLND6bSl4fppfJsg-1; Wed, 10 Apr 2024 13:06:25 -0400 X-MC-Unique: apBRnJkLND6bSl4fppfJsg-1 Received: by mail-qv1-f69.google.com with SMTP id 6a1803df08f44-69b37bbded3so3545536d6.0 for ; Wed, 10 Apr 2024 10:06:25 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1712768784; x=1713373584; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=WEqgvwhAAzls4ErH8/osuK7rTIzstVDebZ900Wgu7zY=; b=SsGuN0jr+5rJIlqUPL4yxjmXvmLf9nNFbQSfYujcCd+hId3ZZ4jag7TQIBDAGcgBcQ qfGcsna1cP/FE2v3Q9pf6eJZY+10hXODNVs7B7+ey2B6e+2BNcM+6vDTc8B7BdyMxoL8 XyqotzDAosBj/7LqhbVX9I2K3mX20VPmYo6lcQKmS3WXGDElXmVzU4xTbSHYvQ4nObiB AQdafShWPX3l6EqcYYxzHowGFw2Ap+OsEzYk7se/TU34qTjO0MzjH1jEJwUYsvO9Mou9 Xa2gpoajhICQyD0vUWvDPm9CY5Um4tJYpKEuPZEoVdU3sahCS2ME9nkapp/xxBeB2QBT xmGQ== X-Forwarded-Encrypted: i=1; AJvYcCXhx/8LmdHoOTqsXjlqs9nyGDq0ed6tc2BZgDWdJB2X2osvh5NFvdLstcgLDqu03xnsUvnnNo1eTVXdH1QXtP3uDXo= X-Gm-Message-State: AOJu0YwLPVtYUf8WbKnPDxVOKSXEo7feFXj7wmEVGi2Ovz70jca02EXp EAGib0fCGhQZbsY/D0UrfCQ7vOSHIx39nSUT8/hOAf10YhovLUFZzGGej5E9kimcqXHirdjTSbd ue7nWCqRE+xb4dKeUzN2Q/jx4p1MMvaMO9eap7qWbK0Nn9d4k X-Received: by 2002:a05:6214:d6a:b0:699:1907:7676 with SMTP id 10-20020a0562140d6a00b0069919077676mr3352843qvs.5.1712768784285; Wed, 10 Apr 2024 10:06:24 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFRTPN2u1lI3LzFyPnWWbnzQkk1fo+PcFa8nRoXNuHB2BWBkwYg/viSYICcc2wh/uVZXRgS+A== X-Received: by 2002:a05:6214:d6a:b0:699:1907:7676 with SMTP id 10-20020a0562140d6a00b0069919077676mr3352794qvs.5.1712768783582; Wed, 10 Apr 2024 10:06:23 -0700 (PDT) Received: from x1n.redhat.com (pool-99-254-121-117.cpe.net.cable.rogers.com. [99.254.121.117]) by smtp.gmail.com with ESMTPSA id dv15-20020ad44eef000000b0069b1dd6f141sm2956574qvb.137.2024.04.10.10.06.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 10 Apr 2024 10:06:23 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: peterx@redhat.com, Andrew Morton , Matthew Wilcox , Suren Baghdasaryan , Lokesh Gidra , "Liam R . Howlett" , Alistair Popple Subject: [PATCH] mm: Always sanity check anon_vma first for per-vma locks Date: Wed, 10 Apr 2024 13:06:21 -0400 Message-ID: <20240410170621.2011171-1-peterx@redhat.com> X-Mailer: git-send-email 2.44.0 MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Rspamd-Queue-Id: 39402180015 X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: rcsh4zc7hurrcuj94ze3nkxec1cdzsfg X-HE-Tag: 1712768787-279235 X-HE-Meta: U2FsdGVkX187Ph85NYNjSYYhcCDKktDBKf6nXjxSYiCnEvNwBKvmD9/J8jqNlcfvLCD+Dk3ZdN4RVHDd/DExVQw9KNBYEkCRVOtGgaKFWNpjVRQpCKSWwaq0BDy3EnPNuEvX6O4Q53ahGmxGMhPEOGJMM0V0d8GekAJHk+LAPhoxwBUcosdSHoEZ3ChVN4ToQ5NTSbGWVqazoWaZAL7uxT43E/qk4tRTj59GCC77g4YXuEU6Mhi8ccnO7dW+tnWAN/DKeKeHcy7LfdGl1EmYr0CrI4tFx7+8DFTuUNfOysafgRem6rFUF8Goh6+lGI06mOFVseqoHWhkdvgkDPZ6cHSIWQrvs5uikWkc6eQqdiqWryyrwnMOM18WkTm3ynFvxCTmv0fuIgSSMHzYtUulxKvib9rgD16cNEiq72C79zlnFm+wEEUNOXxgCRqnoeEZ2OnGHH6OywZ4oozy6dphEFnXucAIK2fzliXdj4UYFfFCiz7C7ao/6aS4r+QsQnEW106sDum9RVCibgQeXAGKno+Kxetc4kw1weN/FeYroCEgFshnpyNZEqUHZMpCZM8beb5/ct02rJlItmPvkjthSTsLwBrThIfi05zKryf9WJ3G5kKvTW8gX/j3GfaiZ83DHmwiFNK3O8SMtYuCL0LcbSvWK9+IYDb9QCN5/LjdaSpKCqwl9TSJWkUa//LTYFO8Aw1DQoJlKuAQgLuRi2typauV8jfwQpeGRIYLCXPKDXgszk2fH+R2Vxy+U/wucm2iGIaDQ7Tl6BFCwlqYW1DOpzgMKMPbRM38WXk24r0lGX27kYlFPIxUIe2MILZkZ7tZgTxY46XSqIShC905aBWIezApZvNJIdny9FxZg8FnIbRZVazyI+9ecgD0vQ0d/QAfLzKy/IH20ARIlDDQQ8EjzAABFx1B+TEb3sY2ar0eIrUNwdppjNYVf+WJsckZ7W6aGhfOLPkyRg3KTy9UGqQ raIv4ZZt nwg9jDaOU4v5yJvivy4MV0VEz6ryb0NjYrHvJ3TQF9tTZhGJouK0n14fEw/aBYNVgllrbLuE+FlmupHtsliHPC8UaETDnDkmn5BB0ujLPaCOatDOR8Xhbp6o09eNJJkxO+FXOqFAP9PC5CN4WxiInPu+PbaRHOPAsFkqss3A0GW3QF/dZuTxKDTa5IIed4EY4b0W4+Le0lChCbpivwk7MwgB9mB31bVGdV1sKvclAdy3xAJaAJb7aF4/CuIqU6fDLlbKsSpnb9QiknR+U4DyrECAA4+ZPUAA4bCQ9lV1AFWKiJB6kykaq1gL/8fC6X7LEcSmsiNsgcpJqqeaDZc3cCzSUwS+yennCaVp9g93saURxiLNvge0DMEGef/toonXHB3IOTMCqG0FQ3jLz3fwqLKzd3pLutNdSGI5kQLf+uTbxOUf6Wcbq7eHCW8d8YNWr47G681cPJyyUKcXAg5TA/ccR4Oj3UD8F/b+NegBT88ufNcPfKeFRbqaZysYZlNgm1z4QW+RpJeZV9l3fCOfFxejzzkgh004HmqS1AxrkiKj5ip2JUSfoe1t1CM60m57ueDF0Ajt/YZgd1MzV4sk5nDyg+EH8KOTqYqGT411uD1F/qba86AwEz51u7VslVAg9PDicHncIIqA06RI= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: anon_vma is a tricky object in the context of per-vma lock, because it's racy to modify it in that context and mmap lock is needed if it's not stable yet. So far there are three places that sanity checks anon_vma for that: - lock_vma_under_rcu(): this is the major entrance of per-vma lock, where we have taken care of anon memory v.s. potential anon_vma allocations. - lock_vma(): even if it looks so generic as an API, it's only used in userfaultfd context to leverage per-vma locks. It does extra check over MAP_PRIVATE file mappings for the same anon_vma issue. - vmf_anon_prepare(): it works for private file mapping faults just like what lock_vma() wanted to cover above. One trivial difference is in some extremely corner case, the fault handler will still allow per-vma fault to happen, like a READ on a privately mapped file. The question is whether that's intended to make it as complicated. Per my question in the thread, it is not intended, and Suren also seems to agree [1]. So the trivial side effect of such patch is: - We may do slightly better on the first WRITE of a private file mapping, because we can retry earlier (in lock_vma_under_rcu(), rather than vmf_anon_prepare() later). - We may always use mmap lock for the initial READs on a private file mappings, while before this patch it _can_ (only when no WRITE ever happened... but it doesn't make much sense for a MAP_PRIVATE..) do the read fault with per-vma lock. Then noted that right after any WRITE the anon_vma will be stablized, then there will be no difference. And I believe that should be the majority cases too; I also did try to run a program, writting to MAP_PRIVATE file memory (that I pre-headed in the page cache) and I can hardly measure a difference in performance. Let's simply ignore all those trivial corner cases and unify the anon_vma check from three places into one. I also didn't check the rest users of lock_vma_under_rcu(), where in a !fault context it could even fix something that used to race with private file mappings but I didn't check further. I still left a WARN_ON_ONCE() in vmf_anon_prepare() to double check we're all good. [1] https://lore.kernel.org/r/CAJuCfpGj5xk-NxSwW6Mt8NGZcV9N__8zVPMGXDPAYKMcN9=Oig@mail.gmail.com Cc: Matthew Wilcox Cc: Suren Baghdasaryan Cc: Lokesh Gidra Cc: Liam R. Howlett Cc: Alistair Popple Signed-off-by: Peter Xu Reviewed-by: Suren Baghdasaryan --- mm/memory.c | 10 ++++------ mm/userfaultfd.c | 13 ++----------- 2 files changed, 6 insertions(+), 17 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 78422d1c7381..4e2a9c4d9776 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3219,10 +3219,8 @@ vm_fault_t vmf_anon_prepare(struct vm_fault *vmf) if (likely(vma->anon_vma)) return 0; - if (vmf->flags & FAULT_FLAG_VMA_LOCK) { - vma_end_read(vma); - return VM_FAULT_RETRY; - } + /* We shouldn't try a per-vma fault at all if anon_vma isn't solid */ + WARN_ON_ONCE(vmf->flags & FAULT_FLAG_VMA_LOCK); if (__anon_vma_prepare(vma)) return VM_FAULT_OOM; return 0; @@ -5826,9 +5824,9 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, * find_mergeable_anon_vma uses adjacent vmas which are not locked. * This check must happen after vma_start_read(); otherwise, a * concurrent mremap() with MREMAP_DONTUNMAP could dissociate the VMA - * from its anon_vma. + * from its anon_vma. This applies to both anon or private file maps. */ - if (unlikely(vma_is_anonymous(vma) && !vma->anon_vma)) + if (unlikely(!(vma->vm_flags & VM_SHARED) && !vma->anon_vma)) goto inval_end_read; /* Check since vm_start/vm_end might change before we lock the VMA */ diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index f6267afe65d1..61f21da77dcd 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -72,17 +72,8 @@ static struct vm_area_struct *lock_vma(struct mm_struct *mm, struct vm_area_struct *vma; vma = lock_vma_under_rcu(mm, address); - if (vma) { - /* - * lock_vma_under_rcu() only checks anon_vma for private - * anonymous mappings. But we need to ensure it is assigned in - * private file-backed vmas as well. - */ - if (!(vma->vm_flags & VM_SHARED) && unlikely(!vma->anon_vma)) - vma_end_read(vma); - else - return vma; - } + if (vma) + return vma; mmap_read_lock(mm); vma = find_vma_and_prepare_anon(mm, address);