From patchwork Wed Dec  4 11:09:42 2024
X-Patchwork-Submitter: Qi Zheng
X-Patchwork-Id: 13893583
From: Qi Zheng <zhengqi.arch@bytedance.com>
To: david@redhat.com, jannh@google.com, hughd@google.com, willy@infradead.org,
	muchun.song@linux.dev, vbabka@kernel.org, peterx@redhat.com,
	akpm@linux-foundation.org
Cc: mgorman@suse.de, catalin.marinas@arm.com, will@kernel.org,
	dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
	x86@kernel.org, lorenzo.stoakes@oracle.com, zokeefe@google.com,
	rientjes@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Qi Zheng <zhengqi.arch@bytedance.com>
Subject: [PATCH v4 02/11] mm: userfaultfd: recheck dst_pmd entry in move_pages_pte()
Date: Wed, 4 Dec 2024 19:09:42 +0800
Message-Id: <8108c262757fc492626f3a2ffc44b775f2710e16.1733305182.git.zhengqi.arch@bytedance.com>
MIME-Version: 1.0
In move_pages_pte(), since dst_pte needs to be none, the subsequent
pte_same() check cannot prevent the dst_pte page from being freed
concurrently, so we also need to obtain dst_pmdval and recheck pmd_same().
Otherwise, once we support empty PTE page reclamation for anonymous pages,
it may result in moving the src_pte page into the dst_pte page that is
about to be freed by RCU.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 mm/userfaultfd.c | 51 +++++++++++++++++++++++++++++++-----------------
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 60a0be33766ff..8e16dc290ddf1 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1020,6 +1020,14 @@ void double_pt_unlock(spinlock_t *ptl1,
 	__release(ptl2);
 }
 
+static inline bool is_pte_pages_stable(pte_t *dst_pte, pte_t *src_pte,
+				       pte_t orig_dst_pte, pte_t orig_src_pte,
+				       pmd_t *dst_pmd, pmd_t dst_pmdval)
+{
+	return pte_same(ptep_get(src_pte), orig_src_pte) &&
+	       pte_same(ptep_get(dst_pte), orig_dst_pte) &&
+	       pmd_same(dst_pmdval, pmdp_get_lockless(dst_pmd));
+}
+
 static int move_present_pte(struct mm_struct *mm,
 			    struct vm_area_struct *dst_vma,
@@ -1027,6 +1035,7 @@ static int move_present_pte(struct mm_struct *mm,
 			    unsigned long dst_addr, unsigned long src_addr,
 			    pte_t *dst_pte, pte_t *src_pte,
 			    pte_t orig_dst_pte, pte_t orig_src_pte,
+			    pmd_t *dst_pmd, pmd_t dst_pmdval,
 			    spinlock_t *dst_ptl, spinlock_t *src_ptl,
 			    struct folio *src_folio)
 {
@@ -1034,8 +1043,8 @@ static int move_present_pte(struct mm_struct *mm,
 
 	double_pt_lock(dst_ptl, src_ptl);
 
-	if (!pte_same(ptep_get(src_pte), orig_src_pte) ||
-	    !pte_same(ptep_get(dst_pte), orig_dst_pte)) {
+	if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte,
+				 dst_pmd, dst_pmdval)) {
 		err = -EAGAIN;
 		goto out;
 	}
@@ -1071,6 +1080,7 @@ static int move_swap_pte(struct mm_struct *mm,
 			 unsigned long dst_addr, unsigned long src_addr,
 			 pte_t *dst_pte, pte_t *src_pte,
 			 pte_t orig_dst_pte, pte_t orig_src_pte,
+			 pmd_t *dst_pmd, pmd_t dst_pmdval,
 			 spinlock_t *dst_ptl, spinlock_t *src_ptl)
 {
 	if (!pte_swp_exclusive(orig_src_pte))
@@ -1078,8 +1088,8 @@ static int move_swap_pte(struct mm_struct *mm,
 
 	double_pt_lock(dst_ptl, src_ptl);
 
-	if (!pte_same(ptep_get(src_pte), orig_src_pte) ||
-	    !pte_same(ptep_get(dst_pte), orig_dst_pte)) {
+	if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte,
+				 dst_pmd, dst_pmdval)) {
 		double_pt_unlock(dst_ptl, src_ptl);
 		return -EAGAIN;
 	}
@@ -1097,13 +1107,14 @@ static int move_zeropage_pte(struct mm_struct *mm,
 			     unsigned long dst_addr, unsigned long src_addr,
 			     pte_t *dst_pte, pte_t *src_pte,
 			     pte_t orig_dst_pte, pte_t orig_src_pte,
+			     pmd_t *dst_pmd, pmd_t dst_pmdval,
 			     spinlock_t *dst_ptl, spinlock_t *src_ptl)
 {
 	pte_t zero_pte;
 
 	double_pt_lock(dst_ptl, src_ptl);
-	if (!pte_same(ptep_get(src_pte), orig_src_pte) ||
-	    !pte_same(ptep_get(dst_pte), orig_dst_pte)) {
+	if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte,
+				 dst_pmd, dst_pmdval)) {
 		double_pt_unlock(dst_ptl, src_ptl);
 		return -EAGAIN;
 	}
@@ -1136,6 +1147,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 	pte_t *src_pte = NULL;
 	pte_t *dst_pte = NULL;
 	pmd_t dummy_pmdval;
+	pmd_t dst_pmdval;
 	struct folio *src_folio = NULL;
 	struct anon_vma *src_anon_vma = NULL;
 	struct mmu_notifier_range range;
@@ -1148,11 +1160,11 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 retry:
 	/*
 	 * Use the maywrite version to indicate that dst_pte will be modified,
-	 * but since we will use pte_same() to detect the change of the pte
-	 * entry, there is no need to get pmdval, so just pass a dummy variable
-	 * to it.
+	 * since dst_pte needs to be none, the subsequent pte_same() check
+	 * cannot prevent the dst_pte page from being freed concurrently, so we
+	 * also need to obtain dst_pmdval and recheck pmd_same() later.
 	 */
-	dst_pte = pte_offset_map_rw_nolock(mm, dst_pmd, dst_addr, &dummy_pmdval,
+	dst_pte = pte_offset_map_rw_nolock(mm, dst_pmd, dst_addr, &dst_pmdval,
 					   &dst_ptl);
 	/* Retry if a huge pmd materialized from under us */
@@ -1161,7 +1173,11 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 		goto out;
 	}
 
-	/* same as dst_pte */
+	/*
+	 * Unlike dst_pte, the subsequent pte_same() check can ensure the
+	 * stability of the src_pte page, so there is no need to get pmdval,
+	 * just pass a dummy variable to it.
+	 */
 	src_pte = pte_offset_map_rw_nolock(mm, src_pmd, src_addr, &dummy_pmdval,
 					   &src_ptl);
@@ -1213,7 +1229,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 		err = move_zeropage_pte(mm, dst_vma, src_vma,
 					dst_addr, src_addr, dst_pte, src_pte,
 					orig_dst_pte, orig_src_pte,
-					dst_ptl, src_ptl);
+					dst_pmd, dst_pmdval, dst_ptl, src_ptl);
 		goto out;
 	}
@@ -1303,8 +1319,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 		err = move_present_pte(mm, dst_vma, src_vma,
 				       dst_addr, src_addr, dst_pte, src_pte,
-				       orig_dst_pte, orig_src_pte,
-				       dst_ptl, src_ptl, src_folio);
+				       orig_dst_pte, orig_src_pte, dst_pmd,
+				       dst_pmdval, dst_ptl, src_ptl, src_folio);
 	} else {
 		entry = pte_to_swp_entry(orig_src_pte);
 		if (non_swap_entry(entry)) {
@@ -1319,10 +1335,9 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 			goto out;
 		}
 
-		err = move_swap_pte(mm, dst_addr, src_addr,
-				    dst_pte, src_pte,
-				    orig_dst_pte, orig_src_pte,
-				    dst_ptl, src_ptl);
+		err = move_swap_pte(mm, dst_addr, src_addr, dst_pte, src_pte,
+				    orig_dst_pte, orig_src_pte, dst_pmd,
+				    dst_pmdval, dst_ptl, src_ptl);
 	}
 
 out: