From: Qi Zheng
Date: Thu, 14 Nov 2024 14:59:53 +0800
Subject: [PATCH v3 2/9] mm: userfaultfd: recheck dst_pmd entry in move_pages_pte()

In move_pages_pte(), dst_pte is required to be none, so the subsequent pte_same() check cannot prevent the PTE page containing dst_pte from being freed concurrently: a none entry still reads as none after its PTE page has been detached from dst_pmd. We therefore also need to obtain dst_pmdval and recheck pmd_same().
Otherwise, once we support empty PTE page reclamation for anonymous pages, it may result in moving the src_pte entry into a dst_pte page that is about to be freed by RCU.

Signed-off-by: Qi Zheng
---
 mm/userfaultfd.c | 51 +++++++++++++++++++++++++++++++-----------------
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 60a0be33766ff..8e16dc290ddf1 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1020,6 +1020,14 @@ void double_pt_unlock(spinlock_t *ptl1,
 		__release(ptl2);
 }
 
+static inline bool is_pte_pages_stable(pte_t *dst_pte, pte_t *src_pte,
+				       pte_t orig_dst_pte, pte_t orig_src_pte,
+				       pmd_t *dst_pmd, pmd_t dst_pmdval)
+{
+	return pte_same(ptep_get(src_pte), orig_src_pte) &&
+	       pte_same(ptep_get(dst_pte), orig_dst_pte) &&
+	       pmd_same(dst_pmdval, pmdp_get_lockless(dst_pmd));
+}
 
 static int move_present_pte(struct mm_struct *mm,
 			    struct vm_area_struct *dst_vma,
@@ -1027,6 +1035,7 @@ static int move_present_pte(struct mm_struct *mm,
 			    unsigned long dst_addr, unsigned long src_addr,
 			    pte_t *dst_pte, pte_t *src_pte,
 			    pte_t orig_dst_pte, pte_t orig_src_pte,
+			    pmd_t *dst_pmd, pmd_t dst_pmdval,
 			    spinlock_t *dst_ptl, spinlock_t *src_ptl,
 			    struct folio *src_folio)
 {
@@ -1034,8 +1043,8 @@ static int move_present_pte(struct mm_struct *mm,
 
 	double_pt_lock(dst_ptl, src_ptl);
 
-	if (!pte_same(ptep_get(src_pte), orig_src_pte) ||
-	    !pte_same(ptep_get(dst_pte), orig_dst_pte)) {
+	if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte,
+				 dst_pmd, dst_pmdval)) {
 		err = -EAGAIN;
 		goto out;
 	}
@@ -1071,6 +1080,7 @@ static int move_swap_pte(struct mm_struct *mm,
 			 unsigned long dst_addr, unsigned long src_addr,
 			 pte_t *dst_pte, pte_t *src_pte,
 			 pte_t orig_dst_pte, pte_t orig_src_pte,
+			 pmd_t *dst_pmd, pmd_t dst_pmdval,
 			 spinlock_t *dst_ptl, spinlock_t *src_ptl)
 {
 	if (!pte_swp_exclusive(orig_src_pte))
@@ -1078,8 +1088,8 @@ static int move_swap_pte(struct mm_struct *mm,
 
 	double_pt_lock(dst_ptl, src_ptl);
 
-	if (!pte_same(ptep_get(src_pte), orig_src_pte) ||
-	    !pte_same(ptep_get(dst_pte), orig_dst_pte)) {
+	if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte,
+				 dst_pmd, dst_pmdval)) {
 		double_pt_unlock(dst_ptl, src_ptl);
 		return -EAGAIN;
 	}
@@ -1097,13 +1107,14 @@ static int move_zeropage_pte(struct mm_struct *mm,
 			     unsigned long dst_addr, unsigned long src_addr,
 			     pte_t *dst_pte, pte_t *src_pte,
 			     pte_t orig_dst_pte, pte_t orig_src_pte,
+			     pmd_t *dst_pmd, pmd_t dst_pmdval,
 			     spinlock_t *dst_ptl, spinlock_t *src_ptl)
 {
 	pte_t zero_pte;
 
 	double_pt_lock(dst_ptl, src_ptl);
-	if (!pte_same(ptep_get(src_pte), orig_src_pte) ||
-	    !pte_same(ptep_get(dst_pte), orig_dst_pte)) {
+	if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte,
+				 dst_pmd, dst_pmdval)) {
 		double_pt_unlock(dst_ptl, src_ptl);
 		return -EAGAIN;
 	}
@@ -1136,6 +1147,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 	pte_t *src_pte = NULL;
 	pte_t *dst_pte = NULL;
 	pmd_t dummy_pmdval;
+	pmd_t dst_pmdval;
 	struct folio *src_folio = NULL;
 	struct anon_vma *src_anon_vma = NULL;
 	struct mmu_notifier_range range;
@@ -1148,11 +1160,11 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 retry:
 	/*
 	 * Use the maywrite version to indicate that dst_pte will be modified,
-	 * but since we will use pte_same() to detect the change of the pte
-	 * entry, there is no need to get pmdval, so just pass a dummy variable
-	 * to it.
+	 * and since dst_pte needs to be none, the subsequent pte_same() check
+	 * cannot prevent the dst_pte page from being freed concurrently, so we
+	 * also need to obtain dst_pmdval and recheck pmd_same() later.
 	 */
-	dst_pte = pte_offset_map_rw_nolock(mm, dst_pmd, dst_addr, &dummy_pmdval,
+	dst_pte = pte_offset_map_rw_nolock(mm, dst_pmd, dst_addr, &dst_pmdval,
 					   &dst_ptl);
 
 	/* Retry if a huge pmd materialized from under us */
@@ -1161,7 +1173,11 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 		goto out;
 	}
 
-	/* same as dst_pte */
+	/*
+	 * Unlike dst_pte, the subsequent pte_same() check can ensure the
+	 * stability of the src_pte page, so there is no need to get pmdval,
+	 * just pass a dummy variable to it.
+	 */
 	src_pte = pte_offset_map_rw_nolock(mm, src_pmd, src_addr, &dummy_pmdval,
 					   &src_ptl);
 
@@ -1213,7 +1229,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 		err = move_zeropage_pte(mm, dst_vma, src_vma,
 					dst_addr, src_addr, dst_pte, src_pte,
 					orig_dst_pte, orig_src_pte,
-					dst_ptl, src_ptl);
+					dst_pmd, dst_pmdval, dst_ptl, src_ptl);
 		goto out;
 	}
 
@@ -1303,8 +1319,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 
 		err = move_present_pte(mm, dst_vma, src_vma,
 				       dst_addr, src_addr, dst_pte, src_pte,
-				       orig_dst_pte, orig_src_pte,
-				       dst_ptl, src_ptl, src_folio);
+				       orig_dst_pte, orig_src_pte, dst_pmd,
+				       dst_pmdval, dst_ptl, src_ptl, src_folio);
 	} else {
 		entry = pte_to_swp_entry(orig_src_pte);
 		if (non_swap_entry(entry)) {
@@ -1319,10 +1335,9 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 			goto out;
 		}
 
-		err = move_swap_pte(mm, dst_addr, src_addr,
-				    dst_pte, src_pte,
-				    orig_dst_pte, orig_src_pte,
-				    dst_ptl, src_ptl);
+		err = move_swap_pte(mm, dst_addr, src_addr, dst_pte, src_pte,
+				    orig_dst_pte, orig_src_pte, dst_pmd,
+				    dst_pmdval, dst_ptl, src_ptl);
 	}
 
 out:
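
The race described in the commit message can be illustrated with a small userspace model. This is a hedged sketch, not kernel code: `struct table` stands in for a PTE page, `struct slot` for a PMD entry, and `reclaim_if_empty()`, `move_one()` and `run_scenario()` are hypothetical helpers. It replays one interleaving deterministically (lockless snapshot, reclaim of the empty destination table, then revalidation) and leaves out the PTE locks entirely. It assumes, mirroring the RCU-deferred free, that a detached table is not reused yet, so its none entries still compare equal to the snapshot.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_ENTRIES 8

/* 0 models a "none" PTE; any other value models a present or swap entry. */
typedef uint64_t entry_t;

/* Models a PTE page. */
struct table {
	entry_t e[NR_ENTRIES];
};

/* Models a PMD entry that points at a PTE page (NULL when cleared). */
struct slot {
	struct table *table;
};

/*
 * Hypothetical reclaimer: detach the table from its slot if every entry is
 * none. The memory itself stays valid, standing in for the RCU grace period,
 * so a stale pointer into it still reads none.
 */
static bool reclaim_if_empty(struct slot *s)
{
	if (!s->table)
		return false;
	for (size_t i = 0; i < NR_ENTRIES; i++)
		if (s->table->e[i] != 0)
			return false;
	s->table = NULL;
	return true;
}

/*
 * Revalidate the snapshots, then move. With recheck_slot set this mirrors
 * is_pte_pages_stable(); without it, only the two entries are compared.
 */
static int move_one(entry_t *dst, entry_t *src,
		    entry_t orig_dst, entry_t orig_src,
		    const struct slot *dst_slot, const struct table *dst_snap,
		    bool recheck_slot)
{
	bool stable = *src == orig_src && *dst == orig_dst;

	if (recheck_slot)
		stable = stable && dst_slot->table == dst_snap;
	if (!stable)
		return -EAGAIN;
	*dst = *src;
	*src = 0;
	return 0;
}

/*
 * Replay: snapshot, reclaim of the empty destination table, then move.
 * Reports the source entry afterwards and whether the moved entry can still
 * be reached through dst_slot.
 */
static int run_scenario(bool recheck_slot, entry_t *src_after, bool *reachable)
{
	struct table src_table = { .e = { [0] = 0x1234 } };
	struct table dst_table = { .e = { 0 } };
	struct slot dst_slot = { .table = &dst_table };

	/* Lockless snapshots, analogous to those taken in move_pages_pte(). */
	struct table *dst_snap = dst_slot.table;
	entry_t *dst = &dst_snap->e[0];
	entry_t *src = &src_table.e[0];
	entry_t orig_dst = *dst;
	entry_t orig_src = *src;

	/* The destination table is empty, so it gets detached here. */
	bool detached = reclaim_if_empty(&dst_slot);
	assert(detached);
	(void)detached;

	int err = move_one(dst, src, orig_dst, orig_src,
			   &dst_slot, dst_snap, recheck_slot);

	*src_after = src_table.e[0];
	*reachable = dst_slot.table != NULL && dst_slot.table->e[0] == 0x1234;
	return err;
}
```

With `recheck_slot` false, both entry comparisons still pass, so the entry is cleared from the source and written into a table that `dst_slot` no longer points to, and the mapping is effectively lost. With `recheck_slot` true, the slot comparison (the analogue of the pmd_same() check in is_pte_pages_stable()) fails and `move_one()` returns `-EAGAIN` with the source untouched, reporting the change instead of acting on a stale snapshot.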