From patchwork Thu Aug 22 07:13:22 2024
X-Patchwork-Submitter: Qi Zheng
X-Patchwork-Id: 13772872
From: Qi Zheng
To: david@redhat.com, hughd@google.com, willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org, rppt@kernel.org, vishal.moola@gmail.com, peterx@redhat.com, ryan.roberts@arm.com, christophe.leroy2@cs-soprasteria.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, Qi Zheng
Subject: [PATCH v2 07/14] mm: khugepaged: collapse_pte_mapped_thp() use pte_offset_map_rw_nolock()
Date: Thu, 22 Aug 2024 15:13:22 +0800

In collapse_pte_mapped_thp(), we may modify the pte and pmd entry after
acquiring the ptl, so convert it to use pte_offset_map_rw_nolock(). At
this point the write lock of mmap_lock is not held, and no pte_same()
check is performed after the ptl is taken, so we should record pgt_pmd
and perform a pmd_same() check after taking the ptl.

For the case where the ptl is released first and the pml is acquired
afterwards, the PTE page may have been freed in the meantime, so we must
perform the pmd_same() check before reacquiring the ptl.
Signed-off-by: Qi Zheng
---
 mm/khugepaged.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 53bfa7f4b7f82..15d3f7f3c65f2 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1604,7 +1604,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	if (userfaultfd_armed(vma) && !(vma->vm_flags & VM_SHARED))
 		pml = pmd_lock(mm, pmd);

-	start_pte = pte_offset_map_nolock(mm, pmd, haddr, &ptl);
+	start_pte = pte_offset_map_rw_nolock(mm, pmd, haddr, &pgt_pmd, &ptl);
 	if (!start_pte)		/* mmap_lock + page lock should prevent this */
 		goto abort;
 	if (!pml)
@@ -1612,6 +1612,9 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	else if (ptl != pml)
 		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);

+	if (unlikely(!pmd_same(pgt_pmd, pmdp_get_lockless(pmd))))
+		goto abort;
+
 	/* step 2: clear page table and adjust rmap */
 	for (i = 0, addr = haddr, pte = start_pte;
 	     i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE, pte++) {
@@ -1657,6 +1660,16 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	/* step 4: remove empty page table */
 	if (!pml) {
 		pml = pmd_lock(mm, pmd);
+		/*
+		 * We called pte_unmap() and released the ptl before acquiring
+		 * the pml, which means we left the RCU critical section, so
+		 * the PTE page may have been freed; we must do the pmd_same()
+		 * check before reacquiring the ptl.
+		 */
+		if (unlikely(!pmd_same(pgt_pmd, pmdp_get_lockless(pmd)))) {
+			spin_unlock(pml);
+			goto pmd_change;
+		}
 		if (ptl != pml)
 			spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
 	}
@@ -1688,6 +1701,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	pte_unmap_unlock(start_pte, ptl);
 	if (pml && pml != ptl)
 		spin_unlock(pml);
+pmd_change:
 	if (notified)
 		mmu_notifier_invalidate_range_end(&range);
 drop_folio: