From patchwork Wed Aug 21 08:18:50 2024
X-Patchwork-Submitter: Qi Zheng
X-Patchwork-Id: 13771010
From: Qi Zheng
To: david@redhat.com, hughd@google.com, willy@infradead.org, muchun.song@linux.dev, vbabka@kernel.org, akpm@linux-foundation.org, rppt@kernel.org, vishal.moola@gmail.com, peterx@redhat.com, ryan.roberts@arm.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, Qi Zheng
Subject: [PATCH 07/14] mm: khugepaged: collapse_pte_mapped_thp() use pte_offset_map_maywrite_nolock()
Date: Wed, 21 Aug 2024 16:18:50 +0800

In collapse_pte_mapped_thp(), we may modify the pte and pmd entry after
acquiring the ptl, so convert it to using
pte_offset_map_maywrite_nolock(). At this point, the write lock of
mmap_lock is not held, and no pte_same() check is performed after the
ptl is taken, so we should get pgt_pmd and do the pmd_same() check after
taking the ptl. For the case where the ptl is released first and then
the pml is acquired, the PTE page may have been freed in the meantime,
so we must do the pmd_same() check before reacquiring the ptl.
Signed-off-by: Qi Zheng
---
 mm/khugepaged.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 26c083c59f03f..8fcad0b368a08 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1602,7 +1602,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	if (userfaultfd_armed(vma) && !(vma->vm_flags & VM_SHARED))
 		pml = pmd_lock(mm, pmd);
 
-	start_pte = pte_offset_map_nolock(mm, pmd, haddr, &ptl);
+	start_pte = pte_offset_map_maywrite_nolock(mm, pmd, haddr, &pgt_pmd, &ptl);
 	if (!start_pte)		/* mmap_lock + page lock should prevent this */
 		goto abort;
 	if (!pml)
@@ -1610,6 +1610,9 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	else if (ptl != pml)
 		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
 
+	if (unlikely(!pmd_same(pgt_pmd, pmdp_get_lockless(pmd))))
+		goto abort;
+
 	/* step 2: clear page table and adjust rmap */
 	for (i = 0, addr = haddr, pte = start_pte; i < HPAGE_PMD_NR;
 	     i++, addr += PAGE_SIZE, pte++) {
@@ -1655,6 +1658,16 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	/* step 4: remove empty page table */
 	if (!pml) {
 		pml = pmd_lock(mm, pmd);
+		/*
+		 * We called pte_unmap() and release the ptl before acquiring
+		 * the pml, which means we left the RCU critical section, so the
+		 * PTE page may have been freed, so we must do pmd_same() check
+		 * before reacquiring the ptl.
+		 */
+		if (unlikely(!pmd_same(pgt_pmd, pmdp_get_lockless(pmd)))) {
+			spin_unlock(pml);
+			goto pmd_change;
+		}
 		if (ptl != pml)
 			spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
 	}
@@ -1686,6 +1699,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	pte_unmap_unlock(start_pte, ptl);
 	if (pml && pml != ptl)
 		spin_unlock(pml);
+pmd_change:
 	if (notified)
 		mmu_notifier_invalidate_range_end(&range);
 drop_folio: