From patchwork Thu Jul 25 18:39:54 2024
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13742090
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Andrew Morton, Muchun Song, Peter Xu,
    Oscar Salvador
Subject: [PATCH v1 1/2] mm: let pte_lockptr() consume a pte_t pointer
Date: Thu, 25 Jul 2024 20:39:54 +0200
Message-ID: <20240725183955.2268884-2-david@redhat.com>
In-Reply-To: <20240725183955.2268884-1-david@redhat.com>
References: <20240725183955.2268884-1-david@redhat.com>
pte_lockptr() is the only *_lockptr() function that doesn't consume
what would be expected: it consumes a pmd_t pointer instead of a
pte_t pointer.

Let's change that. The two callers in pgtable-generic.c are easily
adjusted. Adjust khugepaged.c:retract_page_tables() to simply do a
pte_offset_map_nolock() to obtain the lock, even though we won't
actually be traversing the page table.
This makes the code more similar to the other variants and avoids
other hacks to make the new pte_lockptr() version happy. pte_lockptr()
users now reside only in pgtable-generic.c.

Maybe, using pte_offset_map_nolock() is the right thing to do because
the PTE table could have been removed in the meantime? At least it
sounds more future proof if we ever have other means of page table
reclaim.

It's not quite clear if holding the PTE table lock is really required:
what if someone else obtains the lock just after we unlock it? But
we'll leave that as is for now, maybe there are good reasons.

This is a preparation for adapting hugetlb page table locking logic to
take the same locks as core-mm page table walkers would.

Signed-off-by: David Hildenbrand
Reviewed-by: Qi Zheng
---
 include/linux/mm.h   |  7 ++++---
 mm/khugepaged.c      | 21 +++++++++++++------
 mm/pgtable-generic.c |  4 ++--
 3 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2c6ccf088c7be..0472a5090b180 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2873,9 +2873,10 @@ static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
 }
 #endif /* ALLOC_SPLIT_PTLOCKS */
 
-static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
+static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pte_t *pte)
 {
-	return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
+	/* PTE page tables don't currently exceed a single page. */
+	return ptlock_ptr(virt_to_ptdesc(pte));
 }
 
 static inline bool ptlock_init(struct ptdesc *ptdesc)
@@ -2898,7 +2899,7 @@ static inline bool ptlock_init(struct ptdesc *ptdesc)
 /*
  * We use mm->page_table_lock to guard all pagetable pages of the mm.
  */
-static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
+static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pte_t *pte)
 {
 	return &mm->page_table_lock;
 }
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index cdd1d8655a76b..f3b3db1046155 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1697,12 +1697,13 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 	i_mmap_lock_read(mapping);
 	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
 		struct mmu_notifier_range range;
+		bool retracted = false;
 		struct mm_struct *mm;
 		unsigned long addr;
 		pmd_t *pmd, pgt_pmd;
 		spinlock_t *pml;
 		spinlock_t *ptl;
-		bool skipped_uffd = false;
+		pte_t *pte;
 
 		/*
 		 * Check vma->anon_vma to exclude MAP_PRIVATE mappings that
@@ -1739,9 +1740,17 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 		mmu_notifier_invalidate_range_start(&range);
 
 		pml = pmd_lock(mm, pmd);
-		ptl = pte_lockptr(mm, pmd);
+
+		/*
+		 * No need to check the PTE table content, but we'll grab the
+		 * PTE table lock while we zap it.
+		 */
+		pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
+		if (!pte)
+			goto unlock_pmd;
 		if (ptl != pml)
 			spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
+		pte_unmap(pte);
 
 		/*
 		 * Huge page lock is still held, so normally the page table
@@ -1752,20 +1761,20 @@
 		 * repeating the anon_vma check protects from one category,
 		 * and repeating the userfaultfd_wp() check from another.
 		 */
-		if (unlikely(vma->anon_vma || userfaultfd_wp(vma))) {
-			skipped_uffd = true;
-		} else {
+		if (likely(!vma->anon_vma && !userfaultfd_wp(vma))) {
 			pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
 			pmdp_get_lockless_sync();
+			retracted = true;
 		}
 
 		if (ptl != pml)
 			spin_unlock(ptl);
+unlock_pmd:
 		spin_unlock(pml);
 		mmu_notifier_invalidate_range_end(&range);
 
-		if (!skipped_uffd) {
+		if (retracted) {
 			mm_dec_nr_ptes(mm);
 			page_table_check_pte_clear_range(mm, addr, pgt_pmd);
 			pte_free_defer(mm, pmd_pgtable(pgt_pmd));
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index a78a4adf711ac..13a7705df3f87 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -313,7 +313,7 @@ pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
 
 	pte = __pte_offset_map(pmd, addr, &pmdval);
 	if (likely(pte))
-		*ptlp = pte_lockptr(mm, &pmdval);
+		*ptlp = pte_lockptr(mm, pte);
 	return pte;
 }
 
@@ -371,7 +371,7 @@ pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
 
 	pte = __pte_offset_map(pmd, addr, &pmdval);
 	if (unlikely(!pte))
 		return pte;
-	ptl = pte_lockptr(mm, &pmdval);
+	ptl = pte_lockptr(mm, pte);
 	spin_lock(ptl);
 	if (likely(pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
 		*ptlp = ptl;

From patchwork Thu Jul 25 18:39:55 2024
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13742091
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Andrew Morton, Muchun Song, Peter Xu,
    Oscar Salvador, stable@vger.kernel.org
Subject: [PATCH v1 2/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking
Date: Thu, 25 Jul 2024 20:39:55 +0200
Message-ID: <20240725183955.2268884-3-david@redhat.com>
In-Reply-To: <20240725183955.2268884-1-david@redhat.com>
References: <20240725183955.2268884-1-david@redhat.com>
We recently made GUP's common page table walking code also walk hugetlb
VMAs without most hugetlb special-casing, preparing for a future with
less hugetlb-specific page table walking code in the codebase. Turns
out that we missed one page table locking detail: page table locking
for hugetlb folios that are not mapped using a single PMD/PUD.

Assume we have a hugetlb folio that spans multiple PTEs (e.g., 64 KiB
hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks
the page tables, will perform a pte_offset_map_lock() to grab the PTE
table lock. However, hugetlb code that concurrently modifies these page
tables would actually grab the mm->page_table_lock: with
USE_SPLIT_PTE_PTLOCKS, the locks would differ.

Something similar can happen right now with hugetlb folios that span
multiple PMDs when USE_SPLIT_PMD_PTLOCKS.

Let's make huge_pte_lockptr() effectively use the same PT locks as any
core-mm page table walker would.

There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb
folio being mapped using two PTE page tables. While hugetlb wants to
take the PMD table lock, core-mm would grab the PTE table lock of one
of the two PTE page tables. In such corner cases, we have to make sure
that both locks match, which is (fortunately!) currently guaranteed for
8xx as it does not support SMP.
Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
Cc: stable@vger.kernel.org
Signed-off-by: David Hildenbrand
Reviewed-by: Baolin Wang
Acked-by: Muchun Song
Reviewed-by: Peter Xu
Reviewed-by: Oscar Salvador
---
 include/linux/hugetlb.h | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c9bf68c239a01..da800e56fe590 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -944,10 +944,29 @@ static inline bool htlb_allow_alloc_fallback(int reason)
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
 					   struct mm_struct *mm, pte_t *pte)
 {
-	if (huge_page_size(h) == PMD_SIZE)
+	VM_WARN_ON(huge_page_size(h) == PAGE_SIZE);
+	VM_WARN_ON(huge_page_size(h) >= P4D_SIZE);
+
+	/*
+	 * hugetlb must use the exact same PT locks as core-mm page table
+	 * walkers would. When modifying a PTE table, hugetlb must take the
+	 * PTE PT lock, when modifying a PMD table, hugetlb must take the PMD
+	 * PT lock etc.
+	 *
+	 * The expectation is that any hugetlb folio smaller than a PMD is
+	 * always mapped into a single PTE table and that any hugetlb folio
+	 * smaller than a PUD (but at least as big as a PMD) is always mapped
+	 * into a single PMD table.
+	 *
+	 * If that does not hold for an architecture, then that architecture
+	 * must disable split PT locks such that all *_lockptr() functions
+	 * will give us the same result: the per-MM PT lock.
+	 */
+	if (huge_page_size(h) < PMD_SIZE)
+		return pte_lockptr(mm, pte);
+	else if (huge_page_size(h) < PUD_SIZE)
 		return pmd_lockptr(mm, (pmd_t *) pte);
-	VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
-	return &mm->page_table_lock;
+	return pud_lockptr(mm, (pud_t *) pte);
 }
 
 #ifndef hugepages_supported