From patchwork Wed Jul 17 22:02:14 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13735845
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Vlastimil Babka, peterx@redhat.com, David Hildenbrand, Oscar Salvador,
    linux-s390@vger.kernel.org, Andrew Morton, Matthew Wilcox, Dan Williams,
    Michal Hocko, linux-riscv@lists.infradead.org, sparclinux@vger.kernel.org,
    Alex Williamson, Jason Gunthorpe, x86@kernel.org, Alistair Popple,
    linuxppc-dev@lists.ozlabs.org, linux-arm-kernel@lists.infradead.org,
    Ryan Roberts, Hugh Dickins, Axel Rasmussen
Subject: [PATCH RFC 1/6] mm/treewide: Remove pgd_devmap()
Date: Wed, 17 Jul 2024 18:02:14 -0400
Message-ID: <20240717220219.3743374-2-peterx@redhat.com>
In-Reply-To: <20240717220219.3743374-1-peterx@redhat.com>
References: <20240717220219.3743374-1-peterx@redhat.com>

It's always 0 for all archs, and there's no sign to even support p4d entry
in the near future.  Remove it until it's needed for real.
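For background on what is being deleted: the devmap helpers form a per-level family, where a level an architecture cannot use is a stub that constant-folds to 0, letting callers compile unconditionally while the compiler discards the dead branches. A minimal userspace sketch of that stub pattern, with illustrative types and bit values rather than the kernel's own:

```c
/* Illustrative stand-ins for kernel page-table entry types. */
typedef struct { unsigned long val; } pmd_t;
typedef struct { unsigned long val; } pgd_t;

#define ENTRY_DEVMAP (1UL << 4)   /* hypothetical devmap bit */

/* Implemented level: the bit can really be set and tested. */
static inline int pmd_devmap(pmd_t pmd)
{
	return (pmd.val & ENTRY_DEVMAP) != 0;
}

/*
 * Unimplemented level: always 0.  Any "if (pgd_devmap(...))" branch is
 * folded away at compile time -- which is also why a stub with no real
 * caller left, like the removed pgd_devmap(), is pure dead weight.
 */
static inline int pgd_devmap(pgd_t pgd)
{
	(void)pgd;
	return 0;
}
```

The series removes the pgd-level stub because, unlike the pmd/pud levels, no architecture ever implements it.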
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/arm64/include/asm/pgtable.h             | 5 -----
 arch/powerpc/include/asm/book3s/64/pgtable.h | 5 -----
 arch/x86/include/asm/pgtable.h               | 5 -----
 include/linux/pgtable.h                      | 4 ----
 mm/gup.c                                     | 2 --
 5 files changed, 21 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index f8efbc128446..5d5d1b18b837 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1119,11 +1119,6 @@ static inline int pud_devmap(pud_t pud)
 {
        return 0;
 }
-
-static inline int pgd_devmap(pgd_t pgd)
-{
-       return 0;
-}
 #endif
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 5da92ba68a45..051b1b6d729c 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1431,11 +1431,6 @@ static inline int pud_devmap(pud_t pud)
 {
        return pte_devmap(pud_pte(pud));
 }
-
-static inline int pgd_devmap(pgd_t pgd)
-{
-       return 0;
-}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 701593c53f3b..0d234f48ceeb 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -311,11 +311,6 @@ static inline int pud_devmap(pud_t pud)
        return 0;
 }
 #endif
-
-static inline int pgd_devmap(pgd_t pgd)
-{
-       return 0;
-}
 #endif
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 2289e9f7aa1b..0a904300ac90 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1626,10 +1626,6 @@ static inline int pud_devmap(pud_t pud)
 {
        return 0;
 }
-static inline int pgd_devmap(pgd_t pgd)
-{
-       return 0;
-}
 #endif
 
 #if !defined(CONFIG_TRANSPARENT_HUGEPAGE) || \
diff --git a/mm/gup.c b/mm/gup.c
index 54d0dc3831fb..b023bcd38235 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3149,8 +3149,6 @@ static int gup_fast_pgd_leaf(pgd_t orig, pgd_t *pgdp, unsigned long addr,
        if (!pgd_access_permitted(orig, flags & FOLL_WRITE))
                return 0;
 
-       BUILD_BUG_ON(pgd_devmap(orig));
-
        page = pgd_page(orig);
        refs = record_subpages(page, PGDIR_SIZE, addr, end, pages + *nr);

From patchwork Wed Jul 17 22:02:15 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13735846
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Vlastimil Babka, peterx@redhat.com, David Hildenbrand, Oscar Salvador,
    linux-s390@vger.kernel.org, Andrew Morton, Matthew Wilcox, Dan Williams,
    Michal Hocko, linux-riscv@lists.infradead.org, sparclinux@vger.kernel.org,
    Alex Williamson, Jason Gunthorpe, x86@kernel.org, Alistair Popple,
    linuxppc-dev@lists.ozlabs.org, linux-arm-kernel@lists.infradead.org,
    Ryan Roberts, Hugh Dickins, Axel Rasmussen
Subject: [PATCH RFC 2/6] mm: PGTABLE_HAS_P[MU]D_LEAVES config options
Date: Wed, 17 Jul 2024 18:02:15 -0400
Message-ID: <20240717220219.3743374-3-peterx@redhat.com>
In-Reply-To: <20240717220219.3743374-1-peterx@redhat.com>
References: <20240717220219.3743374-1-peterx@redhat.com>

Introduce two more sub-options for PGTABLE_HAS_HUGE_LEAVES:

  - PGTABLE_HAS_PMD_LEAVES: set when there can be PMD mappings
  - PGTABLE_HAS_PUD_LEAVES: set when there can be PUD mappings

It will help to identify whether the current build may only want PMD
helpers but not PUD ones, as these sub-options will also check against
the arch support over HAVE_ARCH_TRANSPARENT_HUGEPAGE[_PUD].
Note that having them depend on HAVE_ARCH_TRANSPARENT_HUGEPAGE[_PUD] is
still an intermediate step.  The best way would be an option that directly
says "arch XXX supports PMD/PUD mappings", but let's leave that for later,
as that's the easy part.  So far, we use these options to stably detect
per-arch huge mapping support.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/huge_mm.h | 10 +++++++---
 mm/Kconfig              |  6 ++++++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 711632df7edf..37482c8445d1 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -96,14 +96,18 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
 #define thp_vma_allowable_order(vma, vm_flags, tva_flags, order) \
        (!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))
 
-#ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
-#define HPAGE_PMD_SHIFT PMD_SHIFT
+#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES
 #define HPAGE_PUD_SHIFT PUD_SHIFT
 #else
-#define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
 #define HPAGE_PUD_SHIFT ({ BUILD_BUG(); 0; })
 #endif
 
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
+#define HPAGE_PMD_SHIFT PMD_SHIFT
+#else
+#define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
+#endif
+
 #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
 #define HPAGE_PMD_NR (1<<HPAGE_PMD_ORDER)
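The pattern the huge_mm.h hunk relies on can be sketched in plain userspace C: when the sub-option is enabled, the HPAGE shift resolves to the real page-table shift; when disabled, any use of it fails the build, loosely mimicking the kernel's `({ BUILD_BUG(); 0; })` poisoning. The `CFG_*` macros and shift values below are illustrative stand-ins, not the real Kconfig symbols:

```c
/* Pretend Kconfig output for this build: PMD leaves on, PUD leaves off. */
#define CFG_HAS_PMD_LEAVES 1
/* #define CFG_HAS_PUD_LEAVES 1 */

#define PAGE_SHIFT 12
#define PMD_SHIFT  21

#ifdef CFG_HAS_PMD_LEAVES
#define HPAGE_PMD_SHIFT PMD_SHIFT
#else
/* Any use of HPAGE_PMD_SHIFT in a build without PMD leaves fails to
 * link, standing in for the kernel's BUILD_BUG() poisoning. */
extern int hpage_pmd_shift_used_without_pmd_leaves(void);
#define HPAGE_PMD_SHIFT (hpage_pmd_shift_used_without_pmd_leaves())
#endif

#define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT - PAGE_SHIFT)
#define HPAGE_PMD_NR    (1UL << HPAGE_PMD_ORDER)

/* With a 21-bit PMD shift and 4 KiB pages, a PMD leaf covers 2 MiB. */
unsigned long hpage_pmd_size(void)
{
	return HPAGE_PMD_NR << PAGE_SHIFT;
}
```

Flipping the `CFG_HAS_PMD_LEAVES` define off makes every translation unit that touches `HPAGE_PMD_SHIFT` fail, which is the point: misconfigured code cannot silently compile.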
From patchwork Wed Jul 17 22:02:16 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13735847
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Vlastimil Babka, peterx@redhat.com, David Hildenbrand, Oscar Salvador,
    linux-s390@vger.kernel.org, Andrew Morton, Matthew Wilcox, Dan Williams,
    Michal Hocko, linux-riscv@lists.infradead.org, sparclinux@vger.kernel.org,
    Alex Williamson, Jason Gunthorpe, x86@kernel.org, Alistair Popple,
    linuxppc-dev@lists.ozlabs.org, linux-arm-kernel@lists.infradead.org,
    Ryan Roberts, Hugh Dickins, Axel Rasmussen
Subject: [PATCH RFC 3/6] mm/treewide: Make pgtable-generic.c THP agnostic
Date: Wed, 17 Jul 2024 18:02:16 -0400
Message-ID: <20240717220219.3743374-4-peterx@redhat.com>
In-Reply-To: <20240717220219.3743374-1-peterx@redhat.com>
References: <20240717220219.3743374-1-peterx@redhat.com>
Make the pmd/pud helpers rely on the new PGTABLE_HAS_*_LEAVES options
rather than on THP alone, as THP is only one form of huge mapping.
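The distinction driving this patch can be shown in a toy form: gating a PMD helper on a "has PMD leaves" symbol instead of a THP symbol keeps it available in a THP-less build that can still map huge pages (e.g. huge device mappings). All `CFG_*` macros, types, and the leaf bit here are illustrative assumptions, not kernel definitions:

```c
#include <stdbool.h>

/* Pretend configuration: THP disabled, yet the arch can map PMD leaves. */
/* #define CFG_THP 1 */
#define CFG_HAS_PMD_LEAVES 1

typedef struct { unsigned long val; } pmd_t;

#define PMD_LEAF_BIT (1UL << 7)   /* hypothetical leaf bit */

#ifdef CFG_HAS_PMD_LEAVES
/*
 * Gated on the broader symbol, this helper still exists with THP off.
 * Under the old "#ifdef CFG_THP" gating it would have vanished, breaking
 * non-THP users of PMD-level mappings.
 */
static inline bool pmd_is_leaf(pmd_t pmd)
{
	return (pmd.val & PMD_LEAF_BIT) != 0;
}
#endif
```

The treewide hunks below perform exactly this kind of substitution, replacing `CONFIG_TRANSPARENT_HUGEPAGE` guards with `CONFIG_PGTABLE_HAS_PMD_LEAVES` / `CONFIG_PGTABLE_HAS_PUD_LEAVES` where the guarded code is about huge leaves in general rather than THP specifically.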
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/arm64/include/asm/pgtable.h             |  6 ++--
 arch/powerpc/include/asm/book3s/64/pgtable.h |  2 +-
 arch/powerpc/mm/book3s64/pgtable.c           |  2 +-
 arch/riscv/include/asm/pgtable.h             |  4 +--
 arch/s390/include/asm/pgtable.h              |  2 +-
 arch/s390/mm/pgtable.c                       |  4 +--
 arch/sparc/mm/tlb.c                          |  2 +-
 arch/x86/mm/pgtable.c                        | 15 ++++-----
 include/linux/mm_types.h                     |  2 +-
 include/linux/pgtable.h                      |  4 +--
 mm/memory.c                                  |  2 +-
 mm/pgtable-generic.c                         | 32 ++++++++++----------
 12 files changed, 40 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 5d5d1b18b837..b93c03256ada 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1105,7 +1105,7 @@ extern int __ptep_set_access_flags(struct vm_area_struct *vma,
                                   unsigned long address, pte_t *ptep,
                                   pte_t entry, int dirty);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 #define __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS
 static inline int pmdp_set_access_flags(struct vm_area_struct *vma,
                                        unsigned long address, pmd_t *pmdp,
@@ -1114,7 +1114,9 @@ static inline int pmdp_set_access_flags(struct vm_area_struct *vma,
        return __ptep_set_access_flags(vma, address, (pte_t *)pmdp,
                                       pmd_pte(entry), dirty);
 }
+#endif
 
+#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES
 static inline int pud_devmap(pud_t pud)
 {
        return 0;
@@ -1178,7 +1180,7 @@ static inline int __ptep_clear_flush_young(struct vm_area_struct *vma,
        return young;
 }
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 #define __HAVE_ARCH_PMDP_TEST_AND_CLEAR_YOUNG
 static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
                                            unsigned long address,
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 051b1b6d729c..84cf55e18334 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1119,7 +1119,7 @@ static inline bool pmd_access_permitted(pmd_t pmd, bool write)
        return pte_access_permitted(pmd_pte(pmd), write);
 }
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
 extern pud_t pfn_pud(unsigned long pfn, pgprot_t pgprot);
 extern pmd_t mk_pmd(struct page *page, pgprot_t pgprot);
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 5a4a75369043..d6a5457627df 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -37,7 +37,7 @@ EXPORT_SYMBOL(__pmd_frag_nr);
 unsigned long __pmd_frag_size_shift;
 EXPORT_SYMBOL(__pmd_frag_size_shift);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
 /*
  * This is called when relaxing access to a hugepage. It's also called in the page
  * fault path when we don't hit any of the major fault cases, ie, a minor
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index ebfe8faafb79..8c28f15f601b 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -752,7 +752,7 @@ static inline bool pud_user_accessible_page(pud_t pud)
 }
 #endif
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 static inline int pmd_trans_huge(pmd_t pmd)
 {
        return pmd_leaf(pmd);
@@ -802,7 +802,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 #define pmdp_collapse_flush pmdp_collapse_flush
 extern pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
                                 unsigned long address, pmd_t *pmdp);
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+#endif /* CONFIG_PGTABLE_HAS_PMD_LEAVES */
 
 /*
  * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index fb6870384b97..398bbed20dee 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1710,7 +1710,7 @@ pmd_t pmdp_xchg_direct(struct mm_struct *, unsigned long, pmd_t *, pmd_t);
 pmd_t pmdp_xchg_lazy(struct mm_struct *, unsigned long, pmd_t *, pmd_t);
 pud_t pudp_xchg_direct(struct mm_struct *, unsigned long, pud_t *, pud_t);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 #define __HAVE_ARCH_PGTABLE_DEPOSIT
 void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 2c944bafb030..c4481068734e 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -561,7 +561,7 @@ pud_t pudp_xchg_direct(struct mm_struct *mm, unsigned long addr,
 }
 EXPORT_SYMBOL(pudp_xchg_direct);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
                                pgtable_t pgtable)
 {
@@ -600,7 +600,7 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
        set_pte(ptep, __pte(_PAGE_INVALID));
        return pgtable;
 }
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+#endif /* CONFIG_PGTABLE_HAS_PMD_LEAVES */
 
 #ifdef CONFIG_PGSTE
 void ptep_set_pte_at(struct mm_struct *mm, unsigned long addr,
diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index 8648a50afe88..140813d07c9f 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -143,7 +143,7 @@ void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr,
        tlb_batch_add_one(mm, vaddr, pte_exec(orig), hugepage_shift);
 }
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 static void tlb_batch_pmd_scan(struct mm_struct *mm, unsigned long vaddr,
                               pmd_t pmd)
 {
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index fa77411bb266..7b10d4a0c0cd 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -511,7 +511,7 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
        return changed;
 }
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 int pmdp_set_access_flags(struct vm_area_struct *vma,
                          unsigned long address, pmd_t *pmdp,
                          pmd_t entry, int dirty)
@@ -532,7 +532,9 @@ int pmdp_set_access_flags(struct vm_area_struct *vma,
 
        return changed;
 }
+#endif /* PGTABLE_HAS_PMD_LEAVES */
 
+#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES
 int pudp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
                          pud_t *pudp, pud_t entry, int dirty)
 {
@@ -552,7 +554,7 @@ int pudp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
 
        return changed;
 }
-#endif
+#endif /* PGTABLE_HAS_PUD_LEAVES */
 
 int ptep_test_and_clear_young(struct vm_area_struct *vma,
                              unsigned long addr, pte_t *ptep)
@@ -566,7 +568,7 @@ int ptep_test_and_clear_young(struct vm_area_struct *vma,
        return ret;
 }
 
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG)
+#if defined(CONFIG_PGTABLE_HAS_PMD_LEAVES) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG)
 int pmdp_test_and_clear_young(struct vm_area_struct *vma,
                              unsigned long addr, pmd_t *pmdp)
 {
@@ -580,7 +582,7 @@ int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 }
 #endif
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES
 int pudp_test_and_clear_young(struct vm_area_struct *vma,
                              unsigned long addr, pud_t *pudp)
 {
@@ -613,7 +615,7 @@ int ptep_clear_flush_young(struct vm_area_struct *vma,
        return ptep_test_and_clear_young(vma, address, ptep);
 }
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
 int pmdp_clear_flush_young(struct vm_area_struct *vma,
                           unsigned long address, pmd_t *pmdp)
 {
@@ -641,8 +643,7 @@ pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
 }
 #endif
 
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
-       defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
+#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES
pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address, pud_t *pudp) { diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index ef09c4eef6d3..44ef91ce720c 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -942,7 +942,7 @@ struct mm_struct { #ifdef CONFIG_MMU_NOTIFIER struct mmu_notifier_subscriptions *notifier_subscriptions; #endif -#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS +#if defined(CONFIG_PGTABLE_HAS_PMD_LEAVES) && !USE_SPLIT_PMD_PTLOCKS pgtable_t pmd_huge_pte; /* protected by page_table_lock */ #endif #ifdef CONFIG_NUMA_BALANCING diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 0a904300ac90..5a5aaee5fa1c 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -362,7 +362,7 @@ static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, #endif #ifndef __HAVE_ARCH_PMDP_TEST_AND_CLEAR_YOUNG -#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG) +#if defined(CONFIG_PGTABLE_HAS_PMD_LEAVES) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG) static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp) @@ -383,7 +383,7 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma, BUILD_BUG(); return 0; } -#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG */ +#endif /* CONFIG_PGTABLE_HAS_PMD_LEAVES || CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG */ #endif #ifndef __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH diff --git a/mm/memory.c b/mm/memory.c index 802d0d8a40f9..126ee0903c79 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -666,7 +666,7 @@ struct folio *vm_normal_folio(struct vm_area_struct *vma, unsigned long addr, return NULL; } -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t pmd) { diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index 
a78a4adf711a..e9fc3f6774a6 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -103,7 +103,7 @@ pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address, } #endif -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES #ifndef __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS int pmdp_set_access_flags(struct vm_area_struct *vma, @@ -145,20 +145,6 @@ pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address, flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE); return pmd; } - -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address, - pud_t *pudp) -{ - pud_t pud; - - VM_BUG_ON(address & ~HPAGE_PUD_MASK); - VM_BUG_ON(!pud_trans_huge(*pudp) && !pud_devmap(*pudp)); - pud = pudp_huge_get_and_clear(vma->vm_mm, address, pudp); - flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE); - return pud; -} -#endif #endif #ifndef __HAVE_ARCH_PGTABLE_DEPOSIT @@ -252,7 +238,21 @@ void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable) call_rcu(&page->rcu_head, pte_free_now); } #endif /* pte_free_defer */ -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#endif /* CONFIG_PGTABLE_HAS_PMD_LEAVES */ + +#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES +pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address, + pud_t *pudp) +{ + pud_t pud; + + VM_BUG_ON(address & ~HPAGE_PUD_MASK); + VM_BUG_ON(!pud_trans_huge(*pudp) && !pud_devmap(*pudp)); + pud = pudp_huge_get_and_clear(vma->vm_mm, address, pudp); + flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE); + return pud; +} +#endif /* CONFIG_PGTABLE_HAS_PUD_LEAVES */ #if defined(CONFIG_GUP_GET_PXX_LOW_HIGH) && \ (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RCU)) From patchwork Wed Jul 17 22:02:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13735849 Return-Path: X-Spam-Checker-Version: 
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Vlastimil Babka, peterx@redhat.com, David Hildenbrand, Oscar Salvador, linux-s390@vger.kernel.org, Andrew Morton, Matthew Wilcox, Dan Williams, Michal Hocko, linux-riscv@lists.infradead.org, sparclinux@vger.kernel.org, Alex Williamson, Jason Gunthorpe, x86@kernel.org, Alistair Popple, linuxppc-dev@lists.ozlabs.org, linux-arm-kernel@lists.infradead.org, Ryan Roberts, Hugh Dickins, Axel Rasmussen
Subject: [PATCH RFC 4/6] mm: Move huge mapping declarations from internal.h to huge_mm.h
Date: Wed, 17 Jul 2024 18:02:17 -0400
Message-ID: <20240717220219.3743374-5-peterx@redhat.com>
In-Reply-To: <20240717220219.3743374-1-peterx@redhat.com>
References: <20240717220219.3743374-1-peterx@redhat.com>
Most of the huge-mapping helpers are declared in huge_mm.h, not internal.h. Move the remaining few from internal.h into huge_mm.h. To move pmd_needs_soft_dirty_wp() over, we'll also need to move vma_soft_dirty_enabled() into mm.h, as it'll later be needed by two headers (internal.h and huge_mm.h).
Signed-off-by: Peter Xu --- include/linux/huge_mm.h | 10 ++++++++++ include/linux/mm.h | 18 ++++++++++++++++++ mm/internal.h | 33 --------------------------------- 3 files changed, 28 insertions(+), 33 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 37482c8445d1..d8b642ad512d 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -8,6 +8,11 @@ #include /* only for vma_is_dax() */ #include +void touch_pud(struct vm_area_struct *vma, unsigned long addr, + pud_t *pud, bool write); +void touch_pmd(struct vm_area_struct *vma, unsigned long addr, + pmd_t *pmd, bool write); +pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf); int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, @@ -629,4 +634,9 @@ static inline int split_folio_to_order(struct folio *folio, int new_order) #define split_folio_to_list(f, l) split_folio_to_list_to_order(f, l, 0) #define split_folio(f) split_folio_to_order(f, 0) +static inline bool pmd_needs_soft_dirty_wp(struct vm_area_struct *vma, pmd_t pmd) +{ + return vma_soft_dirty_enabled(vma) && !pmd_soft_dirty(pmd); +} + #endif /* _LINUX_HUGE_MM_H */ diff --git a/include/linux/mm.h b/include/linux/mm.h index 5f1075d19600..fa10802d8faa 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1117,6 +1117,24 @@ static inline unsigned int folio_order(struct folio *folio) return folio->_flags_1 & 0xff; } +static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma) +{ + /* + * NOTE: we must check this before VM_SOFTDIRTY on soft-dirty + * enablements, because when without soft-dirty being compiled in, + * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY) + * will be constantly true. + */ + if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)) + return false; + + /* + * Soft-dirty is kind of special: its tracking is enabled when the + * vma flags not set. 
+ */ + return !(vma->vm_flags & VM_SOFTDIRTY); +} + #include /* diff --git a/mm/internal.h b/mm/internal.h index b4d86436565b..e49941747749 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -917,8 +917,6 @@ bool need_mlock_drain(int cpu); void mlock_drain_local(void); void mlock_drain_remote(int cpu); -extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); - /** * vma_address - Find the virtual address a page range is mapped at * @vma: The vma which maps this object. @@ -1229,14 +1227,6 @@ int migrate_device_coherent_page(struct page *page); int __must_check try_grab_folio(struct folio *folio, int refs, unsigned int flags); -/* - * mm/huge_memory.c - */ -void touch_pud(struct vm_area_struct *vma, unsigned long addr, - pud_t *pud, bool write); -void touch_pmd(struct vm_area_struct *vma, unsigned long addr, - pmd_t *pmd, bool write); - /* * mm/mmap.c */ @@ -1342,29 +1332,6 @@ static __always_inline void vma_set_range(struct vm_area_struct *vma, vma->vm_pgoff = pgoff; } -static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma) -{ - /* - * NOTE: we must check this before VM_SOFTDIRTY on soft-dirty - * enablements, because when without soft-dirty being compiled in, - * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY) - * will be constantly true. - */ - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)) - return false; - - /* - * Soft-dirty is kind of special: its tracking is enabled when the - * vma flags not set. 
- */ - return !(vma->vm_flags & VM_SOFTDIRTY); -} - -static inline bool pmd_needs_soft_dirty_wp(struct vm_area_struct *vma, pmd_t pmd) -{ - return vma_soft_dirty_enabled(vma) && !pmd_soft_dirty(pmd); -} - static inline bool pte_needs_soft_dirty_wp(struct vm_area_struct *vma, pte_t pte) { return vma_soft_dirty_enabled(vma) && !pte_soft_dirty(pte); From patchwork Wed Jul 17 22:02:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 13735851 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 73E3CC3DA5D for ; Wed, 17 Jul 2024 22:02:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C1F346B00AD; Wed, 17 Jul 2024 18:02:43 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B7F796B00BC; Wed, 17 Jul 2024 18:02:43 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7AE216B00BD; Wed, 17 Jul 2024 18:02:43 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 419CC6B00AD for ; Wed, 17 Jul 2024 18:02:43 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id AF63A1A144B for ; Wed, 17 Jul 2024 22:02:42 +0000 (UTC) X-FDA: 82350619764.04.F5FBCBD Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf25.hostedemail.com (Postfix) with ESMTP id 4C45EA002A for ; Wed, 17 Jul 2024 22:02:39 +0000 (UTC) Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b="VOlm/XVE"; spf=pass (imf25.hostedemail.com: domain of 
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Vlastimil Babka, peterx@redhat.com, David Hildenbrand, Oscar Salvador, linux-s390@vger.kernel.org, Andrew Morton, Matthew Wilcox, Dan Williams, Michal Hocko, linux-riscv@lists.infradead.org, sparclinux@vger.kernel.org, Alex Williamson, Jason Gunthorpe, x86@kernel.org, Alistair Popple, linuxppc-dev@lists.ozlabs.org, linux-arm-kernel@lists.infradead.org, Ryan Roberts, Hugh Dickins, Axel Rasmussen
Subject: [PATCH RFC 5/6] mm/huge_mapping: Create huge_mapping_pxx.c
Date: Wed, 17 Jul 2024 18:02:18 -0400
Message-ID: <20240717220219.3743374-6-peterx@redhat.com>
In-Reply-To: <20240717220219.3743374-1-peterx@redhat.com>
References: <20240717220219.3743374-1-peterx@redhat.com>
At some point we need to decouple "huge mapping" from THP, to support non-THP huge mappings in the future (hugetlb, pfnmap, etc.). This is the first step towards that.

Arguably we already started when the PGTABLE_HAS_HUGE_LEAVES option was introduced: that was the first time Linux described huge mappings in terms of leaves rather than THPs. Before then, almost any huge mapping had THP involved, like devmap. Hugetlb is special only because we duplicated the whole world there, but there is a demand to decouple that now too.

Linux used to have huge_memory.c compile only with THP enabled; I wish it had been called thp.c from the start. In reality it contains more than THP processing: any huge mapping (even one not falling into the THP category) can leverage many of these helpers, but unfortunately the file is not compiled if !THP. These helpers are normally only about pgtable operations, which may not depend on what type of huge folio (e.g. THP) sits underneath, or even on whether there is a vmemmap to back it. It's better to move them out of the THP world.

Create a new set of files, huge_mapping_p[mu]d.c. This patch starts to move quite a few essential helpers from huge_memory.c into these new files, so that they work and compile based on PGTABLE_HAS_PXX_LEAVES rather than THP. Split them into two files by nature, so that e.g. archs that only support PMD huge mappings can avoid compiling the whole -pud file, with the hope of reducing the size of the compiled and linked object.

No functional change intended, only code movement. That said, some "ifdef" machinery changes are needed to pass all kinds of compilations.
Cc: Jason Gunthorpe Cc: Matthew Wilcox Cc: Oscar Salvador Signed-off-by: Peter Xu --- include/linux/huge_mm.h | 318 +++++--- include/linux/pgtable.h | 23 +- include/trace/events/huge_mapping.h | 41 + include/trace/events/thp.h | 28 - mm/Makefile | 2 + mm/huge_mapping_pmd.c | 979 +++++++++++++++++++++++ mm/huge_mapping_pud.c | 235 ++++++ mm/huge_memory.c | 1125 +-------------------------- 8 files changed, 1472 insertions(+), 1279 deletions(-) create mode 100644 include/trace/events/huge_mapping.h create mode 100644 mm/huge_mapping_pmd.c create mode 100644 mm/huge_mapping_pud.c diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index d8b642ad512d..aea2784df8ef 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -8,43 +8,214 @@ #include /* only for vma_is_dax() */ #include +#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES +void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud); void touch_pud(struct vm_area_struct *vma, unsigned long addr, pud_t *pud, bool write); -void touch_pmd(struct vm_area_struct *vma, unsigned long addr, - pmd_t *pmd, bool write); -pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); -vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf); -int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, - pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, - struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma); -void huge_pmd_set_accessed(struct vm_fault *vmf); int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, pud_t *dst_pud, pud_t *src_pud, unsigned long addr, struct vm_area_struct *vma); +int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, pud_t *pud, + unsigned long addr); +int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, + pud_t *pudp, unsigned long addr, pgprot_t newprot, + unsigned long cp_flags); +void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, + unsigned long address); +spinlock_t 
*__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma); -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud); -#else -static inline void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud) +static inline spinlock_t * +pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) { + if (pud_trans_huge(*pud) || pud_devmap(*pud)) + return __pud_trans_huge_lock(pud, vma); + else + return NULL; } -#endif +#define split_huge_pud(__vma, __pud, __address) \ + do { \ + pud_t *____pud = (__pud); \ + if (pud_trans_huge(*____pud) || pud_devmap(*____pud)) \ + __split_huge_pud(__vma, __pud, __address); \ + } while (0) +#else /* CONFIG_PGTABLE_HAS_PUD_LEAVES */ +static inline void +huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud) +{ +} + +static inline int +change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, + pud_t *pudp, unsigned long addr, pgprot_t newprot, + unsigned long cp_flags) +{ + return 0; +} + +static inline spinlock_t * +pud_trans_huge_lock(pud_t *pud, + struct vm_area_struct *vma) +{ + return NULL; +} + +static inline void +touch_pud(struct vm_area_struct *vma, unsigned long addr, + pud_t *pud, bool write) +{ +} + +static inline int +copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, + pud_t *dst_pud, pud_t *src_pud, unsigned long addr, + struct vm_area_struct *vma) +{ + return 0; +} + +static inline int +zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, pud_t *pud, + unsigned long addr) +{ + return 0; +} + +static inline void +__split_huge_pud(struct vm_area_struct *vma, pud_t *pud, unsigned long address) +{ +} + +#define split_huge_pud(__vma, __pud, __address) do {} while (0) +#endif /* CONFIG_PGTABLE_HAS_PUD_LEAVES */ + +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES +void touch_pmd(struct vm_area_struct *vma, unsigned long addr, + pmd_t *pmd, bool write); +pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); +int 
copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, + pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, + struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma); +void huge_pmd_set_accessed(struct vm_fault *vmf); vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf); -bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, - pmd_t *pmd, unsigned long addr, unsigned long next); int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr); -int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, pud_t *pud, - unsigned long addr); bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd); int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, pgprot_t newprot, unsigned long cp_flags); +void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, bool freeze, struct folio *folio); +void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmd, bool freeze, struct folio *folio); +void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, + bool freeze, struct folio *folio); +spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma); +bool can_change_pmd_writable(struct vm_area_struct *vma, unsigned long addr, + pmd_t pmd); +void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd); + +static inline int is_swap_pmd(pmd_t pmd) +{ + return !pmd_none(pmd) && !pmd_present(pmd); +} + +/* mmap_lock must be held on entry */ +static inline spinlock_t * +pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) +{ + if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) + return __pmd_trans_huge_lock(pmd, vma); + else + return NULL; +} + +#define split_huge_pmd(__vma, __pmd, __address) \ + do { \ + pmd_t *____pmd = (__pmd); \ + if (is_swap_pmd(*____pmd) || 
pmd_is_leaf(*____pmd)) \ + __split_huge_pmd(__vma, __pmd, __address, \ + false, NULL); \ + } while (0) +#else /* CONFIG_PGTABLE_HAS_PMD_LEAVES */ +static inline spinlock_t * +pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) +{ + return NULL; +} + +static inline int is_swap_pmd(pmd_t pmd) +{ + return 0; +} +static inline void +__split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, bool freeze, struct folio *folio) +{ +} +#define split_huge_pmd(__vma, __pmd, __address) do {} while (0) + +static inline int +copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, + pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, + struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma) +{ + return 0; +} + +static inline int +zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, + unsigned long addr) +{ + return 0; +} + +static inline vm_fault_t +do_huge_pmd_wp_page(struct vm_fault *vmf) +{ + return 0; +} + +static inline void +huge_pmd_set_accessed(struct vm_fault *vmf) +{ +} + +static inline int +change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, + pmd_t *pmd, unsigned long addr, pgprot_t newprot, + unsigned long cp_flags) +{ + return 0; +} + +static inline bool +move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, + unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd) +{ + return false; +} + +static inline void +split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmd, bool freeze, struct folio *folio) +{ +} + +static inline void +split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, + bool freeze, struct folio *folio) +{ +} +#endif /* CONFIG_PGTABLE_HAS_PMD_LEAVES */ + +bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, + pmd_t *pmd, unsigned long addr, unsigned long next); vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write); vm_fault_t vmf_insert_pfn_pud(struct vm_fault 
*vmf, pfn_t pfn, bool write); +struct folio *mm_get_huge_zero_folio(struct mm_struct *mm); enum transparent_hugepage_flag { TRANSPARENT_HUGEPAGE_UNSUPPORTED, @@ -130,6 +301,9 @@ extern unsigned long huge_anon_orders_always; extern unsigned long huge_anon_orders_madvise; extern unsigned long huge_anon_orders_inherit; +void __split_huge_zero_page_pmd(struct vm_area_struct *vma, + unsigned long haddr, pmd_t *pmd); + static inline bool hugepage_global_enabled(void) { return transparent_hugepage_flags & @@ -332,44 +506,6 @@ static inline int split_huge_page(struct page *page) } void deferred_split_folio(struct folio *folio); -void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long address, bool freeze, struct folio *folio); - -#define split_huge_pmd(__vma, __pmd, __address) \ - do { \ - pmd_t *____pmd = (__pmd); \ - if (is_swap_pmd(*____pmd) || pmd_trans_huge(*____pmd) \ - || pmd_devmap(*____pmd)) \ - __split_huge_pmd(__vma, __pmd, __address, \ - false, NULL); \ - } while (0) - - -void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, - bool freeze, struct folio *folio); - -void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, - unsigned long address); - -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, - pud_t *pudp, unsigned long addr, pgprot_t newprot, - unsigned long cp_flags); -#else -static inline int -change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, - pud_t *pudp, unsigned long addr, pgprot_t newprot, - unsigned long cp_flags) { return 0; } -#endif - -#define split_huge_pud(__vma, __pud, __address) \ - do { \ - pud_t *____pud = (__pud); \ - if (pud_trans_huge(*____pud) \ - || pud_devmap(*____pud)) \ - __split_huge_pud(__vma, __pud, __address); \ - } while (0) - int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice); int madvise_collapse(struct vm_area_struct *vma, @@ -377,31 +513,6 @@ int 
madvise_collapse(struct vm_area_struct *vma, unsigned long start, unsigned long end); void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start, unsigned long end, long adjust_next); -spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma); -spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma); - -static inline int is_swap_pmd(pmd_t pmd) -{ - return !pmd_none(pmd) && !pmd_present(pmd); -} - -/* mmap_lock must be held on entry */ -static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd, - struct vm_area_struct *vma) -{ - if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) - return __pmd_trans_huge_lock(pmd, vma); - else - return NULL; -} -static inline spinlock_t *pud_trans_huge_lock(pud_t *pud, - struct vm_area_struct *vma) -{ - if (pud_trans_huge(*pud) || pud_devmap(*pud)) - return __pud_trans_huge_lock(pud, vma); - else - return NULL; -} /** * folio_test_pmd_mappable - Can we map this folio with a PMD? @@ -416,6 +527,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd, int flags, struct dev_pagemap **pgmap); vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf); +vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf); extern struct folio *huge_zero_folio; extern unsigned long huge_zero_pfn; @@ -445,13 +557,17 @@ static inline bool thp_migration_supported(void) return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION); } -void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, - pmd_t *pmd, bool freeze, struct folio *folio); bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmdp, struct folio *folio); #else /* CONFIG_TRANSPARENT_HUGEPAGE */ +static inline void +__split_huge_zero_page_pmd(struct vm_area_struct *vma, + unsigned long haddr, pmd_t *pmd) +{ +} + static inline bool folio_test_pmd_mappable(struct folio *folio) { return false; @@ -505,16 +621,6 @@ static inline int split_huge_page(struct 
page *page) return 0; } static inline void deferred_split_folio(struct folio *folio) {} -#define split_huge_pmd(__vma, __pmd, __address) \ - do { } while (0) - -static inline void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long address, bool freeze, struct folio *folio) {} -static inline void split_huge_pmd_address(struct vm_area_struct *vma, - unsigned long address, bool freeze, struct folio *folio) {} -static inline void split_huge_pmd_locked(struct vm_area_struct *vma, - unsigned long address, pmd_t *pmd, - bool freeze, struct folio *folio) {} static inline bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmdp, @@ -523,9 +629,6 @@ static inline bool unmap_huge_pmd_locked(struct vm_area_struct *vma, return false; } -#define split_huge_pud(__vma, __pmd, __address) \ - do { } while (0) - static inline int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice) { @@ -545,20 +648,6 @@ static inline void vma_adjust_trans_huge(struct vm_area_struct *vma, long adjust_next) { } -static inline int is_swap_pmd(pmd_t pmd) -{ - return 0; -} -static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd, - struct vm_area_struct *vma) -{ - return NULL; -} -static inline spinlock_t *pud_trans_huge_lock(pud_t *pud, - struct vm_area_struct *vma) -{ - return NULL; -} static inline vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) { @@ -606,15 +695,8 @@ static inline int next_order(unsigned long *orders, int prev) return 0; } -static inline void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, - unsigned long address) -{ -} - -static inline int change_huge_pud(struct mmu_gather *tlb, - struct vm_area_struct *vma, pud_t *pudp, - unsigned long addr, pgprot_t newprot, - unsigned long cp_flags) +static inline vm_fault_t +do_huge_pmd_anonymous_page(struct vm_fault *vmf) { return 0; } diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 5a5aaee5fa1c..5e505373b113 100644 --- 
a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -628,8 +628,8 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm, #endif /* __HAVE_ARCH_PUDP_HUGE_GET_AND_CLEAR */ #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -#ifndef __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR_FULL +#if defined(CONFIG_PGTABLE_HAS_PMD_LEAVES) && \ + !defined(__HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR_FULL) static inline pmd_t pmdp_huge_get_and_clear_full(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp, int full) @@ -638,14 +638,14 @@ static inline pmd_t pmdp_huge_get_and_clear_full(struct vm_area_struct *vma, } #endif -#ifndef __HAVE_ARCH_PUDP_HUGE_GET_AND_CLEAR_FULL +#if defined(CONFIG_PGTABLE_HAS_PUD_LEAVES) && \ + !defined(__HAVE_ARCH_PUDP_HUGE_GET_AND_CLEAR_FULL) static inline pud_t pudp_huge_get_and_clear_full(struct vm_area_struct *vma, unsigned long address, pud_t *pudp, int full) { return pudp_huge_get_and_clear(vma->vm_mm, address, pudp); } -#endif #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL @@ -894,9 +894,9 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm, } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif + #ifndef __HAVE_ARCH_PUDP_SET_WRPROTECT -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES static inline void pudp_set_wrprotect(struct mm_struct *mm, unsigned long address, pud_t *pudp) { @@ -910,8 +910,7 @@ static inline void pudp_set_wrprotect(struct mm_struct *mm, { BUILD_BUG(); } -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ +#endif /* CONFIG_PGTABLE_HAS_PUD_LEAVES */ #endif #ifndef pmdp_collapse_flush @@ -1735,7 +1734,6 @@ static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */ #ifndef __HAVE_ARCH_FLUSH_PMD_TLB_RANGE -#ifdef CONFIG_TRANSPARENT_HUGEPAGE /* * ARCHes with special requirements for 
evicting THP backing TLB entries can * implement this. Otherwise also, it can help optimize normal TLB flush in @@ -1745,10 +1743,15 @@ static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) * invalidate the entire TLB which is not desirable. * e.g. see arch/arc: flush_pmd_tlb_range */ +#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES #define flush_pmd_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end) -#define flush_pud_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end) #else #define flush_pmd_tlb_range(vma, addr, end) BUILD_BUG() +#endif + +#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES +#define flush_pud_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end) +#else #define flush_pud_tlb_range(vma, addr, end) BUILD_BUG() #endif #endif diff --git a/include/trace/events/huge_mapping.h b/include/trace/events/huge_mapping.h new file mode 100644 index 000000000000..20036d090ce5 --- /dev/null +++ b/include/trace/events/huge_mapping.h @@ -0,0 +1,41 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM huge_mapping + +#if !defined(_TRACE_HUGE_MAPPING_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_HUGE_MAPPING_H + +#include +#include + +DECLARE_EVENT_CLASS(migration_pmd, + + TP_PROTO(unsigned long addr, unsigned long pmd), + + TP_ARGS(addr, pmd), + + TP_STRUCT__entry( + __field(unsigned long, addr) + __field(unsigned long, pmd) + ), + + TP_fast_assign( + __entry->addr = addr; + __entry->pmd = pmd; + ), + TP_printk("addr=%lx, pmd=%lx", __entry->addr, __entry->pmd) +); + +DEFINE_EVENT(migration_pmd, set_migration_pmd, + TP_PROTO(unsigned long addr, unsigned long pmd), + TP_ARGS(addr, pmd) +); + +DEFINE_EVENT(migration_pmd, remove_migration_pmd, + TP_PROTO(unsigned long addr, unsigned long pmd), + TP_ARGS(addr, pmd) +); +#endif /* _TRACE_HUGE_MAPPING_H */ + +/* This part must be outside protection */ +#include diff --git a/include/trace/events/thp.h b/include/trace/events/thp.h index f50048af5fcc..395b574b1c79 100644 --- 
a/include/trace/events/thp.h +++ b/include/trace/events/thp.h @@ -66,34 +66,6 @@ DEFINE_EVENT(hugepage_update, hugepage_update_pud, TP_PROTO(unsigned long addr, unsigned long pud, unsigned long clr, unsigned long set), TP_ARGS(addr, pud, clr, set) ); - -DECLARE_EVENT_CLASS(migration_pmd, - - TP_PROTO(unsigned long addr, unsigned long pmd), - - TP_ARGS(addr, pmd), - - TP_STRUCT__entry( - __field(unsigned long, addr) - __field(unsigned long, pmd) - ), - - TP_fast_assign( - __entry->addr = addr; - __entry->pmd = pmd; - ), - TP_printk("addr=%lx, pmd=%lx", __entry->addr, __entry->pmd) -); - -DEFINE_EVENT(migration_pmd, set_migration_pmd, - TP_PROTO(unsigned long addr, unsigned long pmd), - TP_ARGS(addr, pmd) -); - -DEFINE_EVENT(migration_pmd, remove_migration_pmd, - TP_PROTO(unsigned long addr, unsigned long pmd), - TP_ARGS(addr, pmd) -); #endif /* _TRACE_THP_H */ /* This part must be outside protection */ diff --git a/mm/Makefile b/mm/Makefile index d2915f8c9dc0..3a846121b1f5 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -95,6 +95,8 @@ obj-$(CONFIG_MIGRATION) += migrate.o obj-$(CONFIG_NUMA) += memory-tiers.o obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o +obj-$(CONFIG_PGTABLE_HAS_PMD_LEAVES) += huge_mapping_pmd.o +obj-$(CONFIG_PGTABLE_HAS_PUD_LEAVES) += huge_mapping_pud.o obj-$(CONFIG_PAGE_COUNTER) += page_counter.o obj-$(CONFIG_MEMCG_V1) += memcontrol-v1.o obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o diff --git a/mm/huge_mapping_pmd.c b/mm/huge_mapping_pmd.c new file mode 100644 index 000000000000..7b85e2a564d6 --- /dev/null +++ b/mm/huge_mapping_pmd.c @@ -0,0 +1,979 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2024 Red Hat, Inc. 
+ */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include "internal.h" +#include "swap.h" + +#define CREATE_TRACE_POINTS +#include + +/* + * Returns page table lock pointer if a given pmd maps a thp, NULL otherwise. + * + * Note that if it returns page table lock pointer, this routine returns without + * unlocking page table lock. So callers must unlock it. + */ +spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) +{ + spinlock_t *ptl; + + ptl = pmd_lock(vma->vm_mm, pmd); + if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || + pmd_devmap(*pmd))) + return ptl; + spin_unlock(ptl); + return NULL; +} + +pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) +{ + if (likely(vma->vm_flags & VM_WRITE)) + pmd = pmd_mkwrite(pmd, vma); + return pmd; +} + +void touch_pmd(struct vm_area_struct *vma, unsigned long addr, + pmd_t *pmd, bool write) +{ + pmd_t _pmd; + + _pmd = pmd_mkyoung(*pmd); + if (write) + _pmd = pmd_mkdirty(_pmd); + if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK, + pmd, _pmd, write)) + update_mmu_cache_pmd(vma, addr, pmd); +} + +int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, + pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, + struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma) +{ + spinlock_t *dst_ptl, *src_ptl; + struct page *src_page; + struct folio *src_folio; + pmd_t pmd; + pgtable_t pgtable = NULL; + int ret = -ENOMEM; + + /* Skip if can be re-fill on fault */ + if (!vma_is_anonymous(dst_vma)) + return 0; + + pgtable = pte_alloc_one(dst_mm); + if (unlikely(!pgtable)) + goto out; + + dst_ptl = 
pmd_lock(dst_mm, dst_pmd); + src_ptl = pmd_lockptr(src_mm, src_pmd); + spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); + + ret = -EAGAIN; + pmd = *src_pmd; + +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION + if (unlikely(is_swap_pmd(pmd))) { + swp_entry_t entry = pmd_to_swp_entry(pmd); + + VM_BUG_ON(!is_pmd_migration_entry(pmd)); + if (!is_readable_migration_entry(entry)) { + entry = make_readable_migration_entry( + swp_offset(entry)); + pmd = swp_entry_to_pmd(entry); + if (pmd_swp_soft_dirty(*src_pmd)) + pmd = pmd_swp_mksoft_dirty(pmd); + if (pmd_swp_uffd_wp(*src_pmd)) + pmd = pmd_swp_mkuffd_wp(pmd); + set_pmd_at(src_mm, addr, src_pmd, pmd); + } + add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + if (!userfaultfd_wp(dst_vma)) + pmd = pmd_swp_clear_uffd_wp(pmd); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + ret = 0; + goto out_unlock; + } +#endif + + if (unlikely(!pmd_trans_huge(pmd))) { + pte_free(dst_mm, pgtable); + goto out_unlock; + } + /* + * When page table lock is held, the huge zero pmd should not be + * under splitting since we don't split the page itself, only pmd to + * a page table. + */ + if (is_huge_zero_pmd(pmd)) { + /* + * mm_get_huge_zero_folio() will never allocate a new + * folio here, since we already have a zero page to + * copy. It just takes a reference. + */ + mm_get_huge_zero_folio(dst_mm); + goto out_zero_page; + } + + src_page = pmd_page(pmd); + VM_BUG_ON_PAGE(!PageHead(src_page), src_page); + src_folio = page_folio(src_page); + + folio_get(src_folio); + if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, src_page, src_vma))) { + /* Page maybe pinned: split and retry the fault on PTEs. 
*/ + folio_put(src_folio); + pte_free(dst_mm, pgtable); + spin_unlock(src_ptl); + spin_unlock(dst_ptl); + __split_huge_pmd(src_vma, src_pmd, addr, false, NULL); + return -EAGAIN; + } + add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); +out_zero_page: + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); + pmdp_set_wrprotect(src_mm, addr, src_pmd); + if (!userfaultfd_wp(dst_vma)) + pmd = pmd_clear_uffd_wp(pmd); + pmd = pmd_mkold(pmd_wrprotect(pmd)); + set_pmd_at(dst_mm, addr, dst_pmd, pmd); + + ret = 0; +out_unlock: + spin_unlock(src_ptl); + spin_unlock(dst_ptl); +out: + return ret; +} + +void huge_pmd_set_accessed(struct vm_fault *vmf) +{ + bool write = vmf->flags & FAULT_FLAG_WRITE; + + vmf->ptl = pmd_lock(vmf->vma->vm_mm, vmf->pmd); + if (unlikely(!pmd_same(*vmf->pmd, vmf->orig_pmd))) + goto unlock; + + touch_pmd(vmf->vma, vmf->address, vmf->pmd, write); + +unlock: + spin_unlock(vmf->ptl); +} + +vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf) +{ + const bool unshare = vmf->flags & FAULT_FLAG_UNSHARE; + struct vm_area_struct *vma = vmf->vma; + struct folio *folio; + struct page *page; + unsigned long haddr = vmf->address & HPAGE_PMD_MASK; + pmd_t orig_pmd = vmf->orig_pmd; + + vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd); + VM_BUG_ON_VMA(!vma->anon_vma, vma); + + if (is_huge_zero_pmd(orig_pmd)) + goto fallback; + + spin_lock(vmf->ptl); + + if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) { + spin_unlock(vmf->ptl); + return 0; + } + + page = pmd_page(orig_pmd); + folio = page_folio(page); + VM_BUG_ON_PAGE(!PageHead(page), page); + + /* Early check when only holding the PT lock. 
*/ + if (PageAnonExclusive(page)) + goto reuse; + + if (!folio_trylock(folio)) { + folio_get(folio); + spin_unlock(vmf->ptl); + folio_lock(folio); + spin_lock(vmf->ptl); + if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) { + spin_unlock(vmf->ptl); + folio_unlock(folio); + folio_put(folio); + return 0; + } + folio_put(folio); + } + + /* Recheck after temporarily dropping the PT lock. */ + if (PageAnonExclusive(page)) { + folio_unlock(folio); + goto reuse; + } + + /* + * See do_wp_page(): we can only reuse the folio exclusively if + * there are no additional references. Note that we always drain + * the LRU cache immediately after adding a THP. + */ + if (folio_ref_count(folio) > + 1 + folio_test_swapcache(folio) * folio_nr_pages(folio)) + goto unlock_fallback; + if (folio_test_swapcache(folio)) + folio_free_swap(folio); + if (folio_ref_count(folio) == 1) { + pmd_t entry; + + folio_move_anon_rmap(folio, vma); + SetPageAnonExclusive(page); + folio_unlock(folio); +reuse: + if (unlikely(unshare)) { + spin_unlock(vmf->ptl); + return 0; + } + entry = pmd_mkyoung(orig_pmd); + entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); + if (pmdp_set_access_flags(vma, haddr, vmf->pmd, entry, 1)) + update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); + spin_unlock(vmf->ptl); + return 0; + } + +unlock_fallback: + folio_unlock(folio); + spin_unlock(vmf->ptl); +fallback: + __split_huge_pmd(vma, vmf->pmd, vmf->address, false, NULL); + return VM_FAULT_FALLBACK; +} + +bool can_change_pmd_writable(struct vm_area_struct *vma, unsigned long addr, + pmd_t pmd) +{ + struct page *page; + + if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE))) + return false; + + /* Don't touch entries that are not even readable (NUMA hinting). */ + if (pmd_protnone(pmd)) + return false; + + /* Do we need write faults for softdirty tracking? */ + if (pmd_needs_soft_dirty_wp(vma, pmd)) + return false; + + /* Do we need write faults for uffd-wp tracking? 
*/ + if (userfaultfd_huge_pmd_wp(vma, pmd)) + return false; + + if (!(vma->vm_flags & VM_SHARED)) { + /* See can_change_pte_writable(). */ + page = vm_normal_page_pmd(vma, addr, pmd); + return page && PageAnon(page) && PageAnonExclusive(page); + } + + /* See can_change_pte_writable(). */ + return pmd_dirty(pmd); +} + +void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd) +{ + pgtable_t pgtable; + + pgtable = pgtable_trans_huge_withdraw(mm, pmd); + pte_free(mm, pgtable); + mm_dec_nr_ptes(mm); +} + +int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, + pmd_t *pmd, unsigned long addr) +{ + pmd_t orig_pmd; + spinlock_t *ptl; + + tlb_change_page_size(tlb, HPAGE_PMD_SIZE); + + ptl = __pmd_trans_huge_lock(pmd, vma); + if (!ptl) + return 0; + /* + * For architectures like ppc64 we look at deposited pgtable + * when calling pmdp_huge_get_and_clear. So do the + * pgtable_trans_huge_withdraw after finishing pmdp related + * operations. + */ + orig_pmd = pmdp_huge_get_and_clear_full(vma, addr, pmd, + tlb->fullmm); + arch_check_zapped_pmd(vma, orig_pmd); + tlb_remove_pmd_tlb_entry(tlb, pmd, addr); + if (vma_is_special_huge(vma)) { + if (arch_needs_pgtable_deposit()) + zap_deposited_table(tlb->mm, pmd); + spin_unlock(ptl); + } else if (is_huge_zero_pmd(orig_pmd)) { + zap_deposited_table(tlb->mm, pmd); + spin_unlock(ptl); + } else { + struct folio *folio = NULL; + int flush_needed = 1; + + if (pmd_present(orig_pmd)) { + struct page *page = pmd_page(orig_pmd); + + folio = page_folio(page); + folio_remove_rmap_pmd(folio, page, vma); + WARN_ON_ONCE(folio_mapcount(folio) < 0); + VM_BUG_ON_PAGE(!PageHead(page), page); + } else if (thp_migration_supported()) { + swp_entry_t entry; + + VM_BUG_ON(!is_pmd_migration_entry(orig_pmd)); + entry = pmd_to_swp_entry(orig_pmd); + folio = pfn_swap_entry_folio(entry); + flush_needed = 0; + } else + WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!"); + + if (folio_test_anon(folio)) { + 
zap_deposited_table(tlb->mm, pmd); + add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR); + } else { + if (arch_needs_pgtable_deposit()) + zap_deposited_table(tlb->mm, pmd); + add_mm_counter(tlb->mm, mm_counter_file(folio), + -HPAGE_PMD_NR); + } + + spin_unlock(ptl); + if (flush_needed) + tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE); + } + return 1; +} + +static pmd_t move_soft_dirty_pmd(pmd_t pmd) +{ +#ifdef CONFIG_MEM_SOFT_DIRTY + if (unlikely(is_pmd_migration_entry(pmd))) + pmd = pmd_swp_mksoft_dirty(pmd); + else if (pmd_present(pmd)) + pmd = pmd_mksoft_dirty(pmd); +#endif + return pmd; +} + +#ifndef pmd_move_must_withdraw +static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl, + spinlock_t *old_pmd_ptl, + struct vm_area_struct *vma) +{ + /* + * With split pmd lock we also need to move preallocated + * PTE page table if new_pmd is on different PMD page table. + * + * We also don't deposit and withdraw tables for file pages. + */ + return (new_pmd_ptl != old_pmd_ptl) && vma_is_anonymous(vma); +} +#endif + +bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, + unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd) +{ + spinlock_t *old_ptl, *new_ptl; + pmd_t pmd; + struct mm_struct *mm = vma->vm_mm; + bool force_flush = false; + + /* + * The destination pmd shouldn't be established, free_pgtables() + * should have released it; but move_page_tables() might have already + * inserted a page table, if racing against shmem/file collapse. + */ + if (!pmd_none(*new_pmd)) { + VM_BUG_ON(pmd_trans_huge(*new_pmd)); + return false; + } + + /* + * We don't have to worry about the ordering of src and dst + * ptlocks because exclusive mmap_lock prevents deadlock. 
+ */ + old_ptl = __pmd_trans_huge_lock(old_pmd, vma); + if (old_ptl) { + new_ptl = pmd_lockptr(mm, new_pmd); + if (new_ptl != old_ptl) + spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); + pmd = pmdp_huge_get_and_clear(mm, old_addr, old_pmd); + if (pmd_present(pmd)) + force_flush = true; + VM_BUG_ON(!pmd_none(*new_pmd)); + + if (pmd_move_must_withdraw(new_ptl, old_ptl, vma)) { + pgtable_t pgtable; + pgtable = pgtable_trans_huge_withdraw(mm, old_pmd); + pgtable_trans_huge_deposit(mm, new_pmd, pgtable); + } + pmd = move_soft_dirty_pmd(pmd); + set_pmd_at(mm, new_addr, new_pmd, pmd); + if (force_flush) + flush_pmd_tlb_range(vma, old_addr, old_addr + PMD_SIZE); + if (new_ptl != old_ptl) + spin_unlock(new_ptl); + spin_unlock(old_ptl); + return true; + } + return false; +} + +/* + * Returns + * - 0 if PMD could not be locked + * - 1 if PMD was locked but protections unchanged and TLB flush unnecessary + * or if prot_numa but THP migration is not supported + * - HPAGE_PMD_NR if protections changed and TLB flush necessary + */ +int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, + pmd_t *pmd, unsigned long addr, pgprot_t newprot, + unsigned long cp_flags) +{ + struct mm_struct *mm = vma->vm_mm; + spinlock_t *ptl; + pmd_t oldpmd, entry; + bool prot_numa = cp_flags & MM_CP_PROT_NUMA; + bool uffd_wp = cp_flags & MM_CP_UFFD_WP; + bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE; + int ret = 1; + + tlb_change_page_size(tlb, HPAGE_PMD_SIZE); + + if (prot_numa && !thp_migration_supported()) + return 1; + + ptl = __pmd_trans_huge_lock(pmd, vma); + if (!ptl) + return 0; + +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION + if (is_swap_pmd(*pmd)) { + swp_entry_t entry = pmd_to_swp_entry(*pmd); + struct folio *folio = pfn_swap_entry_folio(entry); + pmd_t newpmd; + + VM_BUG_ON(!is_pmd_migration_entry(*pmd)); + if (is_writable_migration_entry(entry)) { + /* + * A protection check is difficult so + * just be safe and disable write + */ + if (folio_test_anon(folio)) + 
entry = make_readable_exclusive_migration_entry(swp_offset(entry)); + else + entry = make_readable_migration_entry(swp_offset(entry)); + newpmd = swp_entry_to_pmd(entry); + if (pmd_swp_soft_dirty(*pmd)) + newpmd = pmd_swp_mksoft_dirty(newpmd); + } else { + newpmd = *pmd; + } + + if (uffd_wp) + newpmd = pmd_swp_mkuffd_wp(newpmd); + else if (uffd_wp_resolve) + newpmd = pmd_swp_clear_uffd_wp(newpmd); + if (!pmd_same(*pmd, newpmd)) + set_pmd_at(mm, addr, pmd, newpmd); + goto unlock; + } +#endif + + if (prot_numa) { + struct folio *folio; + bool toptier; + /* + * Avoid trapping faults against the zero page. The read-only + * data is likely to be read-cached on the local CPU and + * local/remote hits to the zero page are not interesting. + */ + if (is_huge_zero_pmd(*pmd)) + goto unlock; + + if (pmd_protnone(*pmd)) + goto unlock; + + folio = pmd_folio(*pmd); + toptier = node_is_toptier(folio_nid(folio)); + /* + * Skip scanning top tier node if normal numa + * balancing is disabled + */ + if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) && + toptier) + goto unlock; + + if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING && + !toptier) + folio_xchg_access_time(folio, + jiffies_to_msecs(jiffies)); + } + /* + * In case prot_numa, we are under mmap_read_lock(mm). It's critical + * to not clear pmd intermittently to avoid race with MADV_DONTNEED + * which is also under mmap_read_lock(mm): + * + * CPU0: CPU1: + * change_huge_pmd(prot_numa=1) + * pmdp_huge_get_and_clear_notify() + * madvise_dontneed() + * zap_pmd_range() + * pmd_trans_huge(*pmd) == 0 (without ptl) + * // skip the pmd + * set_pmd_at(); + * // pmd is re-established + * + * The race makes MADV_DONTNEED miss the huge pmd and don't clear it + * which may break userspace. + * + * pmdp_invalidate_ad() is required to make sure we don't miss + * dirty/young flags set by hardware. 
+ */ + oldpmd = pmdp_invalidate_ad(vma, addr, pmd); + + entry = pmd_modify(oldpmd, newprot); + if (uffd_wp) + entry = pmd_mkuffd_wp(entry); + else if (uffd_wp_resolve) + /* + * Leave the write bit to be handled by PF interrupt + * handler, then things like COW could be properly + * handled. + */ + entry = pmd_clear_uffd_wp(entry); + + /* See change_pte_range(). */ + if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) && + can_change_pmd_writable(vma, addr, entry)) + entry = pmd_mkwrite(entry, vma); + + ret = HPAGE_PMD_NR; + set_pmd_at(mm, addr, pmd, entry); + + if (huge_pmd_needs_flush(oldpmd, entry)) + tlb_flush_pmd_range(tlb, addr, HPAGE_PMD_SIZE); +unlock: + spin_unlock(ptl); + return ret; +} + +static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long haddr, bool freeze) +{ + struct mm_struct *mm = vma->vm_mm; + struct folio *folio; + struct page *page; + pgtable_t pgtable; + pmd_t old_pmd, _pmd; + bool young, write, soft_dirty, pmd_migration = false, uffd_wp = false; + bool anon_exclusive = false, dirty = false; + unsigned long addr; + pte_t *pte; + int i; + + VM_BUG_ON(haddr & ~HPAGE_PMD_MASK); + VM_BUG_ON_VMA(vma->vm_start > haddr, vma); + VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma); + VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd) && + !pmd_devmap(*pmd)); + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + count_vm_event(THP_SPLIT_PMD); +#endif + + if (!vma_is_anonymous(vma)) { + old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd); + /* + * We are going to unmap this huge page. 
So + * just go ahead and zap it + */ + if (arch_needs_pgtable_deposit()) + zap_deposited_table(mm, pmd); + if (vma_is_special_huge(vma)) + return; + if (unlikely(is_pmd_migration_entry(old_pmd))) { + swp_entry_t entry; + + entry = pmd_to_swp_entry(old_pmd); + folio = pfn_swap_entry_folio(entry); + } else { + page = pmd_page(old_pmd); + folio = page_folio(page); + if (!folio_test_dirty(folio) && pmd_dirty(old_pmd)) + folio_mark_dirty(folio); + if (!folio_test_referenced(folio) && pmd_young(old_pmd)) + folio_set_referenced(folio); + folio_remove_rmap_pmd(folio, page, vma); + folio_put(folio); + } + add_mm_counter(mm, mm_counter_file(folio), -HPAGE_PMD_NR); + return; + } + + if (is_huge_zero_pmd(*pmd)) { + /* + * FIXME: Do we want to invalidate secondary mmu by calling + * mmu_notifier_arch_invalidate_secondary_tlbs() see comments below + * inside __split_huge_pmd() ? + * + * We are going from a zero huge page write protected to zero + * small page also write protected so it does not seems useful + * to invalidate secondary mmu at this time. + */ + return __split_huge_zero_page_pmd(vma, haddr, pmd); + } + + pmd_migration = is_pmd_migration_entry(*pmd); + if (unlikely(pmd_migration)) { + swp_entry_t entry; + + old_pmd = *pmd; + entry = pmd_to_swp_entry(old_pmd); + page = pfn_swap_entry_to_page(entry); + write = is_writable_migration_entry(entry); + if (PageAnon(page)) + anon_exclusive = is_readable_exclusive_migration_entry(entry); + young = is_migration_entry_young(entry); + dirty = is_migration_entry_dirty(entry); + soft_dirty = pmd_swp_soft_dirty(old_pmd); + uffd_wp = pmd_swp_uffd_wp(old_pmd); + } else { + /* + * Up to this point the pmd is present and huge and userland has + * the whole access to the hugepage during the split (which + * happens in place). 
If we overwrite the pmd with the not-huge + * version pointing to the pte here (which of course we could if + * all CPUs were bug free), userland could trigger a small page + * size TLB miss on the small sized TLB while the hugepage TLB + * entry is still established in the huge TLB. Some CPUs don't + * like that. See + * http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum + * 383 on page 105. Intel should be safe, but it also warns that + * it's only safe if the permission and cache attributes of the + * two entries loaded in the two TLBs are identical (which should + * be the case here). But it is generally safer to never allow + * small and huge TLB entries for the same virtual address to be + * loaded simultaneously. So instead of doing "pmd_populate(); + * flush_pmd_tlb_range();" we first mark the current pmd + * notpresent (atomically because here the pmd_trans_huge must + * remain set at all times on the pmd until the split is + * complete for this pmd), then we flush the SMP TLB and finally + * we write the non-huge version of the pmd entry with + * pmd_populate. + */ + old_pmd = pmdp_invalidate(vma, haddr, pmd); + page = pmd_page(old_pmd); + folio = page_folio(page); + if (pmd_dirty(old_pmd)) { + dirty = true; + folio_set_dirty(folio); + } + write = pmd_write(old_pmd); + young = pmd_young(old_pmd); + soft_dirty = pmd_soft_dirty(old_pmd); + uffd_wp = pmd_uffd_wp(old_pmd); + + VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio); + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); + + /* + * Without "freeze", we'll simply split the PMD, propagating the + * PageAnonExclusive() flag for each PTE by setting it for + * each subpage -- no need to (temporarily) clear. + * + * With "freeze" we want to replace mapped pages by + * migration entries right away. This is only possible if we + * managed to clear PageAnonExclusive() -- see + * set_pmd_migration_entry().
+ * + * In case we cannot clear PageAnonExclusive(), split the PMD + * only and let try_to_migrate_one() fail later. + * + * See folio_try_share_anon_rmap_pmd(): invalidate PMD first. + */ + anon_exclusive = PageAnonExclusive(page); + if (freeze && anon_exclusive && + folio_try_share_anon_rmap_pmd(folio, page)) + freeze = false; + if (!freeze) { + rmap_t rmap_flags = RMAP_NONE; + + folio_ref_add(folio, HPAGE_PMD_NR - 1); + if (anon_exclusive) + rmap_flags |= RMAP_EXCLUSIVE; + folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, + vma, haddr, rmap_flags); + } + } + + /* + * Withdraw the table only after we mark the pmd entry invalid. + * This is critical for some architectures (Power). + */ + pgtable = pgtable_trans_huge_withdraw(mm, pmd); + pmd_populate(mm, &_pmd, pgtable); + + pte = pte_offset_map(&_pmd, haddr); + VM_BUG_ON(!pte); + + /* + * Note that NUMA hinting access restrictions are not transferred to + * avoid any possibility of altering permissions across VMAs. + */ + if (freeze || pmd_migration) { + for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) { + pte_t entry; + swp_entry_t swp_entry; + + if (write) + swp_entry = make_writable_migration_entry( + page_to_pfn(page + i)); + else if (anon_exclusive) + swp_entry = make_readable_exclusive_migration_entry( + page_to_pfn(page + i)); + else + swp_entry = make_readable_migration_entry( + page_to_pfn(page + i)); + if (young) + swp_entry = make_migration_entry_young(swp_entry); + if (dirty) + swp_entry = make_migration_entry_dirty(swp_entry); + entry = swp_entry_to_pte(swp_entry); + if (soft_dirty) + entry = pte_swp_mksoft_dirty(entry); + if (uffd_wp) + entry = pte_swp_mkuffd_wp(entry); + + VM_WARN_ON(!pte_none(ptep_get(pte + i))); + set_pte_at(mm, addr, pte + i, entry); + } + } else { + pte_t entry; + + entry = mk_pte(page, READ_ONCE(vma->vm_page_prot)); + if (write) + entry = pte_mkwrite(entry, vma); + if (!young) + entry = pte_mkold(entry); + /* NOTE: this may set soft-dirty too on some archs
*/ + if (dirty) + entry = pte_mkdirty(entry); + if (soft_dirty) + entry = pte_mksoft_dirty(entry); + if (uffd_wp) + entry = pte_mkuffd_wp(entry); + + for (i = 0; i < HPAGE_PMD_NR; i++) + VM_WARN_ON(!pte_none(ptep_get(pte + i))); + + set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR); + } + pte_unmap(pte); + + if (!pmd_migration) + folio_remove_rmap_pmd(folio, page, vma); + if (freeze) + put_page(page); + + smp_wmb(); /* make pte visible before pmd */ + pmd_populate(mm, pmd, pgtable); +} + +void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmd, bool freeze, struct folio *folio) +{ + VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio)); + VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE)); + VM_WARN_ON_ONCE(folio && !folio_test_locked(folio)); + VM_BUG_ON(freeze && !folio); + + /* + * When the caller requests to set up a migration entry, we + * require a folio to check the PMD against. Otherwise, there + * is a risk of replacing the wrong folio. + */ + if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || + is_pmd_migration_entry(*pmd)) { + if (folio && folio != pmd_folio(*pmd)) + return; + __split_huge_pmd_locked(vma, pmd, address, freeze); + } +} + +void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, + unsigned long address, bool freeze, struct folio *folio) +{ + spinlock_t *ptl; + struct mmu_notifier_range range; + + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm, + address & HPAGE_PMD_MASK, + (address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE); + mmu_notifier_invalidate_range_start(&range); + ptl = pmd_lock(vma->vm_mm, pmd); + split_huge_pmd_locked(vma, range.start, pmd, freeze, folio); + spin_unlock(ptl); + mmu_notifier_invalidate_range_end(&range); +} + +void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, + bool freeze, struct folio *folio) +{ + pmd_t *pmd = mm_find_pmd(vma->vm_mm, address); + + if (!pmd) + return; + + __split_huge_pmd(vma, pmd, address, freeze, folio); +} + 
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION +int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, + struct page *page) +{ + struct folio *folio = page_folio(page); + struct vm_area_struct *vma = pvmw->vma; + struct mm_struct *mm = vma->vm_mm; + unsigned long address = pvmw->address; + bool anon_exclusive; + pmd_t pmdval; + swp_entry_t entry; + pmd_t pmdswp; + + if (!(pvmw->pmd && !pvmw->pte)) + return 0; + + flush_cache_range(vma, address, address + HPAGE_PMD_SIZE); + pmdval = pmdp_invalidate(vma, address, pvmw->pmd); + + /* See folio_try_share_anon_rmap_pmd(): invalidate PMD first. */ + anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page); + if (anon_exclusive && folio_try_share_anon_rmap_pmd(folio, page)) { + set_pmd_at(mm, address, pvmw->pmd, pmdval); + return -EBUSY; + } + + if (pmd_dirty(pmdval)) + folio_mark_dirty(folio); + if (pmd_write(pmdval)) + entry = make_writable_migration_entry(page_to_pfn(page)); + else if (anon_exclusive) + entry = make_readable_exclusive_migration_entry(page_to_pfn(page)); + else + entry = make_readable_migration_entry(page_to_pfn(page)); + if (pmd_young(pmdval)) + entry = make_migration_entry_young(entry); + if (pmd_dirty(pmdval)) + entry = make_migration_entry_dirty(entry); + pmdswp = swp_entry_to_pmd(entry); + if (pmd_soft_dirty(pmdval)) + pmdswp = pmd_swp_mksoft_dirty(pmdswp); + if (pmd_uffd_wp(pmdval)) + pmdswp = pmd_swp_mkuffd_wp(pmdswp); + set_pmd_at(mm, address, pvmw->pmd, pmdswp); + folio_remove_rmap_pmd(folio, page, vma); + folio_put(folio); + trace_set_migration_pmd(address, pmd_val(pmdswp)); + + return 0; +} + +void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) +{ + struct folio *folio = page_folio(new); + struct vm_area_struct *vma = pvmw->vma; + struct mm_struct *mm = vma->vm_mm; + unsigned long address = pvmw->address; + unsigned long haddr = address & HPAGE_PMD_MASK; + pmd_t pmde; + swp_entry_t entry; + + if (!(pvmw->pmd && !pvmw->pte)) + return; + + entry = 
pmd_to_swp_entry(*pvmw->pmd); + folio_get(folio); + pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot)); + if (pmd_swp_soft_dirty(*pvmw->pmd)) + pmde = pmd_mksoft_dirty(pmde); + if (is_writable_migration_entry(entry)) + pmde = pmd_mkwrite(pmde, vma); + if (pmd_swp_uffd_wp(*pvmw->pmd)) + pmde = pmd_mkuffd_wp(pmde); + if (!is_migration_entry_young(entry)) + pmde = pmd_mkold(pmde); + /* NOTE: this may contain setting soft-dirty on some archs */ + if (folio_test_dirty(folio) && is_migration_entry_dirty(entry)) + pmde = pmd_mkdirty(pmde); + + if (folio_test_anon(folio)) { + rmap_t rmap_flags = RMAP_NONE; + + if (!is_readable_migration_entry(entry)) + rmap_flags |= RMAP_EXCLUSIVE; + + folio_add_anon_rmap_pmd(folio, new, vma, haddr, rmap_flags); + } else { + folio_add_file_rmap_pmd(folio, new, vma); + } + VM_BUG_ON(pmd_write(pmde) && folio_test_anon(folio) && !PageAnonExclusive(new)); + set_pmd_at(mm, haddr, pvmw->pmd, pmde); + + /* No need to invalidate - it was non-present before */ + update_mmu_cache_pmd(vma, address, pvmw->pmd); + trace_remove_migration_pmd(address, pmd_val(pmde)); +} +#endif diff --git a/mm/huge_mapping_pud.c b/mm/huge_mapping_pud.c new file mode 100644 index 000000000000..c3a6bffe2871 --- /dev/null +++ b/mm/huge_mapping_pud.c @@ -0,0 +1,235 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2024 Red Hat, Inc. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include "internal.h" +#include "swap.h" + +/* + * Returns page table lock pointer if a given pud maps a thp, NULL otherwise. 
+ * + * Note that if it returns page table lock pointer, this routine returns without + * unlocking page table lock. So callers must unlock it. + */ +spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) +{ + spinlock_t *ptl; + + ptl = pud_lock(vma->vm_mm, pud); + if (likely(pud_trans_huge(*pud) || pud_devmap(*pud))) + return ptl; + spin_unlock(ptl); + return NULL; +} + +void touch_pud(struct vm_area_struct *vma, unsigned long addr, + pud_t *pud, bool write) +{ + pud_t _pud; + + _pud = pud_mkyoung(*pud); + if (write) + _pud = pud_mkdirty(_pud); + if (pudp_set_access_flags(vma, addr & HPAGE_PUD_MASK, + pud, _pud, write)) + update_mmu_cache_pud(vma, addr, pud); +} + +int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, + pud_t *dst_pud, pud_t *src_pud, unsigned long addr, + struct vm_area_struct *vma) +{ + spinlock_t *dst_ptl, *src_ptl; + pud_t pud; + int ret; + + dst_ptl = pud_lock(dst_mm, dst_pud); + src_ptl = pud_lockptr(src_mm, src_pud); + spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); + + ret = -EAGAIN; + pud = *src_pud; + if (unlikely(!pud_trans_huge(pud) && !pud_devmap(pud))) + goto out_unlock; + + /* + * When page table lock is held, the huge zero pud should not be + * under splitting since we don't split the page itself, only pud to + * a page table. + */ + if (is_huge_zero_pud(pud)) { + /* No huge zero pud yet */ + } + + /* + * TODO: once we support anonymous pages, use + * folio_try_dup_anon_rmap_*() and split if duplicating fails. 
+ */ + pudp_set_wrprotect(src_mm, addr, src_pud); + pud = pud_mkold(pud_wrprotect(pud)); + set_pud_at(dst_mm, addr, dst_pud, pud); + + ret = 0; +out_unlock: + spin_unlock(src_ptl); + spin_unlock(dst_ptl); + return ret; +} + +void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud) +{ + bool write = vmf->flags & FAULT_FLAG_WRITE; + + vmf->ptl = pud_lock(vmf->vma->vm_mm, vmf->pud); + if (unlikely(!pud_same(*vmf->pud, orig_pud))) + goto unlock; + + touch_pud(vmf->vma, vmf->address, vmf->pud, write); +unlock: + spin_unlock(vmf->ptl); +} + +/* + * Returns: + * + * - 0: if pud leaf changed from under us + * - 1: if pud can be skipped + * - HPAGE_PUD_NR: if pud was successfully processed + */ +int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, + pud_t *pudp, unsigned long addr, pgprot_t newprot, + unsigned long cp_flags) +{ + struct mm_struct *mm = vma->vm_mm; + pud_t oldpud, entry; + spinlock_t *ptl; + + tlb_change_page_size(tlb, HPAGE_PUD_SIZE); + + /* NUMA balancing doesn't apply to dax */ + if (cp_flags & MM_CP_PROT_NUMA) + return 1; + + /* + * Huge entries on userfault-wp only work with anonymous, while we + * don't have anonymous PUDs yet. + */ + if (WARN_ON_ONCE(cp_flags & MM_CP_UFFD_WP_ALL)) + return 1; + + ptl = __pud_trans_huge_lock(pudp, vma); + if (!ptl) + return 0; + + /* + * Can't clear PUD or it can race with concurrent zapping. See + * change_huge_pmd().
+ */ + oldpud = pudp_invalidate(vma, addr, pudp); + entry = pud_modify(oldpud, newprot); + set_pud_at(mm, addr, pudp, entry); + tlb_flush_pud_range(tlb, addr, HPAGE_PUD_SIZE); + + spin_unlock(ptl); + return HPAGE_PUD_NR; +} + +int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, + pud_t *pud, unsigned long addr) +{ + spinlock_t *ptl; + pud_t orig_pud; + + ptl = __pud_trans_huge_lock(pud, vma); + if (!ptl) + return 0; + + orig_pud = pudp_huge_get_and_clear_full(vma, addr, pud, tlb->fullmm); + arch_check_zapped_pud(vma, orig_pud); + tlb_remove_pud_tlb_entry(tlb, pud, addr); + if (vma_is_special_huge(vma)) { + spin_unlock(ptl); + /* No zero page support yet */ + } else { + /* No support for anonymous PUD pages yet */ + BUG(); + } + return 1; +} + +static void __split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud, + unsigned long haddr) +{ + VM_BUG_ON(haddr & ~HPAGE_PUD_MASK); + VM_BUG_ON_VMA(vma->vm_start > haddr, vma); + VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PUD_SIZE, vma); + VM_BUG_ON(!pud_trans_huge(*pud) && !pud_devmap(*pud)); + +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \ + defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) + count_vm_event(THP_SPLIT_PUD); +#endif + + pudp_huge_clear_flush(vma, haddr, pud); +} + +void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, + unsigned long address) +{ + spinlock_t *ptl; + struct mmu_notifier_range range; + + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm, + address & HPAGE_PUD_MASK, + (address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE); + mmu_notifier_invalidate_range_start(&range); + ptl = pud_lock(vma->vm_mm, pud); + if (unlikely(!pud_trans_huge(*pud) && !pud_devmap(*pud))) + goto out; + __split_huge_pud_locked(vma, pud, range.start); + +out: + spin_unlock(ptl); + mmu_notifier_invalidate_range_end(&range); +} diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 554dec14b768..11aee24ce21a 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -838,13 +838,6 @@ static 
int __init setup_transparent_hugepage(char *str) } __setup("transparent_hugepage=", setup_transparent_hugepage); -pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) -{ - if (likely(vma->vm_flags & VM_WRITE)) - pmd = pmd_mkwrite(pmd, vma); - return pmd; -} - #ifdef CONFIG_MEMCG static inline struct deferred_split *get_deferred_split_queue(struct folio *folio) @@ -1313,19 +1306,6 @@ vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write) EXPORT_SYMBOL_GPL(vmf_insert_pfn_pud); #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ -void touch_pmd(struct vm_area_struct *vma, unsigned long addr, - pmd_t *pmd, bool write) -{ - pmd_t _pmd; - - _pmd = pmd_mkyoung(*pmd); - if (write) - _pmd = pmd_mkdirty(_pmd); - if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK, - pmd, _pmd, write)) - update_mmu_cache_pmd(vma, addr, pmd); -} - struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd, int flags, struct dev_pagemap **pgmap) { @@ -1366,309 +1346,6 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, return page; } -int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, - pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, - struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma) -{ - spinlock_t *dst_ptl, *src_ptl; - struct page *src_page; - struct folio *src_folio; - pmd_t pmd; - pgtable_t pgtable = NULL; - int ret = -ENOMEM; - - /* Skip if can be re-fill on fault */ - if (!vma_is_anonymous(dst_vma)) - return 0; - - pgtable = pte_alloc_one(dst_mm); - if (unlikely(!pgtable)) - goto out; - - dst_ptl = pmd_lock(dst_mm, dst_pmd); - src_ptl = pmd_lockptr(src_mm, src_pmd); - spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); - - ret = -EAGAIN; - pmd = *src_pmd; - -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION - if (unlikely(is_swap_pmd(pmd))) { - swp_entry_t entry = pmd_to_swp_entry(pmd); - - VM_BUG_ON(!is_pmd_migration_entry(pmd)); - if 
(!is_readable_migration_entry(entry)) { - entry = make_readable_migration_entry( - swp_offset(entry)); - pmd = swp_entry_to_pmd(entry); - if (pmd_swp_soft_dirty(*src_pmd)) - pmd = pmd_swp_mksoft_dirty(pmd); - if (pmd_swp_uffd_wp(*src_pmd)) - pmd = pmd_swp_mkuffd_wp(pmd); - set_pmd_at(src_mm, addr, src_pmd, pmd); - } - add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); - mm_inc_nr_ptes(dst_mm); - pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); - if (!userfaultfd_wp(dst_vma)) - pmd = pmd_swp_clear_uffd_wp(pmd); - set_pmd_at(dst_mm, addr, dst_pmd, pmd); - ret = 0; - goto out_unlock; - } -#endif - - if (unlikely(!pmd_trans_huge(pmd))) { - pte_free(dst_mm, pgtable); - goto out_unlock; - } - /* - * When page table lock is held, the huge zero pmd should not be - * under splitting since we don't split the page itself, only pmd to - * a page table. - */ - if (is_huge_zero_pmd(pmd)) { - /* - * mm_get_huge_zero_folio() will never allocate a new - * folio here, since we already have a zero page to - * copy. It just takes a reference. - */ - mm_get_huge_zero_folio(dst_mm); - goto out_zero_page; - } - - src_page = pmd_page(pmd); - VM_BUG_ON_PAGE(!PageHead(src_page), src_page); - src_folio = page_folio(src_page); - - folio_get(src_folio); - if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, src_page, src_vma))) { - /* Page maybe pinned: split and retry the fault on PTEs. 
*/ - folio_put(src_folio); - pte_free(dst_mm, pgtable); - spin_unlock(src_ptl); - spin_unlock(dst_ptl); - __split_huge_pmd(src_vma, src_pmd, addr, false, NULL); - return -EAGAIN; - } - add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR); -out_zero_page: - mm_inc_nr_ptes(dst_mm); - pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable); - pmdp_set_wrprotect(src_mm, addr, src_pmd); - if (!userfaultfd_wp(dst_vma)) - pmd = pmd_clear_uffd_wp(pmd); - pmd = pmd_mkold(pmd_wrprotect(pmd)); - set_pmd_at(dst_mm, addr, dst_pmd, pmd); - - ret = 0; -out_unlock: - spin_unlock(src_ptl); - spin_unlock(dst_ptl); -out: - return ret; -} - -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -void touch_pud(struct vm_area_struct *vma, unsigned long addr, - pud_t *pud, bool write) -{ - pud_t _pud; - - _pud = pud_mkyoung(*pud); - if (write) - _pud = pud_mkdirty(_pud); - if (pudp_set_access_flags(vma, addr & HPAGE_PUD_MASK, - pud, _pud, write)) - update_mmu_cache_pud(vma, addr, pud); -} - -int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, - pud_t *dst_pud, pud_t *src_pud, unsigned long addr, - struct vm_area_struct *vma) -{ - spinlock_t *dst_ptl, *src_ptl; - pud_t pud; - int ret; - - dst_ptl = pud_lock(dst_mm, dst_pud); - src_ptl = pud_lockptr(src_mm, src_pud); - spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); - - ret = -EAGAIN; - pud = *src_pud; - if (unlikely(!pud_trans_huge(pud) && !pud_devmap(pud))) - goto out_unlock; - - /* - * When page table lock is held, the huge zero pud should not be - * under splitting since we don't split the page itself, only pud to - * a page table. - */ - if (is_huge_zero_pud(pud)) { - /* No huge zero pud yet */ - } - - /* - * TODO: once we support anonymous pages, use - * folio_try_dup_anon_rmap_*() and split if duplicating fails. 
- */ - pudp_set_wrprotect(src_mm, addr, src_pud); - pud = pud_mkold(pud_wrprotect(pud)); - set_pud_at(dst_mm, addr, dst_pud, pud); - - ret = 0; -out_unlock: - spin_unlock(src_ptl); - spin_unlock(dst_ptl); - return ret; -} - -void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud) -{ - bool write = vmf->flags & FAULT_FLAG_WRITE; - - vmf->ptl = pud_lock(vmf->vma->vm_mm, vmf->pud); - if (unlikely(!pud_same(*vmf->pud, orig_pud))) - goto unlock; - - touch_pud(vmf->vma, vmf->address, vmf->pud, write); -unlock: - spin_unlock(vmf->ptl); -} -#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ - -void huge_pmd_set_accessed(struct vm_fault *vmf) -{ - bool write = vmf->flags & FAULT_FLAG_WRITE; - - vmf->ptl = pmd_lock(vmf->vma->vm_mm, vmf->pmd); - if (unlikely(!pmd_same(*vmf->pmd, vmf->orig_pmd))) - goto unlock; - - touch_pmd(vmf->vma, vmf->address, vmf->pmd, write); - -unlock: - spin_unlock(vmf->ptl); -} - -vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf) -{ - const bool unshare = vmf->flags & FAULT_FLAG_UNSHARE; - struct vm_area_struct *vma = vmf->vma; - struct folio *folio; - struct page *page; - unsigned long haddr = vmf->address & HPAGE_PMD_MASK; - pmd_t orig_pmd = vmf->orig_pmd; - - vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd); - VM_BUG_ON_VMA(!vma->anon_vma, vma); - - if (is_huge_zero_pmd(orig_pmd)) - goto fallback; - - spin_lock(vmf->ptl); - - if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) { - spin_unlock(vmf->ptl); - return 0; - } - - page = pmd_page(orig_pmd); - folio = page_folio(page); - VM_BUG_ON_PAGE(!PageHead(page), page); - - /* Early check when only holding the PT lock. */ - if (PageAnonExclusive(page)) - goto reuse; - - if (!folio_trylock(folio)) { - folio_get(folio); - spin_unlock(vmf->ptl); - folio_lock(folio); - spin_lock(vmf->ptl); - if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) { - spin_unlock(vmf->ptl); - folio_unlock(folio); - folio_put(folio); - return 0; - } - folio_put(folio); - } - - /* Recheck after temporarily dropping the PT lock. 
*/ - if (PageAnonExclusive(page)) { - folio_unlock(folio); - goto reuse; - } - - /* - * See do_wp_page(): we can only reuse the folio exclusively if - * there are no additional references. Note that we always drain - * the LRU cache immediately after adding a THP. - */ - if (folio_ref_count(folio) > - 1 + folio_test_swapcache(folio) * folio_nr_pages(folio)) - goto unlock_fallback; - if (folio_test_swapcache(folio)) - folio_free_swap(folio); - if (folio_ref_count(folio) == 1) { - pmd_t entry; - - folio_move_anon_rmap(folio, vma); - SetPageAnonExclusive(page); - folio_unlock(folio); -reuse: - if (unlikely(unshare)) { - spin_unlock(vmf->ptl); - return 0; - } - entry = pmd_mkyoung(orig_pmd); - entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); - if (pmdp_set_access_flags(vma, haddr, vmf->pmd, entry, 1)) - update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); - spin_unlock(vmf->ptl); - return 0; - } - -unlock_fallback: - folio_unlock(folio); - spin_unlock(vmf->ptl); -fallback: - __split_huge_pmd(vma, vmf->pmd, vmf->address, false, NULL); - return VM_FAULT_FALLBACK; -} - -static inline bool can_change_pmd_writable(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd) -{ - struct page *page; - - if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE))) - return false; - - /* Don't touch entries that are not even readable (NUMA hinting). */ - if (pmd_protnone(pmd)) - return false; - - /* Do we need write faults for softdirty tracking? */ - if (pmd_needs_soft_dirty_wp(vma, pmd)) - return false; - - /* Do we need write faults for uffd-wp tracking? */ - if (userfaultfd_huge_pmd_wp(vma, pmd)) - return false; - - if (!(vma->vm_flags & VM_SHARED)) { - /* See can_change_pte_writable(). */ - page = vm_normal_page_pmd(vma, addr, pmd); - return page && PageAnon(page) && PageAnonExclusive(page); - } - - /* See can_change_pte_writable(). 
*/ - return pmd_dirty(pmd); -} - /* NUMA hinting page fault entry point for trans huge pmds */ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) { @@ -1830,342 +1507,6 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, return ret; } -static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd) -{ - pgtable_t pgtable; - - pgtable = pgtable_trans_huge_withdraw(mm, pmd); - pte_free(mm, pgtable); - mm_dec_nr_ptes(mm); -} - -int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, - pmd_t *pmd, unsigned long addr) -{ - pmd_t orig_pmd; - spinlock_t *ptl; - - tlb_change_page_size(tlb, HPAGE_PMD_SIZE); - - ptl = __pmd_trans_huge_lock(pmd, vma); - if (!ptl) - return 0; - /* - * For architectures like ppc64 we look at deposited pgtable - * when calling pmdp_huge_get_and_clear. So do the - * pgtable_trans_huge_withdraw after finishing pmdp related - * operations. - */ - orig_pmd = pmdp_huge_get_and_clear_full(vma, addr, pmd, - tlb->fullmm); - arch_check_zapped_pmd(vma, orig_pmd); - tlb_remove_pmd_tlb_entry(tlb, pmd, addr); - if (vma_is_special_huge(vma)) { - if (arch_needs_pgtable_deposit()) - zap_deposited_table(tlb->mm, pmd); - spin_unlock(ptl); - } else if (is_huge_zero_pmd(orig_pmd)) { - zap_deposited_table(tlb->mm, pmd); - spin_unlock(ptl); - } else { - struct folio *folio = NULL; - int flush_needed = 1; - - if (pmd_present(orig_pmd)) { - struct page *page = pmd_page(orig_pmd); - - folio = page_folio(page); - folio_remove_rmap_pmd(folio, page, vma); - WARN_ON_ONCE(folio_mapcount(folio) < 0); - VM_BUG_ON_PAGE(!PageHead(page), page); - } else if (thp_migration_supported()) { - swp_entry_t entry; - - VM_BUG_ON(!is_pmd_migration_entry(orig_pmd)); - entry = pmd_to_swp_entry(orig_pmd); - folio = pfn_swap_entry_folio(entry); - flush_needed = 0; - } else - WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!"); - - if (folio_test_anon(folio)) { - zap_deposited_table(tlb->mm, pmd); - add_mm_counter(tlb->mm, 
MM_ANONPAGES, -HPAGE_PMD_NR); - } else { - if (arch_needs_pgtable_deposit()) - zap_deposited_table(tlb->mm, pmd); - add_mm_counter(tlb->mm, mm_counter_file(folio), - -HPAGE_PMD_NR); - } - - spin_unlock(ptl); - if (flush_needed) - tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE); - } - return 1; -} - -#ifndef pmd_move_must_withdraw -static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl, - spinlock_t *old_pmd_ptl, - struct vm_area_struct *vma) -{ - /* - * With split pmd lock we also need to move preallocated - * PTE page table if new_pmd is on different PMD page table. - * - * We also don't deposit and withdraw tables for file pages. - */ - return (new_pmd_ptl != old_pmd_ptl) && vma_is_anonymous(vma); -} -#endif - -static pmd_t move_soft_dirty_pmd(pmd_t pmd) -{ -#ifdef CONFIG_MEM_SOFT_DIRTY - if (unlikely(is_pmd_migration_entry(pmd))) - pmd = pmd_swp_mksoft_dirty(pmd); - else if (pmd_present(pmd)) - pmd = pmd_mksoft_dirty(pmd); -#endif - return pmd; -} - -bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, - unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd) -{ - spinlock_t *old_ptl, *new_ptl; - pmd_t pmd; - struct mm_struct *mm = vma->vm_mm; - bool force_flush = false; - - /* - * The destination pmd shouldn't be established, free_pgtables() - * should have released it; but move_page_tables() might have already - * inserted a page table, if racing against shmem/file collapse. - */ - if (!pmd_none(*new_pmd)) { - VM_BUG_ON(pmd_trans_huge(*new_pmd)); - return false; - } - - /* - * We don't have to worry about the ordering of src and dst - * ptlocks because exclusive mmap_lock prevents deadlock. 
- */ - old_ptl = __pmd_trans_huge_lock(old_pmd, vma); - if (old_ptl) { - new_ptl = pmd_lockptr(mm, new_pmd); - if (new_ptl != old_ptl) - spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); - pmd = pmdp_huge_get_and_clear(mm, old_addr, old_pmd); - if (pmd_present(pmd)) - force_flush = true; - VM_BUG_ON(!pmd_none(*new_pmd)); - - if (pmd_move_must_withdraw(new_ptl, old_ptl, vma)) { - pgtable_t pgtable; - pgtable = pgtable_trans_huge_withdraw(mm, old_pmd); - pgtable_trans_huge_deposit(mm, new_pmd, pgtable); - } - pmd = move_soft_dirty_pmd(pmd); - set_pmd_at(mm, new_addr, new_pmd, pmd); - if (force_flush) - flush_pmd_tlb_range(vma, old_addr, old_addr + PMD_SIZE); - if (new_ptl != old_ptl) - spin_unlock(new_ptl); - spin_unlock(old_ptl); - return true; - } - return false; -} - -/* - * Returns - * - 0 if PMD could not be locked - * - 1 if PMD was locked but protections unchanged and TLB flush unnecessary - * or if prot_numa but THP migration is not supported - * - HPAGE_PMD_NR if protections changed and TLB flush necessary - */ -int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, - pmd_t *pmd, unsigned long addr, pgprot_t newprot, - unsigned long cp_flags) -{ - struct mm_struct *mm = vma->vm_mm; - spinlock_t *ptl; - pmd_t oldpmd, entry; - bool prot_numa = cp_flags & MM_CP_PROT_NUMA; - bool uffd_wp = cp_flags & MM_CP_UFFD_WP; - bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE; - int ret = 1; - - tlb_change_page_size(tlb, HPAGE_PMD_SIZE); - - if (prot_numa && !thp_migration_supported()) - return 1; - - ptl = __pmd_trans_huge_lock(pmd, vma); - if (!ptl) - return 0; - -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION - if (is_swap_pmd(*pmd)) { - swp_entry_t entry = pmd_to_swp_entry(*pmd); - struct folio *folio = pfn_swap_entry_folio(entry); - pmd_t newpmd; - - VM_BUG_ON(!is_pmd_migration_entry(*pmd)); - if (is_writable_migration_entry(entry)) { - /* - * A protection check is difficult so - * just be safe and disable write - */ - if (folio_test_anon(folio)) - 
entry = make_readable_exclusive_migration_entry(swp_offset(entry)); - else - entry = make_readable_migration_entry(swp_offset(entry)); - newpmd = swp_entry_to_pmd(entry); - if (pmd_swp_soft_dirty(*pmd)) - newpmd = pmd_swp_mksoft_dirty(newpmd); - } else { - newpmd = *pmd; - } - - if (uffd_wp) - newpmd = pmd_swp_mkuffd_wp(newpmd); - else if (uffd_wp_resolve) - newpmd = pmd_swp_clear_uffd_wp(newpmd); - if (!pmd_same(*pmd, newpmd)) - set_pmd_at(mm, addr, pmd, newpmd); - goto unlock; - } -#endif - - if (prot_numa) { - struct folio *folio; - bool toptier; - /* - * Avoid trapping faults against the zero page. The read-only - * data is likely to be read-cached on the local CPU and - * local/remote hits to the zero page are not interesting. - */ - if (is_huge_zero_pmd(*pmd)) - goto unlock; - - if (pmd_protnone(*pmd)) - goto unlock; - - folio = pmd_folio(*pmd); - toptier = node_is_toptier(folio_nid(folio)); - /* - * Skip scanning top tier node if normal numa - * balancing is disabled - */ - if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) && - toptier) - goto unlock; - - if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING && - !toptier) - folio_xchg_access_time(folio, - jiffies_to_msecs(jiffies)); - } - /* - * In case prot_numa, we are under mmap_read_lock(mm). It's critical - * to not clear pmd intermittently to avoid race with MADV_DONTNEED - * which is also under mmap_read_lock(mm): - * - * CPU0: CPU1: - * change_huge_pmd(prot_numa=1) - * pmdp_huge_get_and_clear_notify() - * madvise_dontneed() - * zap_pmd_range() - * pmd_trans_huge(*pmd) == 0 (without ptl) - * // skip the pmd - * set_pmd_at(); - * // pmd is re-established - * - * The race makes MADV_DONTNEED miss the huge pmd and don't clear it - * which may break userspace. - * - * pmdp_invalidate_ad() is required to make sure we don't miss - * dirty/young flags set by hardware. 
- */ - oldpmd = pmdp_invalidate_ad(vma, addr, pmd); - - entry = pmd_modify(oldpmd, newprot); - if (uffd_wp) - entry = pmd_mkuffd_wp(entry); - else if (uffd_wp_resolve) - /* - * Leave the write bit to be handled by PF interrupt - * handler, then things like COW could be properly - * handled. - */ - entry = pmd_clear_uffd_wp(entry); - - /* See change_pte_range(). */ - if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) && - can_change_pmd_writable(vma, addr, entry)) - entry = pmd_mkwrite(entry, vma); - - ret = HPAGE_PMD_NR; - set_pmd_at(mm, addr, pmd, entry); - - if (huge_pmd_needs_flush(oldpmd, entry)) - tlb_flush_pmd_range(tlb, addr, HPAGE_PMD_SIZE); -unlock: - spin_unlock(ptl); - return ret; -} - -/* - * Returns: - * - * - 0: if pud leaf changed from under us - * - 1: if pud can be skipped - * - HPAGE_PUD_NR: if pud was successfully processed - */ -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, - pud_t *pudp, unsigned long addr, pgprot_t newprot, - unsigned long cp_flags) -{ - struct mm_struct *mm = vma->vm_mm; - pud_t oldpud, entry; - spinlock_t *ptl; - - tlb_change_page_size(tlb, HPAGE_PUD_SIZE); - - /* NUMA balancing doesn't apply to dax */ - if (cp_flags & MM_CP_PROT_NUMA) - return 1; - - /* - * Huge entries on userfault-wp only works with anonymous, while we - * don't have anonymous PUDs yet. - */ - if (WARN_ON_ONCE(cp_flags & MM_CP_UFFD_WP_ALL)) - return 1; - - ptl = __pud_trans_huge_lock(pudp, vma); - if (!ptl) - return 0; - - /* - * Can't clear PUD or it can race with concurrent zapping. See - * change_huge_pmd(). 
- */ - oldpud = pudp_invalidate(vma, addr, pudp); - entry = pud_modify(oldpud, newprot); - set_pud_at(mm, addr, pudp, entry); - tlb_flush_pud_range(tlb, addr, HPAGE_PUD_SIZE); - - spin_unlock(ptl); - return HPAGE_PUD_NR; -} -#endif - #ifdef CONFIG_USERFAULTFD /* * The PT lock for src_pmd and dst_vma/src_vma (for reading) are locked by @@ -2306,105 +1647,8 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm } #endif /* CONFIG_USERFAULTFD */ -/* - * Returns page table lock pointer if a given pmd maps a thp, NULL otherwise. - * - * Note that if it returns page table lock pointer, this routine returns without - * unlocking page table lock. So callers must unlock it. - */ -spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) -{ - spinlock_t *ptl; - ptl = pmd_lock(vma->vm_mm, pmd); - if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || - pmd_devmap(*pmd))) - return ptl; - spin_unlock(ptl); - return NULL; -} - -/* - * Returns page table lock pointer if a given pud maps a thp, NULL otherwise. - * - * Note that if it returns page table lock pointer, this routine returns without - * unlocking page table lock. So callers must unlock it. 
- */ -spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) -{ - spinlock_t *ptl; - - ptl = pud_lock(vma->vm_mm, pud); - if (likely(pud_trans_huge(*pud) || pud_devmap(*pud))) - return ptl; - spin_unlock(ptl); - return NULL; -} - -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, - pud_t *pud, unsigned long addr) -{ - spinlock_t *ptl; - pud_t orig_pud; - - ptl = __pud_trans_huge_lock(pud, vma); - if (!ptl) - return 0; - - orig_pud = pudp_huge_get_and_clear_full(vma, addr, pud, tlb->fullmm); - arch_check_zapped_pud(vma, orig_pud); - tlb_remove_pud_tlb_entry(tlb, pud, addr); - if (vma_is_special_huge(vma)) { - spin_unlock(ptl); - /* No zero page support yet */ - } else { - /* No support for anonymous PUD pages yet */ - BUG(); - } - return 1; -} - -static void __split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud, - unsigned long haddr) -{ - VM_BUG_ON(haddr & ~HPAGE_PUD_MASK); - VM_BUG_ON_VMA(vma->vm_start > haddr, vma); - VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PUD_SIZE, vma); - VM_BUG_ON(!pud_trans_huge(*pud) && !pud_devmap(*pud)); - - count_vm_event(THP_SPLIT_PUD); - - pudp_huge_clear_flush(vma, haddr, pud); -} - -void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, - unsigned long address) -{ - spinlock_t *ptl; - struct mmu_notifier_range range; - - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm, - address & HPAGE_PUD_MASK, - (address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE); - mmu_notifier_invalidate_range_start(&range); - ptl = pud_lock(vma->vm_mm, pud); - if (unlikely(!pud_trans_huge(*pud) && !pud_devmap(*pud))) - goto out; - __split_huge_pud_locked(vma, pud, range.start); - -out: - spin_unlock(ptl); - mmu_notifier_invalidate_range_end(&range); -} -#else -void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, - unsigned long address) -{ -} -#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ - -static void 
__split_huge_zero_page_pmd(struct vm_area_struct *vma, - unsigned long haddr, pmd_t *pmd) +void __split_huge_zero_page_pmd(struct vm_area_struct *vma, + unsigned long haddr, pmd_t *pmd) { struct mm_struct *mm = vma->vm_mm; pgtable_t pgtable; @@ -2444,274 +1688,6 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma, pmd_populate(mm, pmd, pgtable); } -static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long haddr, bool freeze) -{ - struct mm_struct *mm = vma->vm_mm; - struct folio *folio; - struct page *page; - pgtable_t pgtable; - pmd_t old_pmd, _pmd; - bool young, write, soft_dirty, pmd_migration = false, uffd_wp = false; - bool anon_exclusive = false, dirty = false; - unsigned long addr; - pte_t *pte; - int i; - - VM_BUG_ON(haddr & ~HPAGE_PMD_MASK); - VM_BUG_ON_VMA(vma->vm_start > haddr, vma); - VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma); - VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd) - && !pmd_devmap(*pmd)); - - count_vm_event(THP_SPLIT_PMD); - - if (!vma_is_anonymous(vma)) { - old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd); - /* - * We are going to unmap this huge page. 
So - * just go ahead and zap it - */ - if (arch_needs_pgtable_deposit()) - zap_deposited_table(mm, pmd); - if (vma_is_special_huge(vma)) - return; - if (unlikely(is_pmd_migration_entry(old_pmd))) { - swp_entry_t entry; - - entry = pmd_to_swp_entry(old_pmd); - folio = pfn_swap_entry_folio(entry); - } else { - page = pmd_page(old_pmd); - folio = page_folio(page); - if (!folio_test_dirty(folio) && pmd_dirty(old_pmd)) - folio_mark_dirty(folio); - if (!folio_test_referenced(folio) && pmd_young(old_pmd)) - folio_set_referenced(folio); - folio_remove_rmap_pmd(folio, page, vma); - folio_put(folio); - } - add_mm_counter(mm, mm_counter_file(folio), -HPAGE_PMD_NR); - return; - } - - if (is_huge_zero_pmd(*pmd)) { - /* - * FIXME: Do we want to invalidate secondary mmu by calling - * mmu_notifier_arch_invalidate_secondary_tlbs() see comments below - * inside __split_huge_pmd() ? - * - * We are going from a zero huge page write protected to zero - * small page also write protected so it does not seems useful - * to invalidate secondary mmu at this time. - */ - return __split_huge_zero_page_pmd(vma, haddr, pmd); - } - - pmd_migration = is_pmd_migration_entry(*pmd); - if (unlikely(pmd_migration)) { - swp_entry_t entry; - - old_pmd = *pmd; - entry = pmd_to_swp_entry(old_pmd); - page = pfn_swap_entry_to_page(entry); - write = is_writable_migration_entry(entry); - if (PageAnon(page)) - anon_exclusive = is_readable_exclusive_migration_entry(entry); - young = is_migration_entry_young(entry); - dirty = is_migration_entry_dirty(entry); - soft_dirty = pmd_swp_soft_dirty(old_pmd); - uffd_wp = pmd_swp_uffd_wp(old_pmd); - } else { - /* - * Up to this point the pmd is present and huge and userland has - * the whole access to the hugepage during the split (which - * happens in place). 
If we overwrite the pmd with the not-huge - * version pointing to the pte here (which of course we could if - * all CPUs were bug free), userland could trigger a small page - * size TLB miss on the small sized TLB while the hugepage TLB - * entry is still established in the huge TLB. Some CPU doesn't - * like that. See - * http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum - * 383 on page 105. Intel should be safe but is also warns that - * it's only safe if the permission and cache attributes of the - * two entries loaded in the two TLB is identical (which should - * be the case here). But it is generally safer to never allow - * small and huge TLB entries for the same virtual address to be - * loaded simultaneously. So instead of doing "pmd_populate(); - * flush_pmd_tlb_range();" we first mark the current pmd - * notpresent (atomically because here the pmd_trans_huge must - * remain set at all times on the pmd until the split is - * complete for this pmd), then we flush the SMP TLB and finally - * we write the non-huge version of the pmd entry with - * pmd_populate. - */ - old_pmd = pmdp_invalidate(vma, haddr, pmd); - page = pmd_page(old_pmd); - folio = page_folio(page); - if (pmd_dirty(old_pmd)) { - dirty = true; - folio_set_dirty(folio); - } - write = pmd_write(old_pmd); - young = pmd_young(old_pmd); - soft_dirty = pmd_soft_dirty(old_pmd); - uffd_wp = pmd_uffd_wp(old_pmd); - - VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio); - VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio); - - /* - * Without "freeze", we'll simply split the PMD, propagating the - * PageAnonExclusive() flag for each PTE by setting it for - * each subpage -- no need to (temporarily) clear. - * - * With "freeze" we want to replace mapped pages by - * migration entries right away. This is only possible if we - * managed to clear PageAnonExclusive() -- see - * set_pmd_migration_entry(). 
- * - * In case we cannot clear PageAnonExclusive(), split the PMD - * only and let try_to_migrate_one() fail later. - * - * See folio_try_share_anon_rmap_pmd(): invalidate PMD first. - */ - anon_exclusive = PageAnonExclusive(page); - if (freeze && anon_exclusive && - folio_try_share_anon_rmap_pmd(folio, page)) - freeze = false; - if (!freeze) { - rmap_t rmap_flags = RMAP_NONE; - - folio_ref_add(folio, HPAGE_PMD_NR - 1); - if (anon_exclusive) - rmap_flags |= RMAP_EXCLUSIVE; - folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, - vma, haddr, rmap_flags); - } - } - - /* - * Withdraw the table only after we mark the pmd entry invalid. - * This's critical for some architectures (Power). - */ - pgtable = pgtable_trans_huge_withdraw(mm, pmd); - pmd_populate(mm, &_pmd, pgtable); - - pte = pte_offset_map(&_pmd, haddr); - VM_BUG_ON(!pte); - - /* - * Note that NUMA hinting access restrictions are not transferred to - * avoid any possibility of altering permissions across VMAs. - */ - if (freeze || pmd_migration) { - for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) { - pte_t entry; - swp_entry_t swp_entry; - - if (write) - swp_entry = make_writable_migration_entry( - page_to_pfn(page + i)); - else if (anon_exclusive) - swp_entry = make_readable_exclusive_migration_entry( - page_to_pfn(page + i)); - else - swp_entry = make_readable_migration_entry( - page_to_pfn(page + i)); - if (young) - swp_entry = make_migration_entry_young(swp_entry); - if (dirty) - swp_entry = make_migration_entry_dirty(swp_entry); - entry = swp_entry_to_pte(swp_entry); - if (soft_dirty) - entry = pte_swp_mksoft_dirty(entry); - if (uffd_wp) - entry = pte_swp_mkuffd_wp(entry); - - VM_WARN_ON(!pte_none(ptep_get(pte + i))); - set_pte_at(mm, addr, pte + i, entry); - } - } else { - pte_t entry; - - entry = mk_pte(page, READ_ONCE(vma->vm_page_prot)); - if (write) - entry = pte_mkwrite(entry, vma); - if (!young) - entry = pte_mkold(entry); - /* NOTE: this may set soft-dirty too on some archs 
*/ - if (dirty) - entry = pte_mkdirty(entry); - if (soft_dirty) - entry = pte_mksoft_dirty(entry); - if (uffd_wp) - entry = pte_mkuffd_wp(entry); - - for (i = 0; i < HPAGE_PMD_NR; i++) - VM_WARN_ON(!pte_none(ptep_get(pte + i))); - - set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR); - } - pte_unmap(pte); - - if (!pmd_migration) - folio_remove_rmap_pmd(folio, page, vma); - if (freeze) - put_page(page); - - smp_wmb(); /* make pte visible before pmd */ - pmd_populate(mm, pmd, pgtable); -} - -void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, - pmd_t *pmd, bool freeze, struct folio *folio) -{ - VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio)); - VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE)); - VM_WARN_ON_ONCE(folio && !folio_test_locked(folio)); - VM_BUG_ON(freeze && !folio); - - /* - * When the caller requests to set up a migration entry, we - * require a folio to check the PMD against. Otherwise, there - * is a risk of replacing the wrong folio. - */ - if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || - is_pmd_migration_entry(*pmd)) { - if (folio && folio != pmd_folio(*pmd)) - return; - __split_huge_pmd_locked(vma, pmd, address, freeze); - } -} - -void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long address, bool freeze, struct folio *folio) -{ - spinlock_t *ptl; - struct mmu_notifier_range range; - - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm, - address & HPAGE_PMD_MASK, - (address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE); - mmu_notifier_invalidate_range_start(&range); - ptl = pmd_lock(vma->vm_mm, pmd); - split_huge_pmd_locked(vma, range.start, pmd, freeze, folio); - spin_unlock(ptl); - mmu_notifier_invalidate_range_end(&range); -} - -void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, - bool freeze, struct folio *folio) -{ - pmd_t *pmd = mm_find_pmd(vma->vm_mm, address); - - if (!pmd) - return; - - __split_huge_pmd(vma, pmd, address, freeze, folio); -} - static 
inline void split_huge_pmd_if_needed(struct vm_area_struct *vma, unsigned long address) { /* @@ -3772,100 +2748,3 @@ static int __init split_huge_pages_debugfs(void) late_initcall(split_huge_pages_debugfs); #endif -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION -int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, - struct page *page) -{ - struct folio *folio = page_folio(page); - struct vm_area_struct *vma = pvmw->vma; - struct mm_struct *mm = vma->vm_mm; - unsigned long address = pvmw->address; - bool anon_exclusive; - pmd_t pmdval; - swp_entry_t entry; - pmd_t pmdswp; - - if (!(pvmw->pmd && !pvmw->pte)) - return 0; - - flush_cache_range(vma, address, address + HPAGE_PMD_SIZE); - pmdval = pmdp_invalidate(vma, address, pvmw->pmd); - - /* See folio_try_share_anon_rmap_pmd(): invalidate PMD first. */ - anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page); - if (anon_exclusive && folio_try_share_anon_rmap_pmd(folio, page)) { - set_pmd_at(mm, address, pvmw->pmd, pmdval); - return -EBUSY; - } - - if (pmd_dirty(pmdval)) - folio_mark_dirty(folio); - if (pmd_write(pmdval)) - entry = make_writable_migration_entry(page_to_pfn(page)); - else if (anon_exclusive) - entry = make_readable_exclusive_migration_entry(page_to_pfn(page)); - else - entry = make_readable_migration_entry(page_to_pfn(page)); - if (pmd_young(pmdval)) - entry = make_migration_entry_young(entry); - if (pmd_dirty(pmdval)) - entry = make_migration_entry_dirty(entry); - pmdswp = swp_entry_to_pmd(entry); - if (pmd_soft_dirty(pmdval)) - pmdswp = pmd_swp_mksoft_dirty(pmdswp); - if (pmd_uffd_wp(pmdval)) - pmdswp = pmd_swp_mkuffd_wp(pmdswp); - set_pmd_at(mm, address, pvmw->pmd, pmdswp); - folio_remove_rmap_pmd(folio, page, vma); - folio_put(folio); - trace_set_migration_pmd(address, pmd_val(pmdswp)); - - return 0; -} - -void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) -{ - struct folio *folio = page_folio(new); - struct vm_area_struct *vma = pvmw->vma; - struct 
mm_struct *mm = vma->vm_mm; - unsigned long address = pvmw->address; - unsigned long haddr = address & HPAGE_PMD_MASK; - pmd_t pmde; - swp_entry_t entry; - - if (!(pvmw->pmd && !pvmw->pte)) - return; - - entry = pmd_to_swp_entry(*pvmw->pmd); - folio_get(folio); - pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot)); - if (pmd_swp_soft_dirty(*pvmw->pmd)) - pmde = pmd_mksoft_dirty(pmde); - if (is_writable_migration_entry(entry)) - pmde = pmd_mkwrite(pmde, vma); - if (pmd_swp_uffd_wp(*pvmw->pmd)) - pmde = pmd_mkuffd_wp(pmde); - if (!is_migration_entry_young(entry)) - pmde = pmd_mkold(pmde); - /* NOTE: this may contain setting soft-dirty on some archs */ - if (folio_test_dirty(folio) && is_migration_entry_dirty(entry)) - pmde = pmd_mkdirty(pmde); - - if (folio_test_anon(folio)) { - rmap_t rmap_flags = RMAP_NONE; - - if (!is_readable_migration_entry(entry)) - rmap_flags |= RMAP_EXCLUSIVE; - - folio_add_anon_rmap_pmd(folio, new, vma, haddr, rmap_flags); - } else { - folio_add_file_rmap_pmd(folio, new, vma); - } - VM_BUG_ON(pmd_write(pmde) && folio_test_anon(folio) && !PageAnonExclusive(new)); - set_pmd_at(mm, haddr, pvmw->pmd, pmde); - - /* No need to invalidate - it was non-present before */ - update_mmu_cache_pmd(vma, address, pvmw->pmd); - trace_remove_migration_pmd(address, pmd_val(pmde)); -} -#endif

From patchwork Wed Jul 17 22:02:19 2024
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13735850
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Vlastimil Babka, peterx@redhat.com, David Hildenbrand, Oscar Salvador, linux-s390@vger.kernel.org, Andrew Morton, Matthew Wilcox, Dan Williams, Michal Hocko, linux-riscv@lists.infradead.org, sparclinux@vger.kernel.org, Alex Williamson, Jason Gunthorpe, x86@kernel.org, Alistair Popple, linuxppc-dev@lists.ozlabs.org, linux-arm-kernel@lists.infradead.org, Ryan Roberts, Hugh Dickins, Axel Rasmussen
Subject: [PATCH RFC 6/6] mm: Convert "*_trans_huge() || *_devmap()" to use *_leaf()
Date: Wed, 17 Jul 2024 18:02:19 -0400
Message-ID: <20240717220219.3743374-7-peterx@redhat.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240717220219.3743374-1-peterx@redhat.com>
References: <20240717220219.3743374-1-peterx@redhat.com>
MIME-Version: 1.0
This patch converts all such checks under common mm/ into a single *_leaf() check, as "thp + devmap" should compose everything a *_leaf() covers for now. Arch code in other directories is not touched yet, since some architectures may need special attention; those are left for separate patches.
It should save some cycles on such checks and pave the way for new leaf types: when a new type of leaf is introduced, it will naturally take the same route that thp+devmap takes today.

One issue with the pxx_leaf() API is that it is defined by arch code without regard to the kernel config, so a branch like:

	if (pmd_leaf()) { ... }

cannot be optimized away even when neither THP nor HUGETLB is enabled (in which case pmd_leaf() can never return true). To give compilers a chance to omit such code, introduce light wrappers for them named pxx_is_leaf(). These take the kernel config into account and allow branches to be dropped when the compiler knows they constantly return false. This mimics what we used to have with pxx_trans_huge() when !THP, now applied to the pxx_leaf() API as well.

Cc: Alistair Popple
Cc: Dan Williams
Cc: Jason Gunthorpe
Signed-off-by: Peter Xu
---
 include/linux/huge_mm.h | 6 +++---
 include/linux/pgtable.h | 30 +++++++++++++++++++++++++++++-
 mm/hmm.c | 4 ++--
 mm/huge_mapping_pmd.c | 9 +++------
 mm/huge_mapping_pud.c | 6 +++---
 mm/mapping_dirty_helpers.c | 4 ++--
 mm/memory.c | 14 ++++++--------
 mm/migrate_device.c | 2 +-
 mm/mprotect.c | 4 ++--
 mm/mremap.c | 5 ++---
 mm/page_vma_mapped.c | 5 ++---
 mm/pgtable-generic.c | 7 +++----
 12 files changed, 58 insertions(+), 38 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index aea2784df8ef..a5b026d0731e 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -27,7 +27,7 @@ spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma); static inline spinlock_t * pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) { - if (pud_trans_huge(*pud) || pud_devmap(*pud)) + if (pud_is_leaf(*pud)) return __pud_trans_huge_lock(pud, vma); else return NULL; @@ -36,7 +36,7 @@ pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) #define split_huge_pud(__vma,
__pud, __address) \ do { \ pud_t *____pud = (__pud); \ - if (pud_trans_huge(*____pud) || pud_devmap(*____pud)) \ + if (pud_is_leaf(*____pud)) \ __split_huge_pud(__vma, __pud, __address); \ } while (0) #else /* CONFIG_PGTABLE_HAS_PUD_LEAVES */ @@ -125,7 +125,7 @@ static inline int is_swap_pmd(pmd_t pmd) static inline spinlock_t * pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma) { - if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) + if (is_swap_pmd(*pmd) || pmd_is_leaf(*pmd)) return __pmd_trans_huge_lock(pmd, vma); else return NULL; diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 5e505373b113..af7709a132aa 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1641,7 +1641,7 @@ static inline int pud_trans_unstable(pud_t *pud) defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) pud_t pudval = READ_ONCE(*pud); - if (pud_none(pudval) || pud_trans_huge(pudval) || pud_devmap(pudval)) + if (pud_none(pudval) || pud_leaf(pudval)) return 1; if (unlikely(pud_bad(pudval))) { pud_clear_bad(pud); @@ -1901,6 +1901,34 @@ typedef unsigned int pgtbl_mod_mask; #define pmd_leaf(x) false #endif +/* + * Wrapper of pxx_leaf() helpers. + * + * Comparing to pxx_leaf() API, the only difference is: using these macros + * can help code generation, so unnecessary code can be omitted when the + * specific level of leaf is not possible due to kernel config. It is + * needed because normally pxx_leaf() can be defined in arch code without + * knowing the kernel config. + * + * Currently we only need pmd/pud versions, because the largest leaf Linux + * supports so far is pud. + * + * Defining here also means that in arch's pgtable headers these macros + * cannot be used, pxx_leaf()s need to be used instead, because this file + * will not be included in arch's pgtable headers. 
+ */
+#ifdef CONFIG_PGTABLE_HAS_PMD_LEAVES
+#define pmd_is_leaf(x)	pmd_leaf(x)
+#else
+#define pmd_is_leaf(x)	false
+#endif
+
+#ifdef CONFIG_PGTABLE_HAS_PUD_LEAVES
+#define pud_is_leaf(x)	pud_leaf(x)
+#else
+#define pud_is_leaf(x)	false
+#endif
+
 #ifndef pgd_leaf_size
 #define pgd_leaf_size(x) (1ULL << PGDIR_SHIFT)
 #endif
diff --git a/mm/hmm.c b/mm/hmm.c
index 7e0229ae4a5a..8d985bbbfee9 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -351,7 +351,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 		return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
 	}
 
-	if (pmd_devmap(pmd) || pmd_trans_huge(pmd)) {
+	if (pmd_is_leaf(pmd)) {
 		/*
 		 * No need to take pmd_lock here, even if some other thread
 		 * is splitting the huge pmd we will get that event through
@@ -362,7 +362,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 		 * values.
 		 */
 		pmd = pmdp_get_lockless(pmdp);
-		if (!pmd_devmap(pmd) && !pmd_trans_huge(pmd))
+		if (!pmd_is_leaf(pmd))
 			goto again;
 
 		return hmm_vma_handle_pmd(walk, addr, end, hmm_pfns, pmd);
diff --git a/mm/huge_mapping_pmd.c b/mm/huge_mapping_pmd.c
index 7b85e2a564d6..d30c60685f66 100644
--- a/mm/huge_mapping_pmd.c
+++ b/mm/huge_mapping_pmd.c
@@ -60,8 +60,7 @@ spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma)
 	spinlock_t *ptl;
 
 	ptl = pmd_lock(vma->vm_mm, pmd);
-	if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) ||
-		   pmd_devmap(*pmd)))
+	if (likely(is_swap_pmd(*pmd) || pmd_is_leaf(*pmd)))
 		return ptl;
 	spin_unlock(ptl);
 	return NULL;
@@ -627,8 +626,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	VM_BUG_ON(haddr & ~HPAGE_PMD_MASK);
 	VM_BUG_ON_VMA(vma->vm_start > haddr, vma);
 	VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma);
-	VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd) &&
-		  !pmd_devmap(*pmd));
+	VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_is_leaf(*pmd));
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	count_vm_event(THP_SPLIT_PMD);
@@ -845,8 +843,7 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
 	 * require a folio to check the PMD against. Otherwise, there
 	 * is a risk of replacing the wrong folio.
 	 */
-	if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
-	    is_pmd_migration_entry(*pmd)) {
+	if (pmd_is_leaf(*pmd) || is_pmd_migration_entry(*pmd)) {
 		if (folio && folio != pmd_folio(*pmd))
 			return;
 		__split_huge_pmd_locked(vma, pmd, address, freeze);
diff --git a/mm/huge_mapping_pud.c b/mm/huge_mapping_pud.c
index c3a6bffe2871..58871dd74df2 100644
--- a/mm/huge_mapping_pud.c
+++ b/mm/huge_mapping_pud.c
@@ -57,7 +57,7 @@ spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma)
 	spinlock_t *ptl;
 
 	ptl = pud_lock(vma->vm_mm, pud);
-	if (likely(pud_trans_huge(*pud) || pud_devmap(*pud)))
+	if (likely(pud_is_leaf(*pud)))
 		return ptl;
 	spin_unlock(ptl);
 	return NULL;
@@ -90,7 +90,7 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 	ret = -EAGAIN;
 	pud = *src_pud;
-	if (unlikely(!pud_trans_huge(pud) && !pud_devmap(pud)))
+	if (unlikely(!pud_leaf(pud)))
 		goto out_unlock;
 
 	/*
@@ -225,7 +225,7 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
 				(address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 	ptl = pud_lock(vma->vm_mm, pud);
-	if (unlikely(!pud_trans_huge(*pud) && !pud_devmap(*pud)))
+	if (unlikely(!pud_is_leaf(*pud)))
 		goto out;
 	__split_huge_pud_locked(vma, pud, range.start);
diff --git a/mm/mapping_dirty_helpers.c b/mm/mapping_dirty_helpers.c
index 2f8829b3541a..a9ea767d2d73 100644
--- a/mm/mapping_dirty_helpers.c
+++ b/mm/mapping_dirty_helpers.c
@@ -129,7 +129,7 @@ static int wp_clean_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long end,
 	pmd_t pmdval = pmdp_get_lockless(pmd);
 
 	/* Do not split a huge pmd, present or migrated */
-	if (pmd_trans_huge(pmdval) || pmd_devmap(pmdval)) {
+	if (pmd_is_leaf(pmdval)) {
 		WARN_ON(pmd_write(pmdval) || pmd_dirty(pmdval));
 		walk->action = ACTION_CONTINUE;
 	}
@@ -152,7 +152,7 @@ static int wp_clean_pud_entry(pud_t *pud, unsigned long addr, unsigned long end,
 	pud_t pudval = READ_ONCE(*pud);
 
 	/* Do not split a huge pud */
-	if (pud_trans_huge(pudval) || pud_devmap(pudval)) {
+	if (pud_is_leaf(pudval)) {
 		WARN_ON(pud_write(pudval) || pud_dirty(pudval));
 		walk->action = ACTION_CONTINUE;
 	}
diff --git a/mm/memory.c b/mm/memory.c
index 126ee0903c79..6dc92c514bb7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1235,8 +1235,7 @@ copy_pmd_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	src_pmd = pmd_offset(src_pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
-		if (is_swap_pmd(*src_pmd) || pmd_trans_huge(*src_pmd)
-			|| pmd_devmap(*src_pmd)) {
+		if (is_swap_pmd(*src_pmd) || pmd_is_leaf(*src_pmd)) {
 			int err;
 			VM_BUG_ON_VMA(next-addr != HPAGE_PMD_SIZE, src_vma);
 			err = copy_huge_pmd(dst_mm, src_mm, dst_pmd, src_pmd,
@@ -1272,7 +1271,7 @@ copy_pud_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	src_pud = pud_offset(src_p4d, addr);
 	do {
 		next = pud_addr_end(addr, end);
-		if (pud_trans_huge(*src_pud) || pud_devmap(*src_pud)) {
+		if (pud_is_leaf(*src_pud)) {
 			int err;
 
 			VM_BUG_ON_VMA(next-addr != HPAGE_PUD_SIZE, src_vma);
@@ -1710,7 +1709,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 	pmd = pmd_offset(pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
-		if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
+		if (is_swap_pmd(*pmd) || pmd_is_leaf(*pmd)) {
 			if (next - addr != HPAGE_PMD_SIZE)
 				__split_huge_pmd(vma, pmd, addr, false, NULL);
 			else if (zap_huge_pmd(tlb, vma, pmd, addr)) {
@@ -1752,7 +1751,7 @@ static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
 	pud = pud_offset(p4d, addr);
 	do {
 		next = pud_addr_end(addr, end);
-		if (pud_trans_huge(*pud) || pud_devmap(*pud)) {
+		if (pud_is_leaf(*pud)) {
 			if (next - addr != HPAGE_PUD_SIZE) {
 				mmap_assert_locked(tlb->mm);
 				split_huge_pud(vma, pud, addr);
@@ -5605,8 +5604,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		pud_t orig_pud = *vmf.pud;
 
 		barrier();
-		if (pud_trans_huge(orig_pud) || pud_devmap(orig_pud)) {
-
+		if (pud_is_leaf(orig_pud)) {
 			/*
 			 * TODO once we support anonymous PUDs: NUMA case and
 			 * FAULT_FLAG_UNSHARE handling.
@@ -5646,7 +5644,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 			pmd_migration_entry_wait(mm, vmf.pmd);
 			return 0;
 		}
-		if (pmd_trans_huge(vmf.orig_pmd) || pmd_devmap(vmf.orig_pmd)) {
+		if (pmd_is_leaf(vmf.orig_pmd)) {
 			if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma))
 				return do_huge_pmd_numa_page(&vmf);
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 6d66dc1c6ffa..1fbeee9619c8 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -596,7 +596,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
 	pmdp = pmd_alloc(mm, pudp, addr);
 	if (!pmdp)
 		goto abort;
-	if (pmd_trans_huge(*pmdp) || pmd_devmap(*pmdp))
+	if (pmd_leaf(*pmdp))
 		goto abort;
 	if (pte_alloc(mm, pmdp))
 		goto abort;
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 694f13b83864..ddfee216a02b 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -381,7 +381,7 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 			goto next;
 
 		_pmd = pmdp_get_lockless(pmd);
-		if (is_swap_pmd(_pmd) || pmd_trans_huge(_pmd) || pmd_devmap(_pmd)) {
+		if (is_swap_pmd(_pmd) || pmd_is_leaf(_pmd)) {
 			if ((next - addr != HPAGE_PMD_SIZE) ||
 			    pgtable_split_needed(vma, cp_flags)) {
 				__split_huge_pmd(vma, pmd, addr, false, NULL);
@@ -452,7 +452,7 @@ static inline long change_pud_range(struct mmu_gather *tlb,
 			mmu_notifier_invalidate_range_start(&range);
 		}
 
-		if (pud_leaf(pud)) {
+		if (pud_is_leaf(pud)) {
 			if ((next - addr != PUD_SIZE) ||
 			    pgtable_split_needed(vma, cp_flags)) {
 				__split_huge_pud(vma, pudp, addr);
diff --git a/mm/mremap.c b/mm/mremap.c
index e7ae140fc640..f5c9884ea1f8 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -587,7 +587,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 		new_pud = alloc_new_pud(vma->vm_mm, vma, new_addr);
 		if (!new_pud)
 			break;
-		if (pud_trans_huge(*old_pud) || pud_devmap(*old_pud)) {
+		if (pud_is_leaf(*old_pud)) {
 			if (extent == HPAGE_PUD_SIZE) {
 				move_pgt_entry(HPAGE_PUD, vma, old_addr, new_addr,
 					       old_pud, new_pud, need_rmap_locks);
@@ -609,8 +609,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 		if (!new_pmd)
 			break;
again:
-		if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd) ||
-		    pmd_devmap(*old_pmd)) {
+		if (is_swap_pmd(*old_pmd) || pmd_is_leaf(*old_pmd)) {
 			if (extent == HPAGE_PMD_SIZE &&
 			    move_pgt_entry(HPAGE_PMD, vma, old_addr, new_addr,
 					   old_pmd, new_pmd, need_rmap_locks))
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index ae5cc42aa208..891bea8062d2 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -235,8 +235,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 		 */
 		pmde = pmdp_get_lockless(pvmw->pmd);
 
-		if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde) ||
-		    (pmd_present(pmde) && pmd_devmap(pmde))) {
+		if (pmd_is_leaf(pmde) || is_pmd_migration_entry(pmde)) {
 			pvmw->ptl = pmd_lock(mm, pvmw->pmd);
 			pmde = *pvmw->pmd;
 			if (!pmd_present(pmde)) {
@@ -251,7 +250,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 				return not_found(pvmw);
 			return true;
 		}
-		if (likely(pmd_trans_huge(pmde) || pmd_devmap(pmde))) {
+		if (likely(pmd_is_leaf(pmde))) {
 			if (pvmw->flags & PVMW_MIGRATION)
 				return not_found(pvmw);
 			if (!check_pmd(pmd_pfn(pmde), pvmw))
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index e9fc3f6774a6..c7b7a803f4ad 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -139,8 +139,7 @@ pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address,
 	pmd_t pmd;
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
-	VM_BUG_ON(pmd_present(*pmdp) && !pmd_trans_huge(*pmdp) &&
-		  !pmd_devmap(*pmdp));
+	VM_BUG_ON(pmd_present(*pmdp) && !pmd_leaf(*pmdp));
 	pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
 	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
 	return pmd;
@@ -247,7 +246,7 @@ pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address,
 	pud_t pud;
 
 	VM_BUG_ON(address & ~HPAGE_PUD_MASK);
-	VM_BUG_ON(!pud_trans_huge(*pudp) && !pud_devmap(*pudp));
+	VM_BUG_ON(!pud_leaf(*pudp));
 	pud = pudp_huge_get_and_clear(vma->vm_mm, address, pudp);
 	flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE);
 	return pud;
@@ -293,7 +292,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 		*pmdvalp = pmdval;
 	if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval)))
 		goto nomap;
-	if (unlikely(pmd_trans_huge(pmdval) || pmd_devmap(pmdval)))
+	if (unlikely(pmd_leaf(pmdval)))
 		goto nomap;
 	if (unlikely(pmd_bad(pmdval))) {
 		pmd_clear_bad(pmd);