From patchwork Mon Jul 15 19:21:37 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13733834
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Jiang, Rik van Riel, Dave Hansen, Michael Ellerman,
 linuxppc-dev@lists.ozlabs.org, Matthew Wilcox, Rick P Edgecombe,
 peterx@redhat.com, Oscar Salvador, Mel Gorman, Andrew Morton,
 Borislav Petkov, Christophe Leroy, Huang Ying, "Kirill A . Shutemov",
 "Aneesh Kumar K . V", Dan Williams, Thomas Gleixner, Hugh Dickins,
 x86@kernel.org, Nicholas Piggin, Vlastimil Babka, Ingo Molnar,
 kvm@vger.kernel.org, Sean Christopherson, Paolo Bonzini, David Rientjes
Subject: [PATCH v3 3/8] mm/mprotect: Push mmu notifier to PUDs
Date: Mon, 15 Jul 2024 15:21:37 -0400
Message-ID: <20240715192142.3241557-4-peterx@redhat.com>
X-Mailer: git-send-email 2.45.0
In-Reply-To: <20240715192142.3241557-1-peterx@redhat.com>
References: <20240715192142.3241557-1-peterx@redhat.com>

mprotect() issues mmu notifiers at the PMD level.  It has done so since
2014, in commit a5338093bfb4 ("mm: move mmu notifier call from
change_protection to change_pmd_range").

At that time, the issue was that NUMA balancing can be applied on a huge
range of VM memory, even if nothing was populated.  The notification can be
avoided in this case if no valid pmd is detected, where a valid pmd is
either a THP or a PTE pgtable page.

To pave the way for PUD handling, this isn't enough.  We need to generate
mmu notifications properly for PUD entries as well.  mprotect() is
currently broken on PUDs (e.g., one can already easily trigger a kernel
error with dax 1G mappings); this is the start of fixing it.

To fix that, this patch proposes to push such notifications up to the PUD
layer.

There is a risk of regressing the problem Rik wanted to resolve before, but
I think it shouldn't really happen, and I still chose this solution for a
few reasons:

  1) Consider a large VM that should definitely contain more than GBs of
  memory: it's highly likely that PUDs are also none.  In this case there
  will be no regression.

  2) KVM has evolved a lot over the years to get rid of rmap walks, which
  might be the major cause of the previous soft-lockup.
  At least the TDP MMU already got rid of rmap as long as it's not nested
  (which should be the major use case, IIUC), so the TDP MMU pgtable walker
  will simply see an empty VM pgtable (e.g. EPT on x86), and the
  invalidation of a fully empty region could in most cases be pretty fast
  now, compared to 2014.

  3) KVM now has explicit code paths that even give way to mmu notifiers
  just like this one, e.g. in commit d02c357e5bfa ("KVM: x86/mmu: Retry
  fault before acquiring mmu_lock if mapping is changing").  That will also
  avoid contention that may contribute to a soft-lockup.

  4) Sticking with the PMD layer simply doesn't work when a PUD is there...
  We need one way or another to fix PUD mappings on mprotect().

Pushing it to the PUD should be the safest approach as of now, e.g. there's
no sign yet of huge P4Ds coming on any known arch.

Cc: kvm@vger.kernel.org
Cc: Sean Christopherson
Cc: Paolo Bonzini
Cc: David Rientjes
Cc: Rik van Riel
Signed-off-by: Peter Xu
---
 mm/mprotect.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 21172272695e..2a81060b603d 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -363,9 +363,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 	pmd_t *pmd;
 	unsigned long next;
 	long pages = 0;
-	struct mmu_notifier_range range;
-
-	range.start = 0;
 
 	pmd = pmd_offset(pud, addr);
 	do {
@@ -383,14 +380,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 		if (pmd_none(*pmd))
 			goto next;
 
-		/* invoke the mmu notifier if the pmd is populated */
-		if (!range.start) {
-			mmu_notifier_range_init(&range,
-				MMU_NOTIFY_PROTECTION_VMA, 0,
-				vma->vm_mm, addr, end);
-			mmu_notifier_invalidate_range_start(&range);
-		}
-
 		_pmd = pmdp_get_lockless(pmd);
 		if (is_swap_pmd(_pmd) || pmd_trans_huge(_pmd) || pmd_devmap(_pmd)) {
 			if ((next - addr != HPAGE_PMD_SIZE) ||
@@ -428,9 +417,6 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 		cond_resched();
 	} while (pmd++, addr = next, addr != end);
 
-	if (range.start)
-		mmu_notifier_invalidate_range_end(&range);
-
 	return pages;
 }
 
@@ -438,22 +424,36 @@ static inline long change_pud_range(struct mmu_gather *tlb,
 		struct vm_area_struct *vma, p4d_t *p4d, unsigned long addr,
 		unsigned long end, pgprot_t newprot, unsigned long cp_flags)
 {
+	struct mmu_notifier_range range;
 	pud_t *pud;
 	unsigned long next;
 	long pages = 0, ret;
 
+	range.start = 0;
+
 	pud = pud_offset(p4d, addr);
 	do {
 		next = pud_addr_end(addr, end);
 		ret = change_prepare(vma, pud, pmd, addr, cp_flags);
-		if (ret)
-			return ret;
+		if (ret) {
+			pages = ret;
+			break;
+		}
 		if (pud_none_or_clear_bad(pud))
 			continue;
+		if (!range.start) {
+			mmu_notifier_range_init(&range,
+						MMU_NOTIFY_PROTECTION_VMA, 0,
+						vma->vm_mm, addr, end);
+			mmu_notifier_invalidate_range_start(&range);
+		}
 		pages += change_pmd_range(tlb, vma, pud, addr, next, newprot,
 					  cp_flags);
 	} while (pud++, addr = next, addr != end);
 
+	if (range.start)
+		mmu_notifier_invalidate_range_end(&range);
+
 	return pages;
 }