From patchwork Thu Oct 21 12:21:09 2021
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12576363
From: Nadav Amit
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Nadav Amit, Andrea Arcangeli,
 Andrew Cooper, Andrew Morton, Andy Lutomirski, Dave Hansen, Peter Xu,
 Peter Zijlstra, Thomas Gleixner, Will Deacon, Yu Zhao, Nick Piggin,
 x86@kernel.org
Subject: [PATCH v2 2/5] mm: avoid unnecessary flush on change_huge_pmd()
Date: Thu, 21 Oct 2021 05:21:09 -0700
Message-Id: <20211021122112.592634-3-namit@vmware.com>
In-Reply-To: <20211021122112.592634-1-namit@vmware.com>
References: <20211021122112.592634-1-namit@vmware.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

From: Nadav Amit

Calls to change_protection_range() on THP can trigger, at least on x86,
two TLB flushes for one page: one immediately, when pmdp_invalidate() is
called by change_huge_pmd(), and then another one later (that can be
batched) when change_protection_range() finishes.

The first TLB flush is only there to prevent the dirty bit (and, less
importantly, the access bit) from changing while the PTE is modified.
However, it is not needed on x86, because x86 CPUs set the dirty bit
atomically with an additional check that the PTE is (still) present. One
caveat is Intel's Knights Landing, which has a bug and does not do so.

Leverage this behavior to eliminate the unnecessary TLB flush in
change_huge_pmd(). Introduce a new arch-specific pmdp_invalidate_ad()
that only guarantees that the access and dirty bits cannot change
further.
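To illustrate the reasoning above, here is a minimal userspace sketch (not kernel code) of why an atomic exchange is enough to freeze the access/dirty bits without a flush. Everything in it is an assumption made for illustration: the toy_* names, the bit positions, and the model of the hardware walker as a compare-and-swap that refuses to set the dirty bit once the entry is no longer present. It also assumes pmdp_establish() behaves as an atomic exchange.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for the toy entry; chosen for illustration only. */
#define TOY_PRESENT  (UINT64_C(1) << 0)
#define TOY_ACCESSED (UINT64_C(1) << 5)
#define TOY_DIRTY    (UINT64_C(1) << 6)

typedef _Atomic uint64_t toy_pmd_t;

/*
 * Models a CPU setting accessed/dirty on a write: the update is atomic
 * and is refused if the entry is not (or no longer) present.
 */
static bool toy_hw_set_dirty(toy_pmd_t *pmdp)
{
	uint64_t old = atomic_load(pmdp);

	do {
		if (!(old & TOY_PRESENT))
			return false;	/* a real CPU would fault instead */
	} while (!atomic_compare_exchange_weak(pmdp, &old,
					       old | TOY_ACCESSED | TOY_DIRTY));
	return true;
}

/*
 * Models a flush-less invalidation: read the entry, then atomically
 * exchange in a non-present copy. The value returned by the exchange
 * includes any accessed/dirty bits set between the read and the exchange.
 */
static uint64_t toy_invalidate_ad(toy_pmd_t *pmdp)
{
	uint64_t snapshot = atomic_load(pmdp);

	return atomic_exchange(pmdp, snapshot & ~TOY_PRESENT);
}
```

In this model, once toy_invalidate_ad() returns, toy_hw_set_dirty() can no longer succeed, so the returned value is a stable view of the access/dirty bits even though no flush has happened yet.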

Cc: Andrea Arcangeli
Cc: Andrew Cooper
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Peter Xu
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Yu Zhao
Cc: Nick Piggin
Cc: x86@kernel.org
Signed-off-by: Nadav Amit
---
 arch/x86/include/asm/pgtable.h | 8 ++++++++
 include/linux/pgtable.h        | 5 +++++
 mm/huge_memory.c               | 7 ++++---
 mm/pgtable-generic.c           | 8 ++++++++
 4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 448cd01eb3ec..18c3366f8f4d 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1146,6 +1146,14 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 	}
 }
 #endif
+
+#define __HAVE_ARCH_PMDP_INVALIDATE_AD
+static inline pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
+				       unsigned long address, pmd_t *pmdp)
+{
+	return pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
+}
+
 /*
  * Page table pages are page-aligned. The lower half of the top
  * level is used for userspace and the top half for the kernel.
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e24d2c992b11..622efe0a9ef0 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -561,6 +561,11 @@ extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			    pmd_t *pmdp);
 #endif
 
+#ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD
+extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
+				unsigned long address, pmd_t *pmdp);
+#endif
+
 #ifndef __HAVE_ARCH_PTE_SAME
 static inline int pte_same(pte_t pte_a, pte_t pte_b)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e5ea5f775d5c..435da011b1a2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1795,10 +1795,11 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	 * The race makes MADV_DONTNEED miss the huge pmd and don't clear it
 	 * which may break userspace.
 	 *
-	 * pmdp_invalidate() is required to make sure we don't miss
-	 * dirty/young flags set by hardware.
+	 * pmdp_invalidate_ad() is required to make sure that the
+	 * dirty/young flags (also known as access/dirty) cannot be
+	 * further modified by the hardware.
 	 */
-	entry = pmdp_invalidate(vma, addr, pmd);
+	entry = pmdp_invalidate_ad(vma, addr, pmd);
 
 	entry = pmd_modify(entry, newprot);
 	if (preserve_write)
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 4e640baf9794..b0ce6c7391bf 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -200,6 +200,14 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 }
 #endif
 
+#ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD
+pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
+			 pmd_t *pmdp)
+{
+	return pmdp_invalidate(vma, address, pmdp);
+}
+#endif
+
 #ifndef pmdp_collapse_flush
 pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 			  pmd_t *pmdp)
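
For readers unfamiliar with the __HAVE_ARCH_* override convention used in the hunks above, the following standalone sketch shows roughly how the generic fallback and an architecture override fit together. It is not kernel code: the toy_* names, the TOY_* macros, and the flush counter are hypothetical stand-ins, and the flush is modeled as a simple counter increment.

```c
#include <stdint.h>

/* Hypothetical counter standing in for the TLB flushes that were issued. */
static unsigned int toy_flush_count;

/* Models the full invalidation: clear the present bit, then flush. */
static uint64_t toy_pmdp_invalidate(uint64_t *pmdp)
{
	uint64_t old = *pmdp;

	*pmdp = old & ~UINT64_C(1);	/* bit 0 plays the role of "present" */
	toy_flush_count++;		/* stands in for the TLB flush */
	return old;
}

/*
 * An architecture that can rely on atomic access/dirty updates would
 * define TOY_ARCH_HAS_ATOMIC_AD (hypothetical) and supply its own
 * flush-less variant, in the way the x86 header does in this patch.
 */
#ifdef TOY_ARCH_HAS_ATOMIC_AD
#define TOY_HAVE_ARCH_INVALIDATE_AD
static uint64_t toy_pmdp_invalidate_ad(uint64_t *pmdp)
{
	uint64_t old = *pmdp;

	*pmdp = old & ~UINT64_C(1);
	return old;			/* no flush here */
}
#endif

/* Every other architecture falls back to the flushing variant. */
#ifndef TOY_HAVE_ARCH_INVALIDATE_AD
static uint64_t toy_pmdp_invalidate_ad(uint64_t *pmdp)
{
	return toy_pmdp_invalidate(pmdp);
}
#endif
```

Built without TOY_ARCH_HAS_ATOMIC_AD, a call to toy_pmdp_invalidate_ad() still counts one flush, so behavior on architectures that do not opt in is unchanged.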