From patchwork Thu Aug 10 08:57:43 2023
From: Yan Zhao
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, mike.kravetz@oracle.com,
    apopple@nvidia.com, jgg@nvidia.com, rppt@kernel.org,
    akpm@linux-foundation.org, kevin.tian@intel.com, david@redhat.com,
    Yan Zhao
Subject: [RFC PATCH v2 1/5] mm/mmu_notifier: introduce a new mmu notifier flag MMU_NOTIFIER_RANGE_NUMA
Date: Thu, 10 Aug 2023 16:57:43 +0800
Message-Id: <20230810085743.25977-1-yan.y.zhao@intel.com>
In-Reply-To: <20230810085636.25914-1-yan.y.zhao@intel.com>

Introduce a new mmu notifier flag, MMU_NOTIFIER_RANGE_NUMA, to indicate
that an MMU_NOTIFY_PROTECTION_VMA notification is specifically for NUMA
balancing, so that a subscriber of the mmu notifier, such as KVM, can
recognize this type of notification and perform NUMA-protection-specific
operations in its handler.
Signed-off-by: Yan Zhao
---
 include/linux/mmu_notifier.h | 1 +
 mm/mprotect.c                | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 64a3e051c3c4..a6dc829a4bce 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -60,6 +60,7 @@ enum mmu_notifier_event {
 };
 
 #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
+#define MMU_NOTIFIER_RANGE_NUMA (1 << 1)
 
 struct mmu_notifier_ops {
 	/*
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 6f658d483704..cb99a7d66467 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -381,7 +381,9 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 		/* invoke the mmu notifier if the pmd is populated */
 		if (!range.start) {
 			mmu_notifier_range_init(&range,
-						MMU_NOTIFY_PROTECTION_VMA, 0,
+						MMU_NOTIFY_PROTECTION_VMA,
+						cp_flags & MM_CP_PROT_NUMA ?
+						MMU_NOTIFIER_RANGE_NUMA : 0,
 						vma->vm_mm, addr, end);
 			mmu_notifier_invalidate_range_start(&range);
 		}
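The flag is delivered to subscribers in range->flags of their
invalidate_range_start() callback. A minimal sketch of how a subscriber
might consume it follows; only the mmu_notifier types, the event, and
the flag come from this patch, and the demo_* name is hypothetical:

#include <linux/mmu_notifier.h>

/*
 * Sketch of a subscriber recognizing NUMA-balancing invalidations and
 * deferring them (patch 5 does this in KVM via .numa_protect()).
 */
static int demo_invalidate_range_start(struct mmu_notifier *mn,
				       const struct mmu_notifier_range *range)
{
	bool is_numa = range->event == MMU_NOTIFY_PROTECTION_VMA &&
		       (range->flags & MMU_NOTIFIER_RANGE_NUMA);

	if (is_numa)
		return 0;	/* handle via .numa_protect() instead */

	/* Otherwise invalidate the secondary MMU's mappings for
	 * [range->start, range->end) as before. */
	return 0;
}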
From patchwork Thu Aug 10 08:58:25 2023
From: Yan Zhao
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, mike.kravetz@oracle.com,
    apopple@nvidia.com, jgg@nvidia.com, rppt@kernel.org,
    akpm@linux-foundation.org, kevin.tian@intel.com, david@redhat.com,
    Yan Zhao
Subject: [RFC PATCH v2 2/5] mm: don't set PROT_NONE to maybe-dma-pinned pages for NUMA-migrate purpose
Date: Thu, 10 Aug 2023 16:58:25 +0800
Message-Id: <20230810085825.26038-1-yan.y.zhao@intel.com>
In-Reply-To: <20230810085636.25914-1-yan.y.zhao@intel.com>

Don't set PROT_NONE on exclusive anonymous, maybe-dma-pinned pages for
NUMA migration.

For exclusive anonymous pages that are page_maybe_dma_pinned(), NUMA
migration will eventually abandon migrating those pages in
try_to_migrate_one() (i.e. after page_try_share_anon_rmap() returns
-EBUSY). So skip setting PROT_NONE on such pages earlier, in the
change_protection_range() phase, to avoid the later futile page faults,
detection, and restoration of the original PTEs/PMDs.

Signed-off-by: Yan Zhao
---
 mm/huge_memory.c | 5 +++++
 mm/mprotect.c    | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index eb3678360b97..a71cf686e3b2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1875,6 +1875,11 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			goto unlock;
 
 		page = pmd_page(*pmd);
+
+		if (PageAnon(page) && PageAnonExclusive(page) &&
+		    page_maybe_dma_pinned(page))
+			goto unlock;
+
 		toptier = node_is_toptier(page_to_nid(page));
 		/*
 		 * Skip scanning top tier node if normal numa
diff --git a/mm/mprotect.c b/mm/mprotect.c
index cb99a7d66467..a1f63df34b86 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -146,6 +146,11 @@ static long change_pte_range(struct mmu_gather *tlb,
 				nid = page_to_nid(page);
 				if (target_node == nid)
 					continue;
+
+				if (PageAnon(page) && PageAnonExclusive(page) &&
+				    page_maybe_dma_pinned(page))
+					continue;
+
 				toptier = node_is_toptier(nid);
 				/*
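The same three-part test is open-coded in both the huge-PMD and the PTE
paths above. Factored out as a sketch (the helper name is hypothetical;
the predicates are the ones the patch uses), the condition reads:

/*
 * An exclusive anonymous page that is maybe DMA-pinned cannot be
 * unmapped for migration: page_try_share_anon_rmap() would later fail
 * with -EBUSY in try_to_migrate_one(), so making the page PROT_NONE
 * would only produce a futile NUMA hint fault.
 */
static bool numa_skip_maybe_pinned(struct page *page)
{
	return PageAnon(page) && PageAnonExclusive(page) &&
	       page_maybe_dma_pinned(page);
}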
From patchwork Thu Aug 10 09:00:08 2023
From: Yan Zhao
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, mike.kravetz@oracle.com,
    apopple@nvidia.com, jgg@nvidia.com, rppt@kernel.org,
    akpm@linux-foundation.org, kevin.tian@intel.com, david@redhat.com,
    Yan Zhao
Subject: [RFC PATCH v2 3/5] mm/mmu_notifier: introduce a new callback .numa_protect
Date: Thu, 10 Aug 2023 17:00:08 +0800
Message-Id: <20230810090008.26122-1-yan.y.zhao@intel.com>
In-Reply-To: <20230810085636.25914-1-yan.y.zhao@intel.com>

The new .numa_protect callback is invoked when PROT_NONE has definitely
been set on a PTE or a huge PMD for NUMA migration. With this callback,
a subscriber of the mmu notifier (e.g. KVM) can unmap only the
NUMA-migration-protected pages in its handler, rather than unmapping a
wider range that contains pages which are obviously not NUMA-migratable.

Signed-off-by: Yan Zhao
---
 include/linux/mmu_notifier.h | 15 +++++++++++++++
 mm/mmu_notifier.c            | 18 ++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index a6dc829a4bce..a173db83b071 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -132,6 +132,10 @@ struct mmu_notifier_ops {
 			   unsigned long address,
 			   pte_t pte);
 
+	void (*numa_protect)(struct mmu_notifier *subscription,
+			     struct mm_struct *mm,
+			     unsigned long start,
+			     unsigned long end);
 	/*
 	 * invalidate_range_start() and invalidate_range_end() must be
 	 * paired and are called only when the mmap_lock and/or the
@@ -395,6 +399,9 @@ extern int __mmu_notifier_test_young(struct mm_struct *mm,
 				     unsigned long address);
 extern void __mmu_notifier_change_pte(struct mm_struct *mm,
 				      unsigned long address, pte_t pte);
+extern void __mmu_notifier_numa_protect(struct mm_struct *mm,
+					unsigned long start,
+					unsigned long end);
 extern int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *r);
 extern void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *r,
 						bool only_end);
@@ -448,6 +455,14 @@ static inline void mmu_notifier_change_pte(struct mm_struct *mm,
 		__mmu_notifier_change_pte(mm, address, pte);
 }
 
+static inline void mmu_notifier_numa_protect(struct mm_struct *mm,
+					     unsigned long start,
+					     unsigned long end)
+{
+	if (mm_has_notifiers(mm))
+		__mmu_notifier_numa_protect(mm, start, end);
+}
+
 static inline void
 mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
 {
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 50c0dde1354f..fc96fbd46e1d 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -382,6 +382,24 @@ int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
 	return young;
 }
 
+void __mmu_notifier_numa_protect(struct mm_struct *mm,
+				 unsigned long start,
+				 unsigned long end)
+{
+	struct mmu_notifier *subscription;
+	int id;
+
+	id = srcu_read_lock(&srcu);
+	hlist_for_each_entry_rcu(subscription,
+				 &mm->notifier_subscriptions->list, hlist,
+				 srcu_read_lock_held(&srcu)) {
+		if (subscription->ops->numa_protect)
+			subscription->ops->numa_protect(subscription, mm, start,
+							end);
+	}
+	srcu_read_unlock(&srcu, id);
+}
+
 int __mmu_notifier_clear_young(struct mm_struct *mm,
 			       unsigned long start,
 			       unsigned long end)
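A subscriber opts in by filling the new hook in its mmu_notifier_ops;
subscribers that leave .numa_protect NULL are simply skipped by the
check in __mmu_notifier_numa_protect(). A minimal registration sketch,
where everything named demo_* is hypothetical and only the ops member
and mmu_notifier_register() are real API:

#include <linux/mmu_notifier.h>

static void demo_numa_protect(struct mmu_notifier *subscription,
			      struct mm_struct *mm,
			      unsigned long start, unsigned long end)
{
	/* Invalidate secondary-MMU mappings for [start, end) only; with
	 * patch 4 the range covers exactly one PTE or one huge PMD. */
}

static const struct mmu_notifier_ops demo_ops = {
	.numa_protect = demo_numa_protect,
};

static struct mmu_notifier demo_mn = {
	.ops = &demo_ops,
};

static int demo_subscribe(struct mm_struct *mm)
{
	/* mmu_notifier_register() takes mmap_lock internally. */
	return mmu_notifier_register(&demo_mn, mm);
}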
From patchwork Thu Aug 10 09:00:48 2023
From: Yan Zhao
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, mike.kravetz@oracle.com,
    apopple@nvidia.com, jgg@nvidia.com, rppt@kernel.org,
    akpm@linux-foundation.org, kevin.tian@intel.com, david@redhat.com,
    Yan Zhao
Subject: [RFC PATCH v2 4/5] mm/autonuma: call .numa_protect() when page is protected for NUMA migrate
Date: Thu, 10 Aug 2023 17:00:48 +0800
Message-Id: <20230810090048.26184-1-yan.y.zhao@intel.com>
In-Reply-To: <20230810085636.25914-1-yan.y.zhao@intel.com>

Call the mmu notifier callback .numa_protect() from change_huge_pmd()
and change_pte_range() once a huge PMD or PTE is guaranteed to be
protected by PROT_NONE for NUMA migration.

Signed-off-by: Yan Zhao
---
 mm/huge_memory.c | 1 +
 mm/mprotect.c    | 1 +
 2 files changed, 2 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a71cf686e3b2..8ae56507da12 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1892,6 +1892,7 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING &&
 		    !toptier)
 			xchg_page_access_time(page, jiffies_to_msecs(jiffies));
+		mmu_notifier_numa_protect(vma->vm_mm, addr, addr + PMD_SIZE);
 	}
 	/*
 	 * In case prot_numa, we are under mmap_read_lock(mm). It's critical
diff --git a/mm/mprotect.c b/mm/mprotect.c
index a1f63df34b86..c401814b2992 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -164,6 +164,7 @@ static long change_pte_range(struct mmu_gather *tlb,
 				    !toptier)
 					xchg_page_access_time(page,
 						jiffies_to_msecs(jiffies));
+				mmu_notifier_numa_protect(vma->vm_mm, addr, addr + PAGE_SIZE);
 			}
 
 			oldpte = ptep_modify_prot_start(vma, addr, pte);
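Combined with patch 1, the notifier traffic for one NUMA-balancing pass
over a VMA then looks roughly as follows; this is an illustrative
ordering sketch, not code from the series:

/*
 * change_protection(..., MM_CP_PROT_NUMA):
 *
 *   mmu_notifier_invalidate_range_start(range)
 *       // range->event == MMU_NOTIFY_PROTECTION_VMA
 *       // range->flags & MMU_NOTIFIER_RANGE_NUMA     (patch 1)
 *   for each PTE actually made PROT_NONE:
 *       mmu_notifier_numa_protect(mm, addr, addr + PAGE_SIZE)
 *   for each huge PMD actually made PROT_NONE:
 *       mmu_notifier_numa_protect(mm, addr, addr + PMD_SIZE)
 *   mmu_notifier_invalidate_range_end(range)
 *
 * Pages skipped by patch 2 generate no .numa_protect() call at all.
 */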
From patchwork Thu Aug 10 09:02:18 2023
From: Yan Zhao
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, mike.kravetz@oracle.com,
    apopple@nvidia.com, jgg@nvidia.com, rppt@kernel.org,
    akpm@linux-foundation.org, kevin.tian@intel.com, david@redhat.com,
    Yan Zhao
Subject: [RFC PATCH v2 5/5] KVM: Unmap pages only when it's indeed protected for NUMA migration
Date: Thu, 10 Aug 2023 17:02:18 +0800
Message-Id: <20230810090218.26244-1-yan.y.zhao@intel.com>
In-Reply-To: <20230810085636.25914-1-yan.y.zhao@intel.com>

Register a .numa_protect() mmu notifier callback so that KVM gets
accurate information about when a page is made PROT_NONE in the primary
MMU for NUMA migration, and can unmap it in the secondary MMU
accordingly.

In KVM's .invalidate_range_start() handler, if the event notifies that
the range may be protected to PROT_NONE for NUMA migration, don't do the
unmapping in the secondary MMU; hold off until .numa_protect() arrives.

Signed-off-by: Yan Zhao
Signed-off-by: Sean Christopherson
---
 virt/kvm/kvm_main.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dfbaafbe3a00..907444a1761b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -711,6 +711,20 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
 	kvm_handle_hva_range(mn, address, address + 1, pte, kvm_change_spte_gfn);
 }
 
+static void kvm_mmu_notifier_numa_protect(struct mmu_notifier *mn,
+					  struct mm_struct *mm,
+					  unsigned long start,
+					  unsigned long end)
+{
+	struct kvm *kvm = mmu_notifier_to_kvm(mn);
+
+	WARN_ON_ONCE(!READ_ONCE(kvm->mn_active_invalidate_count));
+	if (!READ_ONCE(kvm->mmu_invalidate_in_progress))
+		return;
+
+	kvm_handle_hva_range(mn, start, end, __pte(0), kvm_unmap_gfn_range);
+}
+
 void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
 			      unsigned long end)
 {
@@ -744,14 +758,18 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 					const struct mmu_notifier_range *range)
 {
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
+	bool is_numa = (range->event == MMU_NOTIFY_PROTECTION_VMA) &&
+		       (range->flags & MMU_NOTIFIER_RANGE_NUMA);
 	const struct kvm_hva_range hva_range = {
 		.start = range->start,
 		.end = range->end,
 		.pte = __pte(0),
-		.handler = kvm_unmap_gfn_range,
+		.handler = !is_numa ? kvm_unmap_gfn_range :
+			   (void *)kvm_null_fn,
 		.on_lock = kvm_mmu_invalidate_begin,
-		.on_unlock = kvm_arch_guest_memory_reclaimed,
-		.flush_on_ret = true,
+		.on_unlock = !is_numa ? kvm_arch_guest_memory_reclaimed :
+			     (void *)kvm_null_fn,
+		.flush_on_ret = !is_numa ? true : false,
 		.may_block = mmu_notifier_range_blockable(range),
 	};
 
@@ -899,6 +917,7 @@ static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
 	.clear_young = kvm_mmu_notifier_clear_young,
 	.test_young = kvm_mmu_notifier_test_young,
 	.change_pte = kvm_mmu_notifier_change_pte,
+	.numa_protect = kvm_mmu_notifier_numa_protect,
 	.release = kvm_mmu_notifier_release,
 };
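For a NUMA-balancing range, the handler selection above reduces the
start() notification to pure bookkeeping; the net effect can be
summarized as follows (annotated sketch, not code from the patch):

/*
 * kvm_mmu_notifier_invalidate_range_start(), NUMA-balancing range:
 *
 *   .on_lock      = kvm_mmu_invalidate_begin  // window still opens, so
 *                                             // mmu_invalidate_in_progress > 0
 *   .handler      = kvm_null_fn               // no blanket unmap of the range
 *   .on_unlock    = kvm_null_fn               // no memory was reclaimed
 *   .flush_on_ret = false                     // nothing was zapped yet
 *
 * kvm_mmu_notifier_numa_protect() then zaps only the hva ranges that
 * were actually made PROT_NONE; its WARN_ON_ONCE() and the
 * mmu_invalidate_in_progress check tie it to this still-open window.
 */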