From patchwork Tue Aug 15 17:18:48 2023
X-Patchwork-Submitter: Isaku Yamahata
X-Patchwork-Id: 13353993
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Michael Roth, Paolo Bonzini, Sean Christopherson, erdemaktas@google.com, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang, chen.bo@intel.com, linux-coco@lists.linux.dev, Chao Peng, Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang@intel.com, Fuad Tabba
Subject: [PATCH 1/8] KVM: gmem: Make kvm_gmem_bind return EBADF on wrong fd
Date: Tue, 15 Aug 2023 10:18:48 -0700
X-Mailing-List: kvm@vger.kernel.org

From: Isaku Yamahata

When kvm_gmem_bind() fails fget(), return -EBADF instead of -EINVAL: EBADF is the established error for a file descriptor that does not refer to an open file, which makes it the more appropriate choice here.
Signed-off-by: Isaku Yamahata
---
 virt/kvm/guest_mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index db644f7fa48b..c81d2bb9ae93 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -479,7 +479,7 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,

 	file = fget(fd);
 	if (!file)
-		return -EINVAL;
+		return -EBADF;

 	if (file->f_op != &kvm_gmem_fops)
 		goto err;

From patchwork Tue Aug 15 17:18:49 2023
X-Patchwork-Submitter: Isaku Yamahata
X-Patchwork-Id: 13353999
From: isaku.yamahata@intel.com
Subject: [PATCH 2/8] KVM: gmem: removed duplicated kvm_gmem_init()
Date: Tue, 15 Aug 2023 10:18:49 -0700
Message-Id: <5eec36e76ee288d56f45ff2f22b7c9f56d23b75a.1692119201.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

Reported-by: Xiaoyao Li
Signed-off-by: Isaku Yamahata
---
 virt/kvm/kvm_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ee331cf8ba54..8bfeb615fc4d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6308,7 +6308,6 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	kvm_preempt_ops.sched_out = kvm_sched_out;

 	kvm_init_debug();
-	kvm_gmem_init();

 	r = kvm_vfio_ops_init();
 	if (WARN_ON_ONCE(r))

From patchwork Tue Aug 15 17:18:50 2023
X-Patchwork-Submitter: Isaku Yamahata
X-Patchwork-Id: 13353996
From: isaku.yamahata@intel.com
Subject: [PATCH 3/8] KVM: gmem: Fix kvm_gmem_issue_arch_invalidate()
Date: Tue, 15 Aug 2023 10:18:50 -0700

From: Isaku Yamahata

__filemap_get_folio() can return an ERR_PTR()-encoded error, not just NULL, so a plain NULL check misses failures. Use IS_ERR_OR_NULL() instead.
Signed-off-by: Isaku Yamahata
---
 virt/kvm/guest_mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index c81d2bb9ae93..ed03f1d12172 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -53,7 +53,7 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 	struct folio *folio;

 	folio = kvm_gmem_get_huge_folio(inode, index);
-	if (!folio) {
+	if (IS_ERR_OR_NULL(folio)) {
 		folio = filemap_grab_folio(inode->i_mapping, index);
 		if (!folio)
 			return NULL;

From patchwork Tue Aug 15 17:18:51 2023
X-Patchwork-Submitter: Isaku Yamahata
X-Patchwork-Id: 13354000
From: isaku.yamahata@intel.com
Subject: [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end()
Date: Tue, 15 Aug 2023 10:18:51 -0700

From: Isaku Yamahata

kvm_mmu_invalidate_end() updates struct kvm::mmu_invalidate_in_progress, which is protected by kvm::mmu_lock. Call kvm_mmu_invalidate_end() before releasing the lock, not after the unlock.
Fixes: 8e9009ca6d14 ("KVM: Introduce per-page memory attributes")
Signed-off-by: Isaku Yamahata
Acked-by: Jarkko Sakkinen
---
 virt/kvm/kvm_main.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8bfeb615fc4d..49380cd62367 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -535,6 +535,7 @@ struct kvm_mmu_notifier_range {
 	} arg;
 	gfn_handler_t handler;
 	on_lock_fn_t on_lock;
+	on_unlock_fn_t before_unlock;
 	on_unlock_fn_t on_unlock;
 	bool flush_on_ret;
 	bool may_block;
@@ -629,6 +630,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
 		kvm_flush_remote_tlbs(kvm);

 	if (locked) {
+		if (!IS_KVM_NULL_FN(range->before_unlock))
+			range->before_unlock(kvm);
 		KVM_MMU_UNLOCK(kvm);
 		if (!IS_KVM_NULL_FN(range->on_unlock))
 			range->on_unlock(kvm);
@@ -653,6 +656,7 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
 		.arg.pte	= pte,
 		.handler	= handler,
 		.on_lock	= (void *)kvm_null_fn,
+		.before_unlock	= (void *)kvm_null_fn,
 		.on_unlock	= (void *)kvm_null_fn,
 		.flush_on_ret	= true,
 		.may_block	= false,
@@ -672,6 +676,7 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn,
 		.end		= end,
 		.handler	= handler,
 		.on_lock	= (void *)kvm_null_fn,
+		.before_unlock	= (void *)kvm_null_fn,
 		.on_unlock	= (void *)kvm_null_fn,
 		.flush_on_ret	= false,
 		.may_block	= false,
@@ -776,6 +781,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 		.end		= range->end,
 		.handler	= kvm_mmu_unmap_gfn_range,
 		.on_lock	= kvm_mmu_invalidate_begin,
+		.before_unlock	= (void *)kvm_null_fn,
 		.on_unlock	= kvm_arch_guest_memory_reclaimed,
 		.flush_on_ret	= true,
 		.may_block	= mmu_notifier_range_blockable(range),
@@ -815,6 +821,8 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,

 void kvm_mmu_invalidate_end(struct kvm *kvm)
 {
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
 	/*
 	 * This sequence increase will notify the kvm page fault that
 	 * the page that is going to be mapped in the spte could have
@@ -846,6 +854,7 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
 		.end		= range->end,
 		.handler	= (void *)kvm_null_fn,
 		.on_lock	= kvm_mmu_invalidate_end,
+		.before_unlock	= (void *)kvm_null_fn,
 		.on_unlock	= (void *)kvm_null_fn,
 		.flush_on_ret	= false,
 		.may_block	= mmu_notifier_range_blockable(range),
@@ -2433,6 +2442,8 @@ static __always_inline void kvm_handle_gfn_range(struct kvm *kvm,
 		kvm_flush_remote_tlbs(kvm);

 	if (locked) {
+		if (!IS_KVM_NULL_FN(range->before_unlock))
+			range->before_unlock(kvm);
 		KVM_MMU_UNLOCK(kvm);
 		if (!IS_KVM_NULL_FN(range->on_unlock))
 			range->on_unlock(kvm);
@@ -2447,6 +2458,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, unsigned long attributes,
 		.end		= end,
 		.handler	= kvm_mmu_unmap_gfn_range,
 		.on_lock	= kvm_mmu_invalidate_begin,
+		.before_unlock	= (void *)kvm_null_fn,
 		.on_unlock	= (void *)kvm_null_fn,
 		.flush_on_ret	= true,
 		.may_block	= true,
@@ -2457,7 +2469,8 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, unsigned long attributes,
 		.arg.attributes	= attributes,
 		.handler	= kvm_arch_post_set_memory_attributes,
 		.on_lock	= (void *)kvm_null_fn,
-		.on_unlock	= kvm_mmu_invalidate_end,
+		.before_unlock	= kvm_mmu_invalidate_end,
+		.on_unlock	= (void *)kvm_null_fn,
 		.may_block	= true,
 	};
 	unsigned long i;

From patchwork Tue Aug 15 17:18:52 2023
X-Patchwork-Submitter: Isaku Yamahata
X-Patchwork-Id: 13353992
From: isaku.yamahata@intel.com
Subject: [PATCH 5/8] KVM: gmem, x86: Add gmem hook for initializing private memory
Date: Tue, 15 Aug 2023 10:18:52 -0700
Message-Id: <3d5079d0a58616726e7471a93e3295676148865a.1692119201.git.isaku.yamahata@intel.com>

From: Michael Roth

All gmem pages are expected to be 'private' as defined by a particular arch/platform. Platforms like SEV-SNP require additional operations to move these pages into a private state, so implement a hook that can be used to prepare this memory prior to mapping it into a guest.

In the case of SEV-SNP, whether or not a 2MB page can be mapped via a 2MB mapping in the guest's nested page table depends on whether or not any subpages within the range have already been initialized as private in the RMP table, so this hook will also be used by the KVM MMU to clamp the maximum mapping size accordingly.
Signed-off-by: Michael Roth
Link: https://lore.kernel.org/r/20230612042559.375660-2-michael.roth@amd.com
Acked-by: Jarkko Sakkinen
---
Changes v2 -> v3:
- Newly added
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  3 +++
 arch/x86/kvm/mmu/mmu.c             | 12 ++++++++++--
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 13bc212cd4bc..439ba4beb5af 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -133,6 +133,7 @@ KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)

 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bbefd79b7950..2bc42f2887fa 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1732,6 +1732,9 @@ struct kvm_x86_ops {
 	 * Returns vCPU specific APICv inhibit reasons
 	 */
 	unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
+
+	int (*gmem_prepare)(struct kvm *kvm, struct kvm_memory_slot *slot,
+			    kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
 };

 struct kvm_x86_nested_ops {

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 05943ccb55a4..06900b01b8f0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4352,6 +4352,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 				   struct kvm_page_fault *fault)
 {
 	int max_order, r;
+	u8 max_level;

 	if (!kvm_slot_can_be_private(fault->slot))
 		return kvm_do_memory_fault_exit(vcpu, fault);
@@ -4361,8 +4362,15 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 	if (r)
 		return r;

-	fault->max_level = min(kvm_max_level_for_order(max_order),
-			       fault->max_level);
+	max_level = kvm_max_level_for_order(max_order);
+	r = static_call(kvm_x86_gmem_prepare)(vcpu->kvm, fault->slot, fault->pfn,
+					      fault->gfn, &max_level);
+	if (r) {
+		kvm_release_pfn_clean(fault->pfn);
+		return r;
+	}
+
+	fault->max_level = min(max_level, fault->max_level);
 	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
 	return RET_PF_CONTINUE;
 }

From patchwork Tue Aug 15 17:18:53 2023
X-Patchwork-Submitter: Isaku Yamahata
X-Patchwork-Id: 13353995
From: isaku.yamahata@intel.com
Subject: [PATCH 6/8] KVM: gmem, x86: Add gmem hook for invalidating private memory
Date: Tue, 15 Aug 2023 10:18:53 -0700
Message-Id: <8c9f0470ba6e5dc122f3f4e37c4dcfb6fb97b184.1692119201.git.isaku.yamahata@intel.com>

From: Michael Roth

TODO: add a CONFIG option that can be used to completely skip the arch invalidation loop and avoid __weak references for arch/platforms that don't need an additional invalidation hook.

In some cases, like with SEV-SNP, guest memory needs to be updated in a platform-specific manner before it can be safely freed back to the host. Add hooks to wire up handling of this sort when freeing memory in response to FALLOC_FL_PUNCH_HOLE operations.

Also issue invalidations of all allocated pages when releasing the gmem file so that the pages are not left in an unusable state when they get freed back to the host.
Signed-off-by: Michael Roth
Link: https://lore.kernel.org/r/20230612042559.375660-3-michael.roth@amd.com
Signed-off-by: Isaku Yamahata
---
Changes v4 -> v5:
- Fix compile issue by adding static inline when gmem is disabled
Changes v2 -> v3:
- Newly added
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/x86.c                 |  6 +++++
 include/linux/kvm_host.h           |  3 +++
 virt/kvm/guest_mem.c               | 42 ++++++++++++++++++++++++++++++
 5 files changed, 53 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 439ba4beb5af..48f043de2ec0 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -134,6 +134,7 @@ KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
+KVM_X86_OP_OPTIONAL(gmem_invalidate)

 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2bc42f2887fa..17e78f9f2d17 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1735,6 +1735,7 @@ struct kvm_x86_ops {

 	int (*gmem_prepare)(struct kvm *kvm, struct kvm_memory_slot *slot,
 			    kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
+	void (*gmem_invalidate)(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);
 };

 struct kvm_x86_nested_ops {

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index de195ad83ec0..b54818d02cb1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13274,6 +13274,12 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_arch_no_poll);

+#ifdef CONFIG_KVM_PRIVATE_MEM
+void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
+{
+	static_call_cond(kvm_x86_gmem_invalidate)(kvm, start, end);
+}
+#endif

 int kvm_spec_ctrl_test_value(u64 value)
 {

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 091bc89ae805..349b0bf81fa5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2358,6 +2358,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 #ifdef CONFIG_KVM_PRIVATE_MEM
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		     gfn_t gfn, kvm_pfn_t *pfn, int *max_order);
+void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);
 #else
 static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 				   struct kvm_memory_slot *slot, gfn_t gfn,
@@ -2366,6 +2367,8 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 	KVM_BUG_ON(1, kvm);
 	return -EIO;
 }
+
+static inline void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end) { }
 #endif /* CONFIG_KVM_PRIVATE_MEM */

 #endif

diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index ed03f1d12172..342d2938716a 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -127,6 +127,46 @@ static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
 	KVM_MMU_UNLOCK(kvm);
 }

+void __weak kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
+{
+}
+
+/* Handle arch-specific hooks needed before releasing guarded pages. */
+static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct file *file,
+					   pgoff_t start, pgoff_t end)
+{
+	pgoff_t file_end = i_size_read(file_inode(file)) >> PAGE_SHIFT;
+	pgoff_t index = start;
+
+	end = min(end, file_end);
+
+	while (index < end) {
+		struct folio *folio;
+		unsigned int order;
+		struct page *page;
+		kvm_pfn_t pfn;
+
+		folio = __filemap_get_folio(file->f_mapping, index,
+					    FGP_LOCK, 0);
+		if (!folio) {
+			index++;
+			continue;
+		}
+
+		page = folio_file_page(folio, index);
+		pfn = page_to_pfn(page);
+		order = folio_order(folio);
+
+		kvm_arch_gmem_invalidate(kvm, pfn, pfn + min((1ul << order), end - index));
+
+		index = folio_next_index(folio);
+		folio_unlock(folio);
+		folio_put(folio);
+
+		cond_resched();
+	}
+}
+
 static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 {
 	struct list_head *gmem_list = &inode->i_mapping->private_list;
@@ -143,6 +183,7 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 	list_for_each_entry(gmem, gmem_list, entry)
 		kvm_gmem_invalidate_begin(gmem, start, end);

+	kvm_gmem_issue_arch_invalidate(kvm, file, start, end);
 	truncate_inode_pages_range(inode->i_mapping, offset, offset + len - 1);

 	list_for_each_entry(gmem, gmem_list, entry)
@@ -253,6 +294,7 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
 	 * memory, as its lifetime is associated with the inode, not the file.
 	 */
 	kvm_gmem_invalidate_begin(gmem, 0, -1ul);
+	kvm_gmem_issue_arch_invalidate(gmem->kvm, file, 0, -1ul);
 	kvm_gmem_invalidate_end(gmem, 0, -1ul);

 	list_del(&gmem->entry);

From patchwork Tue Aug 15 17:18:54 2023
X-Patchwork-Submitter: Isaku Yamahata
X-Patchwork-Id: 13353998
From: isaku.yamahata@intel.com
Subject: [PATCH 7/8] KVM: gmem: Avoid race with kvm_gmem_release and mmu notifier
Date: Tue, 15 Aug 2023 10:18:54 -0700

From: Isaku Yamahata

Add slots_lock around kvm_flush_shadow_all(). kvm_gmem_release() via fput() and kvm_mmu_notifier_release() via mmput() can be called simultaneously on process exit, because vhost (/dev/vhost_{net,vsock}) can delay the mmu_notifier release, kvm_mmu_notifier_release(), to its kernel thread: vhost uses get_task_mm() and mmput() so that the kernel thread can access process memory, and the final mmput() can be deferred until after the file has been closed. As a result, kvm_flush_shadow_all() and kvm_gmem_release() can run concurrently. With TDX KVM, HKID release by kvm_flush_shadow_all() and private memory release by kvm_gmem_release() can then race. Take slots_lock in kvm_mmu_notifier_release() to serialize the two.
Signed-off-by: Isaku Yamahata
---
 virt/kvm/kvm_main.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 49380cd62367..4855d0b7a859 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -927,9 +927,16 @@ static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
 	int idx;
 
+	/*
+	 * Avoid race with kvm_gmem_release().
+	 * This function is called via mmu notifier, mmu_release().
+	 * kvm_gmem_release() is called via fput() on process exit.
+	 */
+	mutex_lock(&kvm->slots_lock);
 	idx = srcu_read_lock(&kvm->srcu);
 	kvm_flush_shadow_all(kvm);
 	srcu_read_unlock(&kvm->srcu, idx);
+	mutex_unlock(&kvm->slots_lock);
 }
 
 static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {

From patchwork Tue Aug 15 17:18:55 2023
X-Patchwork-Submitter: Isaku Yamahata
X-Patchwork-Id: 13353994
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Michael Roth, Paolo Bonzini, Sean Christopherson, erdemaktas@google.com, Sagi Shahar, David Matlack, Kai Huang, Zhi Wang, chen.bo@intel.com, linux-coco@lists.linux.dev, Chao Peng, Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen, Xu Yilun, Quentin Perret, wei.w.wang@intel.com, Fuad Tabba
Subject: [PATCH 8/8] RFC: KVM: gmem: Guarantee the order of destruction
Date: Tue, 15 Aug 2023 10:18:55 -0700
Message-Id: <72655345e07a02028c9239ccb2c3633dd72bbf9d.1692119201.git.isaku.yamahata@intel.com>

From: Isaku Yamahata

Call kvm_flush_shadow_all() before releasing the kvm gmem file on guest destruction.
The current gmem implementation doesn't guarantee the destruction order between kvm gmem and kvm_mmu_notifier_release(), which calls kvm_flush_shadow_all(). When destroying a TD guest, it is more efficient to call kvm_flush_shadow_all() before kvm_gmem_issue_arch_invalidate() runs on release of the gmem struct file, because kvm_flush_shadow_all() releases the guest's host key ID (HKID); once the HKID is released, the TDX module no longer has to validate the consistency of the Secure-EPT structure.

One way to enforce this order is to make struct kvm hold a reference to the kvm gmem file. However, the current gmem implementation already makes the kvm gmem file reference struct kvm, so adding a reference from struct kvm to the gmem file creates a circular reference. Use kvm_mmu_notifier_release() to break the cycle.

Signed-off-by: Isaku Yamahata
---
 include/linux/kvm_host.h | 24 ++++++++++++++++++++++++
 virt/kvm/kvm_main.c      | 23 ++++++++++++++++++++++-
 2 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 349b0bf81fa5..d717945702a8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -594,6 +594,10 @@ struct kvm_memory_slot {
 	u16 as_id;
 
 #ifdef CONFIG_KVM_PRIVATE_MEM
+#ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
+	struct file *file;
+#endif
+
 	/* Private guest_mem */
 	struct {
 		struct file __rcu *file;
 		pgoff_t pgoff;
@@ -601,6 +605,26 @@ struct kvm_memory_slot {
 #endif
 };
 
+static inline int kvm_memslot_gmem_fget(struct kvm_memory_slot *memslot, int fd)
+{
+#if defined(CONFIG_KVM_PRIVATE_MEM) && defined(CONFIG_KVM_GENERIC_MMU_NOTIFIER)
+	memslot->file = fget(fd);
+	if (!memslot->file)
+		return -EBADF;
+#endif
+	return 0;
+}
+
+static inline void kvm_memslot_gmem_fput(struct kvm_memory_slot *memslot)
+{
+#if defined(CONFIG_KVM_PRIVATE_MEM) && defined(CONFIG_KVM_GENERIC_MMU_NOTIFIER)
+	if (memslot->file) {
+		fput(memslot->file);
+		memslot->file = NULL;
+	}
+#endif
+}
+
 static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot *slot)
 {
 	return slot && (slot->flags & KVM_MEM_PRIVATE);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4855d0b7a859..35bc3b64b7e4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -926,6 +926,7 @@ static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
 {
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
 	int idx;
+	int i;
 
 	/*
 	 * Avoid race with kvm_gmem_release().
@@ -936,6 +937,18 @@ static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
 	idx = srcu_read_lock(&kvm->srcu);
 	kvm_flush_shadow_all(kvm);
 	srcu_read_unlock(&kvm->srcu, idx);
+
+	/* Break circular reference count: kvm->gmem, gmem->kvm. */
+	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
+		struct kvm_memslots *slots = __kvm_memslots(kvm, i);
+		struct kvm_memory_slot *memslot;
+		struct hlist_node *tmp;
+		int bkt;
+
+		hash_for_each_safe(slots->id_hash, bkt, tmp, memslot, id_node[slots->node_idx])
+			kvm_memslot_gmem_fput(memslot);
+	}
+
 	mutex_unlock(&kvm->slots_lock);
 }
 
@@ -1008,8 +1021,10 @@ static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
 /* This does not remove the slot from struct kvm_memslots data structures */
 static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
-	if (slot->flags & KVM_MEM_PRIVATE)
+	if (slot->flags & KVM_MEM_PRIVATE) {
 		kvm_gmem_unbind(slot);
+		kvm_memslot_gmem_fput(slot);
+	}
 
 	kvm_destroy_dirty_bitmap(slot);
 
@@ -1734,6 +1749,8 @@ static void kvm_commit_memory_region(struct kvm *kvm,
 	if (old->dirty_bitmap && !new->dirty_bitmap)
 		kvm_destroy_dirty_bitmap(old);
 
+	kvm_memslot_gmem_fput(old);
+
 	/*
 	 * The final quirk.  Free the detached, old slot, but only its
 	 * memory, not any metadata.  Metadata, including arch specific
@@ -2088,6 +2105,9 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	new->flags = mem->flags;
 	new->userspace_addr = mem->userspace_addr;
 	if (mem->flags & KVM_MEM_PRIVATE) {
+		r = kvm_memslot_gmem_fget(new, mem->gmem_fd);
+		if (r)
+			goto out;
 		r = kvm_gmem_bind(kvm, new, mem->gmem_fd, mem->gmem_offset);
 		if (r)
 			goto out;
@@ -2103,6 +2123,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	if (mem->flags & KVM_MEM_PRIVATE)
 		kvm_gmem_unbind(new);
 out:
+	kvm_memslot_gmem_fput(new);
 	kfree(new);
 	return r;
 }