
[2/2] TESTING! KVM: x86: add invalidate_range mmu notifier

Message ID 20171130180546.4331-2-rkrcmar@redhat.com (mailing list archive)
State New, archived

Commit Message

Radim Krčmář Nov. 30, 2017, 6:05 p.m. UTC
Does roughly what kvm_mmu_notifier_invalidate_page did before.

I am not certain why this would be needed.  It might mean that we have
another bug with start/end or just that I missed something.

Please try just [1/2] first and apply this one only if the bug still
reproduces with [1/2] alone, thanks!
---
 virt/kvm/kvm_main.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

Comments

Paolo Bonzini Dec. 1, 2017, 3:15 p.m. UTC | #1
On 30/11/2017 19:05, Radim Krčmář wrote:
> Does roughly what kvm_mmu_notifier_invalidate_page did before.
> 
> I am not certain why this would be needed.  It might mean that we have
> another bug with start/end or just that I missed something.

I don't think this is needed, because we don't have shared page tables.
My understanding is that without shared page tables, you can assume that
all page modifications go through invalidate_range_start/end.  With
shared page tables, there are additional TLB flushes to take care of,
which require invalidate_range.
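
To illustrate with a hypothetical driver skeleton (written against the
4.14-era mmu_notifier API, not code from this series): a secondary MMU
that keeps its own shadow pagetables only needs the start/end pair and
can leave .invalidate_range NULL:

#include <linux/mmu_notifier.h>

/* Shadow-pagetable model (what KVM does): zap on _start, hold off
 * new secondary faults until _end, then let them rebuild entries. */
static void shadow_invalidate_range_start(struct mmu_notifier *mn,
					  struct mm_struct *mm,
					  unsigned long start,
					  unsigned long end)
{
	/* zap shadow entries covering [start, end) and flush the
	 * secondary TLB; mark the range so faults wait for _end */
}

static void shadow_invalidate_range_end(struct mmu_notifier *mn,
					struct mm_struct *mm,
					unsigned long start,
					unsigned long end)
{
	/* invalidation finished: secondary faults may refault and
	 * re-read the (possibly changed) primary pagetables */
}

static const struct mmu_notifier_ops shadow_mmu_ops = {
	.invalidate_range_start	= shadow_invalidate_range_start,
	.invalidate_range_end	= shadow_invalidate_range_end,
	/* .invalidate_range is only required when the secondary MMU
	 * walks the primary pagetables themselves, because then the
	 * extra flush must happen inside the TLB gather, before the
	 * page is freed. */
};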

Thanks,

Paolo

Andrea Arcangeli Dec. 3, 2017, 5:24 p.m. UTC | #2
On Fri, Dec 01, 2017 at 04:15:37PM +0100, Paolo Bonzini wrote:
> On 30/11/2017 19:05, Radim Krčmář wrote:
> > Does roughly what kvm_mmu_notifier_invalidate_page did before.
> > 
> > I am not certain why this would be needed.  It might mean that we have
> > another bug with start/end or just that I missed something.
> 
> I don't think this is needed, because we don't have shared page tables.
> My understanding is that without shared page tables, you can assume that
> all page modifications go through invalidate_range_start/end.  With
> shared page tables, there are additional TLB flushes to take care of,
> which require invalidate_range.

Agreed, invalidate_range is only ever needed if the secondary MMU
(i.e. KVM) shares the same pagetables as the primary MMU in the
host.  Only in that case do we need a special secondary MMU invalidate
in the tlb gather before the page is freed, because there's no way to
block the secondary MMU from walking the host pagetables in
invalidate_range_start.

In KVM's case the secondary MMU always goes through the shadow
pagetables, so all shadow pagetable invalidates can happen in
invalidate_range_start, and patch 2/2 is not needed here.
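
A simplified sketch of the retry dance that makes this safe (loosely
modeled on KVM's fault paths of that era; example_map_gfn is a made-up
name): the fault path snapshots mmu_notifier_seq before translating
and retries if an invalidate ran in between:

#include <linux/kvm_host.h>

static int example_map_gfn(struct kvm *kvm, gfn_t gfn)
{
	unsigned long mmu_seq;
	kvm_pfn_t pfn;

retry:
	mmu_seq = kvm->mmu_notifier_seq;
	smp_rmb();

	pfn = gfn_to_pfn(kvm, gfn);	/* may sleep, takes no locks */

	spin_lock(&kvm->mmu_lock);
	if (mmu_notifier_retry(kvm, mmu_seq)) {
		/* invalidate_range_start/end ran in between, so the
		 * pfn may be stale: drop it and start over */
		spin_unlock(&kvm->mmu_lock);
		kvm_release_pfn_clean(pfn);
		goto retry;
	}
	/* safe: install the shadow pte here, under mmu_lock */
	spin_unlock(&kvm->mmu_lock);
	return 0;
}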

Note that the host kernel was always free to call
invalidate_range_start/end and never call invalidate_page, even
before invalidate_page was removed.

So in practice the problem could only be noticed after the removal of
invalidate_page, of course, but in more theoretical terms 1/2 is
actually fixing a longstanding bug.  The removal of invalidate_page
just made the lack of a kvm_arch_mmu_notifier_invalidate_page call in
invalidate_range_start more apparent.

Thanks,
Andrea

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b7f4689e373f..0825ea624f16 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -342,6 +342,29 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
 	srcu_read_unlock(&kvm->srcu, idx);
 }
 
+static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn,
+						    struct mm_struct *mm,
+						    unsigned long start,
+						    unsigned long end)
+{
+	struct kvm *kvm = mmu_notifier_to_kvm(mn);
+	int need_tlb_flush = 0, idx;
+
+	idx = srcu_read_lock(&kvm->srcu);
+	spin_lock(&kvm->mmu_lock);
+	kvm->mmu_notifier_seq++;
+	need_tlb_flush = kvm_unmap_hva_range(kvm, start, end);
+	need_tlb_flush |= kvm->tlbs_dirty;
+	if (need_tlb_flush)
+		kvm_flush_remote_tlbs(kvm);
+
+	spin_unlock(&kvm->mmu_lock);
+
+	kvm_arch_mmu_notifier_invalidate_range(kvm, start, end);
+
+	srcu_read_unlock(&kvm->srcu, idx);
+}
+
 static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 						    struct mm_struct *mm,
 						    unsigned long start,
@@ -476,6 +499,7 @@ static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
 }
 
 static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
+	.invalidate_range	= kvm_mmu_notifier_invalidate_range,
 	.invalidate_range_start	= kvm_mmu_notifier_invalidate_range_start,
 	.invalidate_range_end	= kvm_mmu_notifier_invalidate_range_end,
 	.clear_flush_young	= kvm_mmu_notifier_clear_flush_young,