diff mbox series

[v3,08/12] KVM: Propagate vcpu explicitly to mark_page_dirty_in_slot()

Message ID 20211117174003.297096-9-dwmw2@infradead.org (mailing list archive)
State New, archived
Headers show
Series KVM: x86/xen: Add in-kernel Xen event channel delivery | expand

Commit Message

David Woodhouse Nov. 17, 2021, 5:39 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

The kvm_dirty_ring_get() function uses kvm_get_running_vcpu() to work out
which dirty ring to use, but there are some use cases where that doesn't
work.

There's one in setting the Xen shared info page, introduced in commit
629b5348841a ("KVM: x86/xen: update wallclock region") and reported by
"butt3rflyh4ck" <butterflyhuangxx@gmail.com> in
https://lore.kernel.org/kvm/CAFcO6XOmoS7EacN_n6v4Txk7xL7iqRa2gABg3F7E3Naf5uG94g@mail.gmail.com/

There's also about to be another one when the newly-reintroduced
gfn_to_pfn_cache needs to mark a page as dirty from the MMU notifier
which invalidates the mapping. In that case, we will *know* the vcpu
that can be 'blamed' for dirtying the page, and we just need to be
able to pass it in as an explicit argument when doing so.

This patch preemptively resolves the second issue, and paves the way
for resolving the first. A complete fix for the first issue will need
us to switch the Xen shinfo to be owned by a particular vCPU, which
will happen in a separate patch.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/arm64/kvm/mmu.c           |  2 +-
 arch/x86/kvm/mmu/mmu.c         |  2 +-
 arch/x86/kvm/mmu/spte.c        |  2 +-
 arch/x86/kvm/mmu/tdp_mmu.c     |  2 +-
 arch/x86/kvm/x86.c             |  4 ++--
 include/linux/kvm_dirty_ring.h |  6 ++++--
 include/linux/kvm_host.h       |  3 ++-
 virt/kvm/dirty_ring.c          |  8 ++++++--
 virt/kvm/kvm_main.c            | 18 +++++++++---------
 9 files changed, 27 insertions(+), 20 deletions(-)

Comments

Marc Zyngier Nov. 17, 2021, 6:13 p.m. UTC | #1
On Wed, 17 Nov 2021 17:39:59 +0000,
David Woodhouse <dwmw2@infradead.org> wrote:
> 
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> The kvm_dirty_ring_get() function uses kvm_get_running_vcpu() to work out
> which dirty ring to use, but there are some use cases where that doesn't
> work.
> 
> There's one in setting the Xen shared info page, introduced in commit
> 629b5348841a ("KVM: x86/xen: update wallclock region") and reported by
> "butt3rflyh4ck" <butterflyhuangxx@gmail.com> in
> https://lore.kernel.org/kvm/CAFcO6XOmoS7EacN_n6v4Txk7xL7iqRa2gABg3F7E3Naf5uG94g@mail.gmail.com/
> 
> There's also about to be another one when the newly-reintroduced
> gfn_to_pfn_cache needs to mark a page as dirty from the MMU notifier
> which invalidates the mapping. In that case, we will *know* the vcpu
> that can be 'blamed' for dirtying the page, and we just need to be
> able to pass it in as an explicit argument when doing so.
> 
> This patch preemptively resolves the second issue, and paves the way
> for resolving the first. A complete fix for the first issue will need
> us to switch the Xen shinfo to be owned by a particular vCPU, which
> will happen in a separate patch.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  arch/arm64/kvm/mmu.c           |  2 +-
>  arch/x86/kvm/mmu/mmu.c         |  2 +-
>  arch/x86/kvm/mmu/spte.c        |  2 +-
>  arch/x86/kvm/mmu/tdp_mmu.c     |  2 +-
>  arch/x86/kvm/x86.c             |  4 ++--
>  include/linux/kvm_dirty_ring.h |  6 ++++--
>  include/linux/kvm_host.h       |  3 ++-
>  virt/kvm/dirty_ring.c          |  8 ++++++--
>  virt/kvm/kvm_main.c            | 18 +++++++++---------
>  9 files changed, 27 insertions(+), 20 deletions(-)

What's the base for this series? This patch fails to compile for me
(at least on arm64), and the following patch doesn't apply on -rc1.

Thanks,

	M.
David Woodhouse Nov. 17, 2021, 6:31 p.m. UTC | #2
On 17 November 2021 18:13:37 GMT, Marc Zyngier <maz@kernel.org> wrote:
>On Wed, 17 Nov 2021 17:39:59 +0000,
>David Woodhouse <dwmw2@infradead.org> wrote:
>> 
>> From: David Woodhouse <dwmw@amazon.co.uk>
>> 
>> The kvm_dirty_ring_get() function uses kvm_get_running_vcpu() to work out
>> which dirty ring to use, but there are some use cases where that doesn't
>> work.
>> 
>> There's one in setting the Xen shared info page, introduced in commit
>> 629b5348841a ("KVM: x86/xen: update wallclock region") and reported by
>> "butt3rflyh4ck" <butterflyhuangxx@gmail.com> in
>> https://lore.kernel.org/kvm/CAFcO6XOmoS7EacN_n6v4Txk7xL7iqRa2gABg3F7E3Naf5uG94g@mail.gmail.com/
>> 
>> There's also about to be another one when the newly-reintroduced
>> gfn_to_pfn_cache needs to mark a page as dirty from the MMU notifier
>> which invalidates the mapping. In that case, we will *know* the vcpu
>> that can be 'blamed' for dirtying the page, and we just need to be
>> able to pass it in as an explicit argument when doing so.
>> 
>> This patch preemptively resolves the second issue, and paves the way
>> for resolving the first. A complete fix for the first issue will need
>> us to switch the Xen shinfo to be owned by a particular vCPU, which
>> will happen in a separate patch.
>> 
>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>> ---
>>  arch/arm64/kvm/mmu.c           |  2 +-
>>  arch/x86/kvm/mmu/mmu.c         |  2 +-
>>  arch/x86/kvm/mmu/spte.c        |  2 +-
>>  arch/x86/kvm/mmu/tdp_mmu.c     |  2 +-
>>  arch/x86/kvm/x86.c             |  4 ++--
>>  include/linux/kvm_dirty_ring.h |  6 ++++--
>>  include/linux/kvm_host.h       |  3 ++-
>>  virt/kvm/dirty_ring.c          |  8 ++++++--
>>  virt/kvm/kvm_main.c            | 18 +++++++++---------
>>  9 files changed, 27 insertions(+), 20 deletions(-)
>
>What's the base for this series? This patch fails to compile for me
>(at least on arm64), and the following patch doesn't apply on -rc1.

kvm/master
David Woodhouse Nov. 17, 2021, 7:30 p.m. UTC | #3
On Wed, 2021-11-17 at 18:13 +0000, Marc Zyngier wrote:
> What's the base for this series? This patch fails to compile for me
> (at least on arm64), and the following patch doesn't apply on -rc1.

It's on top of kvm/master, and it's also at
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/xen-evtchn

git pull git://git.infradead.org/users/dwmw2/linux.git xen-evtchn
David Woodhouse Nov. 17, 2021, 9:09 p.m. UTC | #4
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> The kvm_dirty_ring_get() function uses kvm_get_running_vcpu() to work out
> which dirty ring to use, but there are some use cases where that doesn't
> work.
>
> There's one in setting the Xen shared info page, introduced in commit
> 629b5348841a ("KVM: x86/xen: update wallclock region") and reported by
> "butt3rflyh4ck" <butterflyhuangxx@gmail.com> in
> https://lore.kernel.org/kvm/CAFcO6XOmoS7EacN_n6v4Txk7xL7iqRa2gABg3F7E3Naf5uG94g@mail.gmail.com/
>
> There's also about to be another one when the newly-reintroduced
> gfn_to_pfn_cache needs to mark a page as dirty from the MMU notifier
> which invalidates the mapping. In that case, we will *know* the vcpu
> that can be 'blamed' for dirtying the page, and we just need to be
> able to pass it in as an explicit argument when doing so.
>
> This patch preemptively resolves the second issue, and paves the way
> for resolving the first. A complete fix for the first issue will need
> us to switch the Xen shinfo to be owned by a particular vCPU, which
> will happen in a separate patch.
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>



> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -36,12 +36,16 @@ static bool kvm_dirty_ring_full(struct kvm_dirty_ring
> *ring)
>  	return kvm_dirty_ring_used(ring) >= ring->size;
>  }
>
> -struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm)
> +struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm, struct
> kvm_vcpu *vcpu)
>  {
> -	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> +	struct kvm_vcpu *running_vcpu = kvm_get_running_vcpu();
>
> +	WARN_ON_ONCE(vcpu && vcpu != running_vcpu);
>  	WARN_ON_ONCE(vcpu->kvm != kvm);

Ah, that one needs to be changed to check running_vcpu instead. Or this
needs to go first:

I think I prefer making the vCPU a required argument. If anyone's going to
pull a vCPU pointer out of their posterior, let the caller do it.

> +	if (!vcpu)
> +		vcpu = running_vcpu;
> +
>  	return &vcpu->dirty_ring;
>  }
>
Paolo Bonzini Nov. 18, 2021, 12:04 p.m. UTC | #5
On 11/17/21 22:09, David Woodhouse wrote:
>>   {
>> -	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>> +	struct kvm_vcpu *running_vcpu = kvm_get_running_vcpu();
>>
>> +	WARN_ON_ONCE(vcpu && vcpu != running_vcpu);
>>   	WARN_ON_ONCE(vcpu->kvm != kvm);
> Ah, that one needs to be changed to check running_vcpu instead. Or this
> needs to go first:
> 
> I think I prefer making the vCPU a required argument. If anyone's going to
> pull a vCPU pointer out of their posterior, let the caller do it.
> 

I understand that feeling, but still using the running vCPU is by far 
the common case, and it's not worth adding a new function parameter to 
all call sites.

What about using a separate function, possibly __-prefixed, for the case 
where you have a very specific vCPU?

Paolo
David Woodhouse Nov. 18, 2021, 2:22 p.m. UTC | #6
On Thu, 2021-11-18 at 13:04 +0100, Paolo Bonzini wrote:
> On 11/17/21 22:09, David Woodhouse wrote:
> > >   {
> > > -	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> > > +	struct kvm_vcpu *running_vcpu = kvm_get_running_vcpu();
> > > 
> > > +	WARN_ON_ONCE(vcpu && vcpu != running_vcpu);
> > >   	WARN_ON_ONCE(vcpu->kvm != kvm);
> > Ah, that one needs to be changed to check running_vcpu instead. Or this
> > needs to go first:
> > 
> > I think I prefer making the vCPU a required argument. If anyone's going to
> > pull a vCPU pointer out of their posterior, let the caller do it.
> > 
> 
> I understand that feeling, but still using the running vCPU is by far 
> the common case, and it's not worth adding a new function parameter to 
> all call sites.
> 
> What about using a separate function, possibly __-prefixed, for the case 
> where you have a very specific vCPU?

We could certainly do a kvm_vcpu_mark_page_dirty_in_slot() and reduce
the collateral damage. I think this patch wouldn't need to touch
anything outside virt/kvm/ if we did that.

But bikeshedding aside, it occurs to me that I have a more substantial
concern about this approach — are we even *allowed* to touch the dirty
ring of another vCPU? It seems to be based on the assumption that it's
entirely a per-cpu structure on the kernel side. Wouldn't we need a
spinlock in kvm_dirty_ring_push() at the very least?

At this point I'm somewhat tempted to remove the mark_page_dirty() call
from gfn_to_pfn_cache_invalidate_start() and move it to the tail of 
kvm_gfn_to_pfn_cache_refresh() instead, where it unpins and memunmaps
the page. (We'd still tell the *kernel* by calling kvm_set_pfn_dirty()
immediately in the invalidation, just defer the KVM mark_page_dirty()
to be done in the context of an actual vCPU.)

If I do that, I can drop this 'propagate vcpu' patch completely.

I'm already happy enough that I'll fix the Xen shared_info page by
*never* bothering to dirty_log for it, and I can wash my hands of that
problem.... if I stop typing now. But...

That leaves the one in TDP MMU handle_changed_spte_dirty_log() which
AFAICT can trigger the same crash seen by butt3rflyh4ck — can't that
happen from a thread where kvm_get_running_vcpu() is NULL too? For that
one I'm not sure.
Sean Christopherson Nov. 18, 2021, 6:40 p.m. UTC | #7
On Thu, Nov 18, 2021, David Woodhouse wrote:
> That leaves the one in TDP MMU handle_changed_spte_dirty_log() which
> AFAICT can trigger the same crash seen by butt3rflyh4ck — can't that
> happen from a thread where kvm_get_running_vcpu() is NULL too? For that
> one I'm not sure.

I think could be trigger in the TDP MMU via kvm_mmu_notifier_release()
-> kvm_mmu_zap_all(), e.g. if the userspace VMM exits while dirty logging is
enabled.  That should be easy to (dis)prove via a selftest.

And for the record :-)

On Mon, Dec 02, 2019 at 12:10:36PM -0800, Sean Christopherson wrote:
> IMO, adding kvm_get_running_vcpu() is a hack that is just asking for future
> abuse and the vcpu/vm/as_id interactions in mark_page_dirty_in_ring()
> look extremely fragile.

On 03/12/19 20:01, Sean Christopherson wrote:
> In case it was clear, I strongly dislike adding kvm_get_running_vcpu().
> IMO, it's a unnecessary hack.  The proper change to ensure a valid vCPU is
> seen by mark_page_dirty_in_ring() when there is a current vCPU is to
> plumb the vCPU down through the various call stacks.
Sean Christopherson Nov. 18, 2021, 6:50 p.m. UTC | #8
On Thu, Nov 18, 2021, Sean Christopherson wrote:
> On Thu, Nov 18, 2021, David Woodhouse wrote:
> > That leaves the one in TDP MMU handle_changed_spte_dirty_log() which
> > AFAICT can trigger the same crash seen by butt3rflyh4ck — can't that
> > happen from a thread where kvm_get_running_vcpu() is NULL too? For that
> > one I'm not sure.
> 
> I think could be trigger in the TDP MMU via kvm_mmu_notifier_release()
> -> kvm_mmu_zap_all(), e.g. if the userspace VMM exits while dirty logging is
> enabled.  That should be easy to (dis)prove via a selftest.

Scratch that, the dirty log update is guarded by the new_spte being present, so
zapping of any kind won't trigger it.

Currently, I believe the only path that would create a present SPTE without an
active vCPU is mmu_notifer.change_pte, but that squeaks by because its required
to be wrapped with invalidate_range_{start,end}(MMU_NOTIFY_CLEAR), and KVM zaps
in that situation.

Ben's series to promote pages on disabling of dirty logging will also sqeuak by
because dirty logging is obviously disabled.
David Woodhouse Nov. 18, 2021, 7:23 p.m. UTC | #9
On 18 November 2021 18:50:55 GMT, Sean Christopherson <seanjc@google.com> wrote:
>On Thu, Nov 18, 2021, Sean Christopherson wrote:
>> On Thu, Nov 18, 2021, David Woodhouse wrote:
>> > That leaves the one in TDP MMU handle_changed_spte_dirty_log() which
>> > AFAICT can trigger the same crash seen by butt3rflyh4ck — can't that
>> > happen from a thread where kvm_get_running_vcpu() is NULL too? For that
>> > one I'm not sure.
>> 
>> I think could be trigger in the TDP MMU via kvm_mmu_notifier_release()
>> -> kvm_mmu_zap_all(), e.g. if the userspace VMM exits while dirty logging is
>> enabled.  That should be easy to (dis)prove via a selftest.
>
>Scratch that, the dirty log update is guarded by the new_spte being present, so
>zapping of any kind won't trigger it.
>
>Currently, I believe the only path that would create a present SPTE without an
>active vCPU is mmu_notifer.change_pte, but that squeaks by because its required
>to be wrapped with invalidate_range_{start,end}(MMU_NOTIFY_CLEAR), and KVM zaps
>in that situation.

Is it sufficient to have *an* active vCPU?  What if a VMM has threads for active vCPUs but is doing mmap/munmap on a *different* thread? Does that not suffer the same crash?
Sean Christopherson Nov. 18, 2021, 7:46 p.m. UTC | #10
On Thu, Nov 18, 2021, David Woodhouse wrote:
> 
> 
> On 18 November 2021 18:50:55 GMT, Sean Christopherson <seanjc@google.com> wrote:
> >On Thu, Nov 18, 2021, Sean Christopherson wrote:
> >> On Thu, Nov 18, 2021, David Woodhouse wrote:
> >> > That leaves the one in TDP MMU handle_changed_spte_dirty_log() which
> >> > AFAICT can trigger the same crash seen by butt3rflyh4ck — can't that
> >> > happen from a thread where kvm_get_running_vcpu() is NULL too? For that
> >> > one I'm not sure.
> >> 
> >> I think could be trigger in the TDP MMU via kvm_mmu_notifier_release()
> >> -> kvm_mmu_zap_all(), e.g. if the userspace VMM exits while dirty logging is
> >> enabled.  That should be easy to (dis)prove via a selftest.
> >
> >Scratch that, the dirty log update is guarded by the new_spte being present, so
> >zapping of any kind won't trigger it.
> >
> >Currently, I believe the only path that would create a present SPTE without an
> >active vCPU is mmu_notifer.change_pte, but that squeaks by because its required
> >to be wrapped with invalidate_range_{start,end}(MMU_NOTIFY_CLEAR), and KVM zaps
> >in that situation.
> 
> Is it sufficient to have *an* active vCPU?  What if a VMM has threads for
> active vCPUs but is doing mmap/munmap on a *different* thread? Does that not
> suffer the same crash?

It is sufficient for the current physical CPU to have an active vCPU, which is
generally guaranteed in the MMU code because, with a few exceptions, populating
SPTEs is done in vCPU context.

mmap() will never directly trigger SPTE creation, KVM first requires a vCPU to
fault on the new address.  munmap() is a pure zap flow, i.e. won't create a
present SPTE and trigger the writeback of the dirty bit.

That's also why I dislike using kvm_get_running_vcpu(); when it's needed, there's
a valid vCPU from the caller, but it deliberately gets dropped and indirectly
picked back up.
David Woodhouse Nov. 19, 2021, 9:23 a.m. UTC | #11
On Thu, 2021-11-18 at 19:46 +0000, Sean Christopherson wrote:
> It is sufficient for the current physical CPU to have an active vCPU, which is
> generally guaranteed in the MMU code because, with a few exceptions, populating
> SPTEs is done in vCPU context.
> 
> mmap() will never directly trigger SPTE creation, KVM first requires a vCPU to
> fault on the new address.  munmap() is a pure zap flow, i.e. won't create a
> present SPTE and trigger the writeback of the dirty bit.

OK, thanks.

> That's also why I dislike using kvm_get_running_vcpu(); when it's needed, there's
> a valid vCPU from the caller, but it deliberately gets dropped and indirectly
> picked back up.

Yeah. So as things stand we have a kvm_write_guest() function which
takes a 'struct kvm *', as well as a kvm_vcpu_write_guest() function
which takes a 'struct kvm_vcpu *'.

But it is verboten to *use* the kvm_write_guest() or mark_page_dirty()
functions unless you actually *do* have an active vCPU. Do so, and the
kernel might just crash; not even a graceful failure mode.

That's a fairly awful bear trap that has now caught me *twice*. I'm
kind of amused that in all my hairy inline asm and pinning and crap for
guest memory access, the thing that's been *broken* is where I just
used the *existing* kvm_write_wall_clock() which does the simple
kvm_write_guest() thing.

I think at the very least perhaps we should do something like this in
mark_page_dirty_in_slot():

 WARN_ON_ONCE(!kvm_get_running_vcpu() || kvm_get_running_vcpu()->kvm != kvm);

(For illustration only; I'd actually use a local vcpu variable *and*
pass that vcpu to kvm_dirty_ring_get())

On propagating the caller's vcpu through and killing off the non-vCPU
versions of the functions, I'm torn... because even if we insist on *a*
vCPU being passed, it might be *the* vCPU, and that's just setting a
more subtle trap (which would have bitten my GPC invalidate code, for
example).

There are other more complex approaches like adding an extra ring, with
spinlocks, for the 'not from a vCPU' cases. But I think that's
overkill.
diff mbox series

Patch

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 326cdfec74a1..d8411ce4db4b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1143,7 +1143,7 @@  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	/* Mark the page dirty only if the fault is handled successfully */
 	if (writable && !ret) {
 		kvm_set_pfn_dirty(pfn);
-		mark_page_dirty_in_slot(kvm, memslot, gfn);
+		mark_page_dirty_in_slot(kvm, vcpu, memslot, gfn);
 	}
 
 out_unlock:
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3be9beea838d..cedeab55b0d8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3090,7 +3090,7 @@  fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		return false;
 
 	if (is_writable_pte(new_spte) && !is_writable_pte(old_spte))
-		mark_page_dirty_in_slot(vcpu->kvm, fault->slot, fault->gfn);
+		mark_page_dirty_in_slot(vcpu->kvm, vcpu, fault->slot, fault->gfn);
 
 	return true;
 }
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 0c76c45fdb68..0598515f3ae2 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -184,7 +184,7 @@  bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
 		/* Enforced by kvm_mmu_hugepage_adjust. */
 		WARN_ON(level > PG_LEVEL_4K);
-		mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
+		mark_page_dirty_in_slot(vcpu->kvm, vcpu, slot, gfn);
 	}
 
 	*new_spte = spte;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index a54c3491af42..c5669c9918a4 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -247,7 +247,7 @@  static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
 	if ((!is_writable_pte(old_spte) || pfn_changed) &&
 	    is_writable_pte(new_spte)) {
 		slot = __gfn_to_memslot(__kvm_memslots(kvm, as_id), gfn);
-		mark_page_dirty_in_slot(kvm, slot, gfn);
+		mark_page_dirty_in_slot(kvm, NULL, slot, gfn);
 	}
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5a403d92833f..5af88f9e7fd1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3353,7 +3353,7 @@  static void record_steal_time(struct kvm_vcpu *vcpu)
  out:
 	user_access_end();
  dirty:
-	mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
+	mark_page_dirty_in_slot(vcpu->kvm, vcpu, ghc->memslot, gpa_to_gfn(ghc->gpa));
 }
 
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
@@ -4420,7 +4420,7 @@  static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 	if (!copy_to_user_nofault(&st->preempted, &preempted, sizeof(preempted)))
 		vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;
 
-	mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
+	mark_page_dirty_in_slot(vcpu->kvm, vcpu, ghc->memslot, gpa_to_gfn(ghc->gpa));
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 4da8d4a4140b..f3be974f9c5a 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -43,7 +43,8 @@  static inline int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring,
 	return 0;
 }
 
-static inline struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm)
+static inline struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm,
+							struct kvm_vcpu *vcpu)
 {
 	return NULL;
 }
@@ -78,7 +79,8 @@  static inline bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
 
 u32 kvm_dirty_ring_get_rsvd_entries(void);
 int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
-struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm);
+struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm,
+					  struct kvm_vcpu *vcpu);
 
 /*
  * called with kvm->slots_lock held, returns the number of
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c310648cc8f1..1628c32e4464 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -942,7 +942,8 @@  struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
 bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn);
-void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot, gfn_t gfn);
+void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_vcpu *vcpu,
+			     struct kvm_memory_slot *memslot, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
 
 struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 88f4683198ea..61c94fc4d7f0 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -36,12 +36,16 @@  static bool kvm_dirty_ring_full(struct kvm_dirty_ring *ring)
 	return kvm_dirty_ring_used(ring) >= ring->size;
 }
 
-struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm)
+struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm, struct kvm_vcpu *vcpu)
 {
-	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+	struct kvm_vcpu *running_vcpu = kvm_get_running_vcpu();
 
+	WARN_ON_ONCE(vcpu && vcpu != running_vcpu);
 	WARN_ON_ONCE(vcpu->kvm != kvm);
 
+	if (!vcpu)
+		vcpu = running_vcpu;
+
 	return &vcpu->dirty_ring;
 }
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 356d636e037d..7137995cab41 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2781,7 +2781,7 @@  int kvm_vcpu_read_guest_atomic(struct kvm_vcpu *vcpu, gpa_t gpa,
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic);
 
-static int __kvm_write_guest_page(struct kvm *kvm,
+static int __kvm_write_guest_page(struct kvm *kvm, struct kvm_vcpu *vcpu,
 				  struct kvm_memory_slot *memslot, gfn_t gfn,
 			          const void *data, int offset, int len)
 {
@@ -2794,7 +2794,7 @@  static int __kvm_write_guest_page(struct kvm *kvm,
 	r = __copy_to_user((void __user *)addr + offset, data, len);
 	if (r)
 		return -EFAULT;
-	mark_page_dirty_in_slot(kvm, memslot, gfn);
+	mark_page_dirty_in_slot(kvm, vcpu, memslot, gfn);
 	return 0;
 }
 
@@ -2803,7 +2803,7 @@  int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn,
 {
 	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
 
-	return __kvm_write_guest_page(kvm, slot, gfn, data, offset, len);
+	return __kvm_write_guest_page(kvm, NULL, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_write_guest_page);
 
@@ -2812,7 +2812,7 @@  int kvm_vcpu_write_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn,
 {
 	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 
-	return __kvm_write_guest_page(vcpu->kvm, slot, gfn, data, offset, len);
+	return __kvm_write_guest_page(vcpu->kvm, vcpu, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_write_guest_page);
 
@@ -2931,7 +2931,7 @@  int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 	r = __copy_to_user((void __user *)ghc->hva + offset, data, len);
 	if (r)
 		return -EFAULT;
-	mark_page_dirty_in_slot(kvm, ghc->memslot, gpa >> PAGE_SHIFT);
+	mark_page_dirty_in_slot(kvm, NULL, ghc->memslot, gpa >> PAGE_SHIFT);
 
 	return 0;
 }
@@ -3000,7 +3000,7 @@  int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
 }
 EXPORT_SYMBOL_GPL(kvm_clear_guest);
 
-void mark_page_dirty_in_slot(struct kvm *kvm,
+void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_vcpu *vcpu,
 			     struct kvm_memory_slot *memslot,
 		 	     gfn_t gfn)
 {
@@ -3009,7 +3009,7 @@  void mark_page_dirty_in_slot(struct kvm *kvm,
 		u32 slot = (memslot->as_id << 16) | memslot->id;
 
 		if (kvm->dirty_ring_size)
-			kvm_dirty_ring_push(kvm_dirty_ring_get(kvm),
+			kvm_dirty_ring_push(kvm_dirty_ring_get(kvm, vcpu),
 					    slot, rel_gfn);
 		else
 			set_bit_le(rel_gfn, memslot->dirty_bitmap);
@@ -3022,7 +3022,7 @@  void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
 	struct kvm_memory_slot *memslot;
 
 	memslot = gfn_to_memslot(kvm, gfn);
-	mark_page_dirty_in_slot(kvm, memslot, gfn);
+	mark_page_dirty_in_slot(kvm, NULL, memslot, gfn);
 }
 EXPORT_SYMBOL_GPL(mark_page_dirty);
 
@@ -3031,7 +3031,7 @@  void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn)
 	struct kvm_memory_slot *memslot;
 
 	memslot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
-	mark_page_dirty_in_slot(vcpu->kvm, memslot, gfn);
+	mark_page_dirty_in_slot(vcpu->kvm, vcpu, memslot, gfn);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);