diff mbox series

[v6,1/8] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL

Message ID 20221011061447.131531-2-gshan@redhat.com (mailing list archive)
State New, archived
Series KVM: arm64: Enable ring-based dirty memory tracking

Commit Message

Gavin Shan Oct. 11, 2022, 6:14 a.m. UTC
This adds KVM_REQ_RING_SOFT_FULL, which is raised when the dirty
ring of the specific VCPU becomes softly full in kvm_dirty_ring_push().
The VCPU is enforced to exit when the request is raised and its
dirty ring is softly full on its entrance.

The event is checked and handled in the newly introduced helper
kvm_dirty_ring_check_request(). With this, kvm_dirty_ring_soft_full()
becomes a private function.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/kvm/x86.c             | 15 ++++++---------
 include/linux/kvm_dirty_ring.h |  8 ++------
 include/linux/kvm_host.h       |  1 +
 virt/kvm/dirty_ring.c          | 19 ++++++++++++++++++-
 4 files changed, 27 insertions(+), 16 deletions(-)

Comments

Sean Christopherson Oct. 20, 2022, 10:42 p.m. UTC | #1
On Tue, Oct 11, 2022, Gavin Shan wrote:
> This adds KVM_REQ_RING_SOFT_FULL, which is raised when the dirty

"This" is basically "This patch", which is generally frowned upon.  Just state
what changes are being made.

> ring of the specific VCPU becomes softly full in kvm_dirty_ring_push().
> The VCPU is enforced to exit when the request is raised and its
> dirty ring is softly full on its entrance.
> 
> The event is checked and handled in the newly introduced helper
> kvm_dirty_ring_check_request(). With this, kvm_dirty_ring_soft_full()
> becomes a private function.

None of this captures why the request is being added.  I'm guessing Marc's
motivation is to avoid having to check ring on every entry, though there might
also be a correctness issue too?

It'd also be helpful to explain that KVM re-queues the request to maintain KVM's
existing uABI, which enforces the soft_limit even if no entries have been added
to the ring since the last KVM_EXIT_DIRTY_RING_FULL exit.

And maybe call out the alternative(s) that was discussed in v2[*]?

[*] https://lore.kernel.org/all/87illlkqfu.wl-maz@kernel.org

> Suggested-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> ---
>  arch/x86/kvm/x86.c             | 15 ++++++---------
>  include/linux/kvm_dirty_ring.h |  8 ++------
>  include/linux/kvm_host.h       |  1 +
>  virt/kvm/dirty_ring.c          | 19 ++++++++++++++++++-
>  4 files changed, 27 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b0c47b41c264..0dd0d32073e7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10260,16 +10260,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  
>  	bool req_immediate_exit = false;
>  
> -	/* Forbid vmenter if vcpu dirty ring is soft-full */
> -	if (unlikely(vcpu->kvm->dirty_ring_size &&
> -		     kvm_dirty_ring_soft_full(&vcpu->dirty_ring))) {
> -		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
> -		trace_kvm_dirty_ring_exit(vcpu);
> -		r = 0;
> -		goto out;
> -	}
> -
>  	if (kvm_request_pending(vcpu)) {
> +		/* Forbid vmenter if vcpu dirty ring is soft-full */

Eh, I'd drop the comment, pretty obvious what the code is doing

> +		if (kvm_dirty_ring_check_request(vcpu)) {

I think it makes sense to move this check below KVM_REQ_VM_DEAD.  I doubt it will
ever matter in practice, but conceptually VM_DEAD is a higher priority event.

I'm pretty sure the check can be moved to the very end of the request checks,
e.g. to avoid an aborted VM-Enter attempt if one of the other request triggers
KVM_REQ_RING_SOFT_FULL.

Heh, this might actually be a bug fix of sorts.  If anything pushes to the ring
after the check at the start of vcpu_enter_guest(), then without the request, KVM
would enter the guest while at or above the soft limit, e.g. record_steal_time()
can dirty a page, and the big pile of stuff that's behind KVM_REQ_EVENT can
certainly dirty pages.

> +			r = 0;
> +			goto out;
> +		}
> +
>  		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
>  			r = -EIO;
>  			goto out;

> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -157,6 +157,7 @@ static inline bool is_error_page(struct page *page)
>  #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQ_UNBLOCK           2
>  #define KVM_REQ_UNHALT            3

UNHALT is gone, the new request can use '3'.

> +#define KVM_REQ_RING_SOFT_FULL    4

Any objection to calling this KVM_REQ_DIRTY_RING_SOFT_FULL?  None of the users
are in danger of having too long lines, and at first glance it's not clear that
this is specifically for the dirty ring.

It'd also give us an excuse to replace spaces with tabs in the above alignment :-)

#define KVM_REQ_TLB_FLUSH		(0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_VM_DEAD			(1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_UNBLOCK			2
#define KVM_REQ_DIRTY_RING_SOFT_FULL	3
#define KVM_REQUEST_ARCH_BASE		8

> @@ -149,6 +149,7 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
>  
>  void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
>  {
> +	struct kvm_vcpu *vcpu = container_of(ring, struct kvm_vcpu, dirty_ring);
>  	struct kvm_dirty_gfn *entry;
>  
>  	/* It should never get full */
> @@ -166,6 +167,22 @@ void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
>  	kvm_dirty_gfn_set_dirtied(entry);
>  	ring->dirty_index++;
>  	trace_kvm_dirty_ring_push(ring, slot, offset);
> +
> +	if (kvm_dirty_ring_soft_full(ring))
> +		kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);

Would it make sense to clear the request in kvm_dirty_ring_reset()?  I don't care
about the overhead of having to re-check the request, the goal would be to help
document what causes the request to go away.

E.g. modify kvm_dirty_ring_reset() to take @vcpu and then do:

	if (!kvm_dirty_ring_soft_full(ring))
		kvm_clear_request(KVM_REQ_RING_SOFT_FULL, vcpu);

> +}
> +
> +bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu)
> +{
> +	if (kvm_check_request(KVM_REQ_RING_SOFT_FULL, vcpu) &&
> +		kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {

Align please,

	if (kvm_check_request(KVM_REQ_RING_SOFT_FULL, vcpu) &&
	    kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {

> +		kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);

A comment would be helpful to explain (a) why KVM needs to re-check on the next
KVM_RUN and (b) why this won't indefinitely prevent KVM from entering the guest.
For pretty much every other request I can think of, re-queueing a request like this
will effectively hang the vCPU, i.e. this looks wrong at first glance.

> +		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
> +		trace_kvm_dirty_ring_exit(vcpu);
> +		return true;
> +	}
> +
> +	return false;
>  }
>  
>  struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset)
> -- 
> 2.23.0
>
Gavin Shan Oct. 21, 2022, 5:54 a.m. UTC | #2
Hi Sean,

On 10/21/22 6:42 AM, Sean Christopherson wrote:
> On Tue, Oct 11, 2022, Gavin Shan wrote:
>> This adds KVM_REQ_RING_SOFT_FULL, which is raised when the dirty
> 
> "This" is basically "This patch", which is generally frowned upon.  Just state
> what changes are being made.
> 

Ok.

>> ring of the specific VCPU becomes softly full in kvm_dirty_ring_push().
>> The VCPU is enforced to exit when the request is raised and its
>> dirty ring is softly full on its entrance.
>>
>> The event is checked and handled in the newly introduced helper
>> kvm_dirty_ring_check_request(). With this, kvm_dirty_ring_soft_full()
>> becomes a private function.
> 
> None of this captures why the request is being added.  I'm guessing Marc's
> motivation is to avoid having to check ring on every entry, though there might
> also be a correctness issue too?
> 
> It'd also be helpful to explain that KVM re-queues the request to maintain KVM's
> existing uABI, which enforces the soft_limit even if no entries have been added
> to the ring since the last KVM_EXIT_DIRTY_RING_FULL exit.
> 
> And maybe call out the alternative(s) that was discussed in v2[*]?
> 
> [*] https://lore.kernel.org/all/87illlkqfu.wl-maz@kernel.org
> 

I think Marc wants to make the check more generalized with a new event [1].

[1] https://lore.kernel.org/kvmarm/87fshovtu0.wl-maz@kernel.org/

Yes, the commit log will be modified accordingly after your comments are
addressed. I will add something to explain why KVM_REQ_DIRTY_RING_SOFT_FULL
needs to be re-queued: to ensure we have spare space in the dirty ring before
the VCPU becomes runnable.

>> Suggested-by: Marc Zyngier <maz@kernel.org>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> Reviewed-by: Peter Xu <peterx@redhat.com>
>> ---
>>   arch/x86/kvm/x86.c             | 15 ++++++---------
>>   include/linux/kvm_dirty_ring.h |  8 ++------
>>   include/linux/kvm_host.h       |  1 +
>>   virt/kvm/dirty_ring.c          | 19 ++++++++++++++++++-
>>   4 files changed, 27 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index b0c47b41c264..0dd0d32073e7 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -10260,16 +10260,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>   
>>   	bool req_immediate_exit = false;
>>   
>> -	/* Forbid vmenter if vcpu dirty ring is soft-full */
>> -	if (unlikely(vcpu->kvm->dirty_ring_size &&
>> -		     kvm_dirty_ring_soft_full(&vcpu->dirty_ring))) {
>> -		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
>> -		trace_kvm_dirty_ring_exit(vcpu);
>> -		r = 0;
>> -		goto out;
>> -	}
>> -
>>   	if (kvm_request_pending(vcpu)) {
>> +		/* Forbid vmenter if vcpu dirty ring is soft-full */
> 
> Eh, I'd drop the comment, pretty obvious what the code is doing
> 

Ok, it will be dropped in the next revision.

>> +		if (kvm_dirty_ring_check_request(vcpu)) {
> 
> I think it makes sense to move this check below KVM_REQ_VM_DEAD.  I doubt it will
> ever matter in practice, but conceptually VM_DEAD is a higher priority event.
> 
> I'm pretty sure the check can be moved to the very end of the request checks,
> e.g. to avoid an aborted VM-Enter attempt if one of the other request triggers
> KVM_REQ_RING_SOFT_FULL.
> 
> Heh, this might actually be a bug fix of sorts.  If anything pushes to the ring
> after the check at the start of vcpu_enter_guest(), then without the request, KVM
> would enter the guest while at or above the soft limit, e.g. record_steal_time()
> can dirty a page, and the big pile of stuff that's behind KVM_REQ_EVENT can
> certainly dirty pages.
> 

When the dirty ring becomes full, the VCPU can't handle any operations that
would produce more dirty pages. So let's move the check right after KVM_REQ_VM_DEAD,
which obviously has higher priority than KVM_REQ_DIRTY_RING_SOFT_FULL.

>> +			r = 0;
>> +			goto out;
>> +		}
>> +
>>   		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
>>   			r = -EIO;
>>   			goto out;
> 
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -157,6 +157,7 @@ static inline bool is_error_page(struct page *page)
>>   #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>   #define KVM_REQ_UNBLOCK           2
>>   #define KVM_REQ_UNHALT            3
> 
> UNHALT is gone, the new request can use '3'.
> 

Yep :)

>> +#define KVM_REQ_RING_SOFT_FULL    4
> 
> Any objection to calling this KVM_REQ_DIRTY_RING_SOFT_FULL?  None of the users
> are in danger of having too long lines, and at first glance it's not clear that
> this is specifically for the dirty ring.
> 
> It'd also give us an excuse to replace spaces with tabs in the above alignment :-)
> 
> #define KVM_REQ_TLB_FLUSH		(0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> #define KVM_REQ_VM_DEAD			(1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> #define KVM_REQ_UNBLOCK			2
> #define KVM_REQ_DIRTY_RING_SOFT_FULL	3
> #define KVM_REQUEST_ARCH_BASE		8
> 

KVM_REQ_DIRTY_RING_SOFT_FULL is better. I will rename the event and
replace spaces with tabs in the next revision.

>> @@ -149,6 +149,7 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
>>   
>>   void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
>>   {
>> +	struct kvm_vcpu *vcpu = container_of(ring, struct kvm_vcpu, dirty_ring);
>>   	struct kvm_dirty_gfn *entry;
>>   
>>   	/* It should never get full */
>> @@ -166,6 +167,22 @@ void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
>>   	kvm_dirty_gfn_set_dirtied(entry);
>>   	ring->dirty_index++;
>>   	trace_kvm_dirty_ring_push(ring, slot, offset);
>> +
>> +	if (kvm_dirty_ring_soft_full(ring))
>> +		kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);
> 
> Would it make sense to clear the request in kvm_dirty_ring_reset()?  I don't care
> about the overhead of having to re-check the request, the goal would be to help
> document what causes the request to go away.
> 
> E.g. modify kvm_dirty_ring_reset() to take @vcpu and then do:
> 
> 	if (!kvm_dirty_ring_soft_full(ring))
> 		kvm_clear_request(KVM_REQ_RING_SOFT_FULL, vcpu);
> 

It's reasonable to clear KVM_REQ_DIRTY_RING_SOFT_FULL when the ring is reset.
@vcpu can be obtained via container_of(..., ring).


>> +}
>> +
>> +bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu)
>> +{
>> +	if (kvm_check_request(KVM_REQ_RING_SOFT_FULL, vcpu) &&
>> +		kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {
> 
> Align please,
> 

Will be fixed in the next revision.

> 	if (kvm_check_request(KVM_REQ_RING_SOFT_FULL, vcpu) &&
> 	    kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {
> 
>> +		kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);
> 
> A comment would be helpful to explain (a) why KVM needs to re-check on the next
> KVM_RUN and (b) why this won't indefinitely prevent KVM from entering the guest.
> For pretty much every other request I can think of, re-queueing a request like this
> will effectively hang the vCPU, i.e. this looks wrong at first glance.
> 

It can indefinitely prevent the VCPU from running if the dirty pages aren't
harvested and the dirty ring isn't reset by userspace. I will add something
like the comment below to explain why we need to re-queue the event.

        /*
         * The VCPU isn't runnable when the dirty ring becomes full. The
         * KVM_REQ_DIRTY_RING_SOFT_FULL event is always set to prevent
         * the VCPU from running until the dirty pages are harvested and
         * the dirty ring is reset by userspace.
         */


>> +		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
>> +		trace_kvm_dirty_ring_exit(vcpu);
>> +		return true;
>> +	}
>> +
>> +	return false;
>>   }
>>   
>>   struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset)
>> -- 
>> 2.23.0
>>
>
Sean Christopherson Oct. 21, 2022, 3:25 p.m. UTC | #3
On Fri, Oct 21, 2022, Gavin Shan wrote:
> I think Marc wants to make the check more generalized with a new event [1].

Generalized code can be achieved with a helper though.  The motivation is indeed
to avoid overhead on every run:

  : A seemingly approach would be to make this a request on dirty log
  : insertion, and avoid the whole "check the log size" on every run,
  : which adds pointless overhead to unsuspecting users (aka everyone).


https://lore.kernel.org/kvmarm/87lerkwtm5.wl-maz@kernel.org

> > I'm pretty sure the check can be moved to the very end of the request checks,
> > e.g. to avoid an aborted VM-Enter attempt if one of the other request triggers
> > KVM_REQ_RING_SOFT_FULL.
> > 
> > Heh, this might actually be a bug fix of sorts.  If anything pushes to the ring
> > after the check at the start of vcpu_enter_guest(), then without the request, KVM
> > would enter the guest while at or above the soft limit, e.g. record_steal_time()
> > can dirty a page, and the big pile of stuff that's behind KVM_REQ_EVENT can
> > certainly dirty pages.
> > 
> 
> When dirty ring becomes full, the VCPU can't handle any operations, which will
> bring more dirty pages.

Right, but there's a buffer of 64 entries on top of what the CPU can buffer (VMX's
PML can buffer 512 entries).  Hence the "soft full".  If x86 is already on the
edge of exhausting that buffer, i.e. can fill 64 entries while handling requests,
then we need to increase the buffer provided by the soft limit because sooner or
later KVM will be able to fill 65 entries, at which point errors will occur
regardless of when the "soft full" request is processed.

In other words, we can take advantage of the fact that the soft-limit buffer needs
to be quite conservative.

> > Would it make sense to clear the request in kvm_dirty_ring_reset()?  I don't care
> > about the overhead of having to re-check the request, the goal would be to help
> > document what causes the request to go away.
> > 
> > E.g. modify kvm_dirty_ring_reset() to take @vcpu and then do:
> > 
> > 	if (!kvm_dirty_ring_soft_full(ring))
> > 		kvm_clear_request(KVM_REQ_RING_SOFT_FULL, vcpu);
> > 
> 
>> It's reasonable to clear KVM_REQ_DIRTY_RING_SOFT_FULL when the ring is reset.
>> @vcpu can be obtained via container_of(..., ring).

Using container_of() is silly, there's literally one caller that does:

	kvm_for_each_vcpu(i, vcpu, kvm)
		cleared += kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring);
Gavin Shan Oct. 21, 2022, 11:03 p.m. UTC | #4
Hi Sean,

On 10/21/22 11:25 PM, Sean Christopherson wrote:
> On Fri, Oct 21, 2022, Gavin Shan wrote:
>> I think Marc wants to make the check more generalized with a new event [1].
> 
> Generalized code can be achieved with a helper though.  The motivation is indeed
> to avoid overhead on every run:
> 
>    : A seemingly approach would be to make this a request on dirty log
>    : insertion, and avoid the whole "check the log size" on every run,
>    : which adds pointless overhead to unsuspecting users (aka everyone).
> 
> 
> https://lore.kernel.org/kvmarm/87lerkwtm5.wl-maz@kernel.org
> 

Ok. I would say both are motivations. I will quote those words in the commit
log and include the link, so that the motivations are clearly mentioned in
the commit log.

>>> I'm pretty sure the check can be moved to the very end of the request checks,
>>> e.g. to avoid an aborted VM-Enter attempt if one of the other request triggers
>>> KVM_REQ_RING_SOFT_FULL.
>>>
>>> Heh, this might actually be a bug fix of sorts.  If anything pushes to the ring
>>> after the check at the start of vcpu_enter_guest(), then without the request, KVM
>>> would enter the guest while at or above the soft limit, e.g. record_steal_time()
>>> can dirty a page, and the big pile of stuff that's behind KVM_REQ_EVENT can
>>> certainly dirty pages.
>>>
>>
>> When dirty ring becomes full, the VCPU can't handle any operations, which will
>> bring more dirty pages.
> 
> Right, but there's a buffer of 64 entries on top of what the CPU can buffer (VMX's
> PML can buffer 512 entries).  Hence the "soft full".  If x86 is already on the
> edge of exhausting that buffer, i.e. can fill 64 entries while handling requests,
> than we need to increase the buffer provided by the soft limit because sooner or
> later KVM will be able to fill 65 entries, at which point errors will occur
> regardless of when the "soft full" request is processed.
> 
> In other words, we can take advantage of the fact that the soft-limit buffer needs
> to be quite conservative.
> 

Right, there are an extra 64 entries in the ring between soft full and hard full.
Another 512 entries are reserved when PML is enabled. However, the other requests,
which produce dirty pages, are producers to the ring. We can't just assume that
those producers will need fewer than 64 entries. So I think KVM_REQ_DIRTY_RING_SOFT_FULL
has higher priority than the other requests, except KVM_REQ_VM_DEAD, which needs
to be handled immediately.

>>> Would it make sense to clear the request in kvm_dirty_ring_reset()?  I don't care
>>> about the overhead of having to re-check the request, the goal would be to help
>>> document what causes the request to go away.
>>>
>>> E.g. modify kvm_dirty_ring_reset() to take @vcpu and then do:
>>>
>>> 	if (!kvm_dirty_ring_soft_full(ring))
>>> 		kvm_clear_request(KVM_REQ_RING_SOFT_FULL, vcpu);
>>>
>>
>> It's reasonable to clear KVM_REQ_DIRTY_RING_SOFT_FULL when the ring is reset.
>> @vcpu can be obtained via container_of(..., ring).
> 
> Using container_of() is silly, there's literally one caller that does:
> 
> 	kvm_for_each_vcpu(i, vcpu, kvm)
> 		cleared += kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring);
> 

May I ask why it's silly to use container_of()? In order to avoid using
container_of(), kvm_dirty_ring_push() also needs @vcpu. So let's change those
two functions to something like below. Please double-check that they look good
to you:

   void kvm_dirty_ring_push(struct kvm_vcpu *vcpu, u32 slot, u64 offset);
   int kvm_dirty_ring_reset(struct kvm_vcpu *vcpu);

Thanks,
Gavin
Sean Christopherson Oct. 21, 2022, 11:48 p.m. UTC | #5
On Sat, Oct 22, 2022, Gavin Shan wrote:
> > > When dirty ring becomes full, the VCPU can't handle any operations, which will
> > > bring more dirty pages.
> > 
> > Right, but there's a buffer of 64 entries on top of what the CPU can buffer (VMX's
> > PML can buffer 512 entries).  Hence the "soft full".  If x86 is already on the
> > edge of exhausting that buffer, i.e. can fill 64 entries while handling requests,
> > then we need to increase the buffer provided by the soft limit because sooner or
> > later KVM will be able to fill 65 entries, at which point errors will occur
> > regardless of when the "soft full" request is processed.
> > 
> > In other words, we can take advantage of the fact that the soft-limit buffer needs
> > to be quite conservative.
> > 
> 
> Right, there are an extra 64 entries in the ring between soft full and hard full.
> Another 512 entries are reserved when PML is enabled. However, the other requests,
> which produce dirty pages, are producers to the ring. We can't just assume
> that those producers will need fewer than 64 entries.

But we're already assuming those producers will need less than 65 entries.  My point
is that if one (or even five) extra entries push KVM over the limit, then the
buffer provided by the soft limit needs to be jacked up regardless of when the
request is processed.

Hmm, but I suppose it's possible there's a pathological emulator path that can push
double digit entries, and servicing the request right away ensures that requests
have the full 64 entry buffer to play with.

So yeah, I agree, move it below the DEAD check, but keep it above most everything
else.

> > > > Would it make sense to clear the request in kvm_dirty_ring_reset()?  I don't care
> > > > about the overhead of having to re-check the request, the goal would be to help
> > > > document what causes the request to go away.
> > > > 
> > > > E.g. modify kvm_dirty_ring_reset() to take @vcpu and then do:
> > > > 
> > > > 	if (!kvm_dirty_ring_soft_full(ring))
> > > > 		kvm_clear_request(KVM_REQ_RING_SOFT_FULL, vcpu);
> > > > 
> > > 
> > > It's reasonable to clear KVM_REQ_DIRTY_RING_SOFT_FULL when the ring is reset.
> > > @vcpu can be obtained via container_of(..., ring).
> > 
> > Using container_of() is silly, there's literally one caller that does:
> > 
> > 	kvm_for_each_vcpu(i, vcpu, kvm)
> > 		cleared += kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring);
> > 
> 
> May I ask why it's silly to use container_of()?

Because container_of() is inherently dangerous, e.g. if it's used on a pointer that
isn't contained by the expected type, the code will compile cleanly but explode
at runtime.  That's unlikely to happen in this case, e.g. doesn't look like we'll
be adding a ring to "struct kvm", but if someone wanted to add a per-VM ring,
taking the vCPU makes it very obvious that pushing to a ring _requires_ a vCPU,
and enforces that requirement at compile time.

In other words, it's preferable to avoid container_of() unless using it solves a
real problem that doesn't have a better alternative.

In these cases, passing in the vCPU is most definitely a better alternative as
each of the functions in question has a sole caller that has easy access to the
container (vCPU), i.e. it's a trivial change.

> In order to avoid using container_of(), kvm_dirty_ring_push() also needs
> @vcpu.

Yep, that one should be changed too.

> So let's change those two functions to something like below. Please
> double-check that they look good to you:
> 
>   void kvm_dirty_ring_push(struct kvm_vcpu *vcpu, u32 slot, u64 offset);
>   int kvm_dirty_ring_reset(struct kvm_vcpu *vcpu);

Yep, looks good.
Gavin Shan Oct. 22, 2022, 12:16 a.m. UTC | #6
Hi Sean,

On 10/22/22 7:48 AM, Sean Christopherson wrote:
> On Sat, Oct 22, 2022, Gavin Shan wrote:
>>>> When dirty ring becomes full, the VCPU can't handle any operations, which will
>>>> bring more dirty pages.
>>>
>>> Right, but there's a buffer of 64 entries on top of what the CPU can buffer (VMX's
>>> PML can buffer 512 entries).  Hence the "soft full".  If x86 is already on the
>>> edge of exhausting that buffer, i.e. can fill 64 entries while handling requests,
>>> then we need to increase the buffer provided by the soft limit because sooner or
>>> later KVM will be able to fill 65 entries, at which point errors will occur
>>> regardless of when the "soft full" request is processed.
>>>
>>> In other words, we can take advantage of the fact that the soft-limit buffer needs
>>> to be quite conservative.
>>>
>>
>> Right, there are an extra 64 entries in the ring between soft full and hard full.
>> Another 512 entries are reserved when PML is enabled. However, the other requests,
>> which produce dirty pages, are producers to the ring. We can't just assume
>> that those producers will need fewer than 64 entries.
> 
> But we're already assuming those producers will need less than 65 entries.  My point
> is that if one (or even five) extra entries pushes KVM over the limit, then the
> buffer provided by the soft limit needs to be jacked up regardless of when the
> request is processed.
> 
> Hmm, but I suppose it's possible there's a pathological emulator path that can push
> double digit entries, and servicing the request right away ensures that requests
> have the full 64 entry buffer to play with.
> 
> So yeah, I agree, move it below the DEAD check, but keep it above most everything
> else.
> 

Ok, thanks for confirming. I will move the check after the DEAD check in the
next revision.

>>>>> Would it make sense to clear the request in kvm_dirty_ring_reset()?  I don't care
>>>>> about the overhead of having to re-check the request, the goal would be to help
>>>>> document what causes the request to go away.
>>>>>
>>>>> E.g. modify kvm_dirty_ring_reset() to take @vcpu and then do:
>>>>>
>>>>> 	if (!kvm_dirty_ring_soft_full(ring))
>>>>> 		kvm_clear_request(KVM_REQ_RING_SOFT_FULL, vcpu);
>>>>>
>>>>
>>>> It's reasonable to clear KVM_REQ_DIRTY_RING_SOFT_FULL when the ring is reset.
>>>> @vcpu can be obtained via container_of(..., ring).
>>>
>>> Using container_of() is silly, there's literally one caller that does:
>>>
>>> 	kvm_for_each_vcpu(i, vcpu, kvm)
>>> 		cleared += kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring);
>>>
>>
>> May I ask why it's silly to use container_of()?
> 
> Because container_of() is inherently dangerous, e.g. if it's used on a pointer that
> isn't contained by the expected type, the code will compile cleanly but explode
> at runtime.  That's unlikely to happen in this case, e.g. doesn't look like we'll
> be adding a ring to "struct kvm", but if someone wanted to add a per-VM ring,
> taking the vCPU makes it very obvious that pushing to a ring _requires_ a vCPU,
> and enforces that requirement at compile time.
> 
> In other words, it's preferable to avoid container_of() unless using it solves a
> real problem that doesn't have a better alternative.
> 
> In these cases, passing in the vCPU is most definitely a better alternative as
> each of the functions in question has a sole caller that has easy access to the
> container (vCPU), i.e. it's a trivial change.
> 

Right, container_of() can't ensure consistency or do full sanity checking by
itself. It's reasonable to avoid using it when possible. Thanks for the details
and explanation.

>> In order to avoid using container_of(), kvm_dirty_ring_push() also needs
>> @vcpu.
> 
> Yep, that one should be changed too.
> 

Ok.

>> So let's change those two functions to something like below. Please
>> double-check that they look good to you:
>>
>>    void kvm_dirty_ring_push(struct kvm_vcpu *vcpu, u32 slot, u64 offset);
>>    int kvm_dirty_ring_reset(struct kvm_vcpu *vcpu);
> 
> Yep, looks good.
> 

Ok, thanks for the confirmation.

Thanks,
Gavin

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b0c47b41c264..0dd0d32073e7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10260,16 +10260,13 @@  static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	bool req_immediate_exit = false;
 
-	/* Forbid vmenter if vcpu dirty ring is soft-full */
-	if (unlikely(vcpu->kvm->dirty_ring_size &&
-		     kvm_dirty_ring_soft_full(&vcpu->dirty_ring))) {
-		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
-		trace_kvm_dirty_ring_exit(vcpu);
-		r = 0;
-		goto out;
-	}
-
 	if (kvm_request_pending(vcpu)) {
+		/* Forbid vmenter if vcpu dirty ring is soft-full */
+		if (kvm_dirty_ring_check_request(vcpu)) {
+			r = 0;
+			goto out;
+		}
+
 		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
 			r = -EIO;
 			goto out;
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 906f899813dc..66508afa0b40 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -64,11 +64,6 @@  static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 {
 }
 
-static inline bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
-{
-	return true;
-}
-
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
 u32 kvm_dirty_ring_get_rsvd_entries(void);
@@ -86,11 +81,12 @@  int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring);
  */
 void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset);
 
+bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu);
+
 /* for use in vm_operations_struct */
 struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset);
 
 void kvm_dirty_ring_free(struct kvm_dirty_ring *ring);
-bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring);
 
 #endif /* CONFIG_HAVE_KVM_DIRTY_RING */
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f4519d3689e1..53fa3134fee0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -157,6 +157,7 @@  static inline bool is_error_page(struct page *page)
 #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_UNBLOCK           2
 #define KVM_REQ_UNHALT            3
+#define KVM_REQ_RING_SOFT_FULL    4
 #define KVM_REQUEST_ARCH_BASE     8
 
 /*
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index d6fabf238032..f68d75026bc0 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -26,7 +26,7 @@  static u32 kvm_dirty_ring_used(struct kvm_dirty_ring *ring)
 	return READ_ONCE(ring->dirty_index) - READ_ONCE(ring->reset_index);
 }
 
-bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
+static bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
 {
 	return kvm_dirty_ring_used(ring) >= ring->soft_limit;
 }
@@ -149,6 +149,7 @@  int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
 
 void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
 {
+	struct kvm_vcpu *vcpu = container_of(ring, struct kvm_vcpu, dirty_ring);
 	struct kvm_dirty_gfn *entry;
 
 	/* It should never get full */
@@ -166,6 +167,22 @@  void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
 	kvm_dirty_gfn_set_dirtied(entry);
 	ring->dirty_index++;
 	trace_kvm_dirty_ring_push(ring, slot, offset);
+
+	if (kvm_dirty_ring_soft_full(ring))
+		kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);
+}
+
+bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu)
+{
+	if (kvm_check_request(KVM_REQ_RING_SOFT_FULL, vcpu) &&
+		kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {
+		kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);
+		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
+		trace_kvm_dirty_ring_exit(vcpu);
+		return true;
+	}
+
+	return false;
 }
 
 struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset)