Message ID | 20221113170507.208810-3-shivam.kumar1@nutanix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: Dirty quota-based throttling | expand |
On Sun, Nov 13, 2022 at 05:05:08PM +0000, Shivam Kumar wrote: > Exit to userspace whenever the dirty quota is exhausted (i.e. dirty count > equals/exceeds dirty quota) to request more dirty quota. > > Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com> > Suggested-by: Manish Mishra <manish.mishra@nutanix.com> > Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com> > Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com> > Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com> > --- > arch/x86/kvm/mmu/spte.c | 4 ++-- > arch/x86/kvm/vmx/vmx.c | 3 +++ > arch/x86/kvm/x86.c | 28 ++++++++++++++++++++++++++++ > 3 files changed, 33 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c > index 2e08b2a45361..c0ed35abbf2d 100644 > --- a/arch/x86/kvm/mmu/spte.c > +++ b/arch/x86/kvm/mmu/spte.c > @@ -228,9 +228,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, > "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level, > get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level)); > > - if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) { > + if (spte & PT_WRITABLE_MASK) { > /* Enforced by kvm_mmu_hugepage_adjust. */ > - WARN_ON(level > PG_LEVEL_4K); > + WARN_ON(level > PG_LEVEL_4K && kvm_slot_dirty_track_enabled(slot)); > mark_page_dirty_in_slot(vcpu->kvm, slot, gfn); > } > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index 63247c57c72c..cc130999eddf 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -5745,6 +5745,9 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu) > */ > if (__xfer_to_guest_mode_work_pending()) > return 1; > + > + if (kvm_test_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) > + return 1; Any reason for this check? Is this quota related to the invalid guest state? Sorry if I missed anything here. 
> } > > return 1; > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index ecea83f0da49..1a960fbb51f4 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -10494,6 +10494,30 @@ void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu) > } > EXPORT_SYMBOL_GPL(__kvm_request_immediate_exit); > > +static inline bool kvm_check_dirty_quota_request(struct kvm_vcpu *vcpu) > +{ > +#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA > + struct kvm_run *run; > + > + if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) { > + run = vcpu->run; > + run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED; > + run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied; > + run->dirty_quota_exit.quota = READ_ONCE(run->dirty_quota); > + > + /* > + * Re-check the quota and exit if and only if the vCPU still > + * exceeds its quota. If userspace increases (or disables > + * entirely) the quota, then no exit is required as KVM is > + * still honoring its ABI, e.g. userspace won't even be aware > + * that KVM temporarily detected an exhausted quota. > + */ > + return run->dirty_quota_exit.count >= run->dirty_quota_exit.quota; Would it be better to check before updating the vcpu->run?
On 15/11/22 5:46 am, Yunhong Jiang wrote: > On Sun, Nov 13, 2022 at 05:05:08PM +0000, Shivam Kumar wrote: >> Exit to userspace whenever the dirty quota is exhausted (i.e. dirty count >> equals/exceeds dirty quota) to request more dirty quota. >> >> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com> >> Suggested-by: Manish Mishra <manish.mishra@nutanix.com> >> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com> >> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com> >> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com> >> --- >> arch/x86/kvm/mmu/spte.c | 4 ++-- >> arch/x86/kvm/vmx/vmx.c | 3 +++ >> arch/x86/kvm/x86.c | 28 ++++++++++++++++++++++++++++ >> 3 files changed, 33 insertions(+), 2 deletions(-) >> >> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c >> index 2e08b2a45361..c0ed35abbf2d 100644 >> --- a/arch/x86/kvm/mmu/spte.c >> +++ b/arch/x86/kvm/mmu/spte.c >> @@ -228,9 +228,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, >> "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level, >> get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level)); >> >> - if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) { >> + if (spte & PT_WRITABLE_MASK) { >> /* Enforced by kvm_mmu_hugepage_adjust. */ >> - WARN_ON(level > PG_LEVEL_4K); >> + WARN_ON(level > PG_LEVEL_4K && kvm_slot_dirty_track_enabled(slot)); >> mark_page_dirty_in_slot(vcpu->kvm, slot, gfn); >> } >> >> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c >> index 63247c57c72c..cc130999eddf 100644 >> --- a/arch/x86/kvm/vmx/vmx.c >> +++ b/arch/x86/kvm/vmx/vmx.c >> @@ -5745,6 +5745,9 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu) >> */ >> if (__xfer_to_guest_mode_work_pending()) >> return 1; >> + >> + if (kvm_test_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) >> + return 1; > Any reason for this check? Is this quota related to the invalid > guest state? Sorry if I missed anything here. 
Quoting Sean: "And thinking more about silly edge cases, VMX's big emulation loop for invalid guest state when unrestricted guest is disabled should probably explicitly check the dirty quota. Again, I doubt it matters to anyone's use case, but it is treated as a full run loop for things like pending signals, it'd be good to be consistent." Please see v4 for details. Thanks. > >> } >> >> return 1; >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >> index ecea83f0da49..1a960fbb51f4 100644 >> --- a/arch/x86/kvm/x86.c >> +++ b/arch/x86/kvm/x86.c >> @@ -10494,6 +10494,30 @@ void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu) >> } >> EXPORT_SYMBOL_GPL(__kvm_request_immediate_exit); >> >> +static inline bool kvm_check_dirty_quota_request(struct kvm_vcpu *vcpu) >> +{ >> +#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA >> + struct kvm_run *run; >> + >> + if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) { >> + run = vcpu->run; >> + run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED; >> + run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied; >> + run->dirty_quota_exit.quota = READ_ONCE(run->dirty_quota); >> + >> + /* >> + * Re-check the quota and exit if and only if the vCPU still >> + * exceeds its quota. If userspace increases (or disables >> + * entirely) the quota, then no exit is required as KVM is >> + * still honoring its ABI, e.g. userspace won't even be aware >> + * that KVM temporarily detected an exhausted quota. >> + */ >> + return run->dirty_quota_exit.count >= run->dirty_quota_exit.quota; > Would it be better to check before updating the vcpu->run? The reason for checking it at the last moment is to avoid invalid exits to userspace as much as possible. Thanks and regards, Shivam
On Tue, Nov 15, 2022 at 10:25:31AM +0530, Shivam Kumar wrote: > > > On 15/11/22 5:46 am, Yunhong Jiang wrote: > > On Sun, Nov 13, 2022 at 05:05:08PM +0000, Shivam Kumar wrote: > > > Exit to userspace whenever the dirty quota is exhausted (i.e. dirty count > > > equals/exceeds dirty quota) to request more dirty quota. > > > > > > Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com> > > > Suggested-by: Manish Mishra <manish.mishra@nutanix.com> > > > Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com> > > > Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com> > > > Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com> > > > --- > > > arch/x86/kvm/mmu/spte.c | 4 ++-- > > > arch/x86/kvm/vmx/vmx.c | 3 +++ > > > arch/x86/kvm/x86.c | 28 ++++++++++++++++++++++++++++ > > > 3 files changed, 33 insertions(+), 2 deletions(-) > > > > > > diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c > > > index 2e08b2a45361..c0ed35abbf2d 100644 > > > --- a/arch/x86/kvm/mmu/spte.c > > > +++ b/arch/x86/kvm/mmu/spte.c > > > @@ -228,9 +228,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, > > > "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level, > > > get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level)); > > > - if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) { > > > + if (spte & PT_WRITABLE_MASK) { > > > /* Enforced by kvm_mmu_hugepage_adjust. 
*/ > > > - WARN_ON(level > PG_LEVEL_4K); > > > + WARN_ON(level > PG_LEVEL_4K && kvm_slot_dirty_track_enabled(slot)); > > > mark_page_dirty_in_slot(vcpu->kvm, slot, gfn); > > > } > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > > > index 63247c57c72c..cc130999eddf 100644 > > > --- a/arch/x86/kvm/vmx/vmx.c > > > +++ b/arch/x86/kvm/vmx/vmx.c > > > @@ -5745,6 +5745,9 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu) > > > */ > > > if (__xfer_to_guest_mode_work_pending()) > > > return 1; > > > + > > > + if (kvm_test_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) > > > + return 1; > > Any reason for this check? Is this quota related to the invalid > > guest state? Sorry if I missed anything here. > Quoting Sean: > "And thinking more about silly edge cases, VMX's big emulation loop for > invalid > guest state when unrestricted guest is disabled should probably explicitly > check > the dirty quota. Again, I doubt it matters to anyone's use case, but it is > treated > as a full run loop for things like pending signals, it'd be good to be > consistent." > > Please see v4 for details. Thanks. Thank you for the sharing. 
> > > > > } > > > return 1; > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > index ecea83f0da49..1a960fbb51f4 100644 > > > --- a/arch/x86/kvm/x86.c > > > +++ b/arch/x86/kvm/x86.c > > > @@ -10494,6 +10494,30 @@ void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu) > > > } > > > EXPORT_SYMBOL_GPL(__kvm_request_immediate_exit); > > > +static inline bool kvm_check_dirty_quota_request(struct kvm_vcpu *vcpu) > > > +{ > > > +#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA > > > + struct kvm_run *run; > > > + > > > + if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) { > > > + run = vcpu->run; > > > + run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED; > > > + run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied; > > > + run->dirty_quota_exit.quota = READ_ONCE(run->dirty_quota); > > > + > > > + /* > > > + * Re-check the quota and exit if and only if the vCPU still > > > + * exceeds its quota. If userspace increases (or disables > > > + * entirely) the quota, then no exit is required as KVM is > > > + * still honoring its ABI, e.g. userspace won't even be aware > > > + * that KVM temporarily detected an exhausted quota. > > > + */ > > > + return run->dirty_quota_exit.count >= run->dirty_quota_exit.quota; > > Would it be better to check before updating the vcpu->run? > The reason for checking it at the last moment is to avoid invalid exits to > userspace as much as possible. So if the userspace increases the quota, then the above vcpu->run change just leaves some garbage information on vcpu->run and the exit_reason is misleading. Possibly it's ok since this information will not be used anymore. Not sure how critical is the time spent on the vcpu->run update.
On 15/11/22 12:15 pm, Yunhong Jiang wrote: > On Tue, Nov 15, 2022 at 10:25:31AM +0530, Shivam Kumar wrote: >> >> >> On 15/11/22 5:46 am, Yunhong Jiang wrote: >>> On Sun, Nov 13, 2022 at 05:05:08PM +0000, Shivam Kumar wrote: >>>> Exit to userspace whenever the dirty quota is exhausted (i.e. dirty count >>>> equals/exceeds dirty quota) to request more dirty quota. >>>> >>>> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com> >>>> Suggested-by: Manish Mishra <manish.mishra@nutanix.com> >>>> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com> >>>> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com> >>>> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com> >>>> --- >>>> arch/x86/kvm/mmu/spte.c | 4 ++-- >>>> arch/x86/kvm/vmx/vmx.c | 3 +++ >>>> arch/x86/kvm/x86.c | 28 ++++++++++++++++++++++++++++ >>>> 3 files changed, 33 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c >>>> index 2e08b2a45361..c0ed35abbf2d 100644 >>>> --- a/arch/x86/kvm/mmu/spte.c >>>> +++ b/arch/x86/kvm/mmu/spte.c >>>> @@ -228,9 +228,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, >>>> "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level, >>>> get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level)); >>>> - if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) { >>>> + if (spte & PT_WRITABLE_MASK) { >>>> /* Enforced by kvm_mmu_hugepage_adjust. 
*/ >>>> - WARN_ON(level > PG_LEVEL_4K); >>>> + WARN_ON(level > PG_LEVEL_4K && kvm_slot_dirty_track_enabled(slot)); >>>> mark_page_dirty_in_slot(vcpu->kvm, slot, gfn); >>>> } >>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c >>>> index 63247c57c72c..cc130999eddf 100644 >>>> --- a/arch/x86/kvm/vmx/vmx.c >>>> +++ b/arch/x86/kvm/vmx/vmx.c >>>> @@ -5745,6 +5745,9 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu) >>>> */ >>>> if (__xfer_to_guest_mode_work_pending()) >>>> return 1; >>>> + >>>> + if (kvm_test_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) >>>> + return 1; >>> Any reason for this check? Is this quota related to the invalid >>> guest state? Sorry if I missed anything here. >> Quoting Sean: >> "And thinking more about silly edge cases, VMX's big emulation loop for >> invalid >> guest state when unrestricted guest is disabled should probably explicitly >> check >> the dirty quota. Again, I doubt it matters to anyone's use case, but it is >> treated >> as a full run loop for things like pending signals, it'd be good to be >> consistent." >> >> Please see v4 for details. Thanks. > Thank you for the sharing. 
>>> >>>> } >>>> return 1; >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >>>> index ecea83f0da49..1a960fbb51f4 100644 >>>> --- a/arch/x86/kvm/x86.c >>>> +++ b/arch/x86/kvm/x86.c >>>> @@ -10494,6 +10494,30 @@ void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu) >>>> } >>>> EXPORT_SYMBOL_GPL(__kvm_request_immediate_exit); >>>> +static inline bool kvm_check_dirty_quota_request(struct kvm_vcpu *vcpu) >>>> +{ >>>> +#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA >>>> + struct kvm_run *run; >>>> + >>>> + if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) { >>>> + run = vcpu->run; >>>> + run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED; >>>> + run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied; >>>> + run->dirty_quota_exit.quota = READ_ONCE(run->dirty_quota); >>>> + >>>> + /* >>>> + * Re-check the quota and exit if and only if the vCPU still >>>> + * exceeds its quota. If userspace increases (or disables >>>> + * entirely) the quota, then no exit is required as KVM is >>>> + * still honoring its ABI, e.g. userspace won't even be aware >>>> + * that KVM temporarily detected an exhausted quota. >>>> + */ >>>> + return run->dirty_quota_exit.count >= run->dirty_quota_exit.quota; >>> Would it be better to check before updating the vcpu->run? >> The reason for checking it at the last moment is to avoid invalid exits to >> userspace as much as possible. > > So if the userspace increases the quota, then the above vcpu->run change just > leaves some garbage information on vcpu->run and the exit_reason is > misleading. Possibly it's ok since this information will not be used anymore. > > Not sure how critical is the time spent on the vcpu->run update. IMO the time spent in the update might not be very significant but the garbage value is harmless. Thanks, Shivam
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 2e08b2a45361..c0ed35abbf2d 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -228,9 +228,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level, get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level)); - if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) { + if (spte & PT_WRITABLE_MASK) { /* Enforced by kvm_mmu_hugepage_adjust. */ - WARN_ON(level > PG_LEVEL_4K); + WARN_ON(level > PG_LEVEL_4K && kvm_slot_dirty_track_enabled(slot)); mark_page_dirty_in_slot(vcpu->kvm, slot, gfn); } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 63247c57c72c..cc130999eddf 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -5745,6 +5745,9 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu) */ if (__xfer_to_guest_mode_work_pending()) return 1; + + if (kvm_test_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) + return 1; } return 1; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ecea83f0da49..1a960fbb51f4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10494,6 +10494,30 @@ void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(__kvm_request_immediate_exit); +static inline bool kvm_check_dirty_quota_request(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA + struct kvm_run *run; + + if (kvm_check_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu)) { + run = vcpu->run; + run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED; + run->dirty_quota_exit.count = vcpu->stat.generic.pages_dirtied; + run->dirty_quota_exit.quota = READ_ONCE(run->dirty_quota); + + /* + * Re-check the quota and exit if and only if the vCPU still + * exceeds its quota. If userspace increases (or disables + * entirely) the quota, then no exit is required as KVM is + * still honoring its ABI, e.g. 
userspace won't even be aware + * that KVM temporarily detected an exhausted quota. + */ + return run->dirty_quota_exit.count >= run->dirty_quota_exit.quota; + } +#endif + return false; +} + /* * Called within kvm->srcu read side. * Returns 1 to let vcpu_run() continue the guest execution loop without @@ -10625,6 +10649,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) r = 0; goto out; } + if (kvm_check_dirty_quota_request(vcpu)) { + r = 0; + goto out; + } /* * KVM_REQ_HV_STIMER has to be processed after