diff mbox series

kvm/debugfs: add file to get vcpu steal time statistics

Message ID 20240917112028.278005-1-den-plotnikov@yandex-team.ru (mailing list archive)
State New, archived
Headers show
Series kvm/debugfs: add file to get vcpu steal time statistics | expand

Commit Message

Denis Plotnikov Sept. 17, 2024, 11:20 a.m. UTC
It's helpful to know whether some other host activity affects a virtual
machine to estimate virtual machine quality of sevice.
The fact of virtual machine affection from the host side can be obtained
by reading "preemption_reported" counter via kvm entries of sysfs, but
the exact vcpu waiting time isn't reported to the host.
This patch adds this reporting.

Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/debugfs.c          | 17 +++++++++++++++++
 arch/x86/kvm/x86.c              |  1 +
 3 files changed, 19 insertions(+)

Comments

Sean Christopherson Sept. 22, 2024, 8:04 a.m. UTC | #1
On Tue, Sep 17, 2024, Denis Plotnikov wrote:
> It's helpful to know whether some other host activity affects a virtual
> machine to estimate virtual machine quality of sevice.
> The fact of virtual machine affection from the host side can be obtained
> by reading "preemption_reported" counter via kvm entries of sysfs, but
> the exact vcpu waiting time isn't reported to the host.
> This patch adds this reporting.
> 
> Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/debugfs.c          | 17 +++++++++++++++++

Using debugfs is undesirable, as it's (a) not ABI and (b) not guaranteed to be
present as KVM (correctly) ignores debugfs setup errors.

Using debugfs is also unnecessary.  The total steal time is available in guest
memory, and by definition that memory is shared with the host.  To query total
steal time from userspace, use MSR filtering to trap writes (and reflect writes
back into KVM) so that the GPA of the steal time structure is known, and then
simply read the actual steal time from guest memory as needed.
Denis Plotnikov Sept. 23, 2024, 9:32 a.m. UTC | #2
On 9/22/24 11:04, Sean Christopherson wrote:
> On Tue, Sep 17, 2024, Denis Plotnikov wrote:
>> It's helpful to know whether some other host activity affects a virtual
>> machine to estimate virtual machine quality of sevice.
>> The fact of virtual machine affection from the host side can be obtained
>> by reading "preemption_reported" counter via kvm entries of sysfs, but
>> the exact vcpu waiting time isn't reported to the host.
>> This patch adds this reporting.
>>
>> Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
>> ---
>>   arch/x86/include/asm/kvm_host.h |  1 +
>>   arch/x86/kvm/debugfs.c          | 17 +++++++++++++++++
> 
> Using debugfs is undesirable, as it's (a) not ABI and (b) not guaranteed to be
> present as KVM (correctly) ignores debugfs setup errors.
> 
> Using debugfs is also unnecessary.  The total steal time is available in guest
> memory, and by definition that memory is shared with the host.  To query total
> steal time from userspace, use MSR filtering to trap writes (and reflect writes
> back into KVM) so that the GPA of the steal time structure is known, and then
> simply read the actual steal time from guest memory as needed.
Thanks for the reply!
Just to clarify, by reading the actual steal time from guest memory do 
you mean by using some kind of new vcpu ioctl?


Best,
Denis
Sean Christopherson Sept. 23, 2024, 11:46 a.m. UTC | #3
On Mon, Sep 23, 2024, Denis Plotnikov wrote:
> On 9/22/24 11:04, Sean Christopherson wrote:
> > On Tue, Sep 17, 2024, Denis Plotnikov wrote:
> > > It's helpful to know whether some other host activity affects a virtual
> > > machine to estimate virtual machine quality of sevice.
> > > The fact of virtual machine affection from the host side can be obtained
> > > by reading "preemption_reported" counter via kvm entries of sysfs, but
> > > the exact vcpu waiting time isn't reported to the host.
> > > This patch adds this reporting.
> > > 
> > > Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
> > > ---
> > >   arch/x86/include/asm/kvm_host.h |  1 +
> > >   arch/x86/kvm/debugfs.c          | 17 +++++++++++++++++
> > 
> > Using debugfs is undesirable, as it's (a) not ABI and (b) not guaranteed to be
> > present as KVM (correctly) ignores debugfs setup errors.
> > 
> > Using debugfs is also unnecessary.  The total steal time is available in guest
> > memory, and by definition that memory is shared with the host.  To query total
> > steal time from userspace, use MSR filtering to trap writes (and reflect writes
> > back into KVM) so that the GPA of the steal time structure is known, and then
> > simply read the actual steal time from guest memory as needed.
> Thanks for the reply!
> Just to clarify, by reading the actual steal time from guest memory do you
> mean by using some kind of new vcpu ioctl?

No, I mean by using the host userspace VMA to read the memory.
Denis Plotnikov Sept. 30, 2024, 2:29 p.m. UTC | #4
On 9/23/24 14:46, Sean Christopherson wrote:
> On Mon, Sep 23, 2024, Denis Plotnikov wrote:
>> On 9/22/24 11:04, Sean Christopherson wrote:
>>> On Tue, Sep 17, 2024, Denis Plotnikov wrote:
>>>> It's helpful to know whether some other host activity affects a virtual
>>>> machine to estimate virtual machine quality of sevice.
>>>> The fact of virtual machine affection from the host side can be obtained
>>>> by reading "preemption_reported" counter via kvm entries of sysfs, but
>>>> the exact vcpu waiting time isn't reported to the host.
>>>> This patch adds this reporting.
>>>>
>>>> Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
>>>> ---
>>>>    arch/x86/include/asm/kvm_host.h |  1 +
>>>>    arch/x86/kvm/debugfs.c          | 17 +++++++++++++++++
>>>
>>> Using debugfs is undesirable, as it's (a) not ABI and (b) not guaranteed to be
>>> present as KVM (correctly) ignores debugfs setup errors.
>>>
>>> Using debugfs is also unnecessary.  The total steal time is available in guest
>>> memory, and by definition that memory is shared with the host.  To query total
>>> steal time from userspace, use MSR filtering to trap writes (and reflect writes
>>> back into KVM) so that the GPA of the steal time structure is known, and then
>>> simply read the actual steal time from guest memory as needed.
>> Thanks for the reply!
>> Just to clarify, by reading the actual steal time from guest memory do you
>> mean by using some kind of new vcpu ioctl?
> 
> No, I mean by using the host userspace VMA to read the memory.

Oh, I think I got your idea. You mean
using KVM_CAP_X86_MSR_FILTER which...

"In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space 
to trap and emulate MSRs ..."

And then having guest's steal time struct valid address read the value 
from userspace VMM like qemu directly.

Thanks for the answers!

Best,
Denis
Sean Christopherson Sept. 30, 2024, 3:44 p.m. UTC | #5
On Mon, Sep 30, 2024, Denis Plotnikov wrote:
> 
> 
> On 9/23/24 14:46, Sean Christopherson wrote:
> > On Mon, Sep 23, 2024, Denis Plotnikov wrote:
> > > On 9/22/24 11:04, Sean Christopherson wrote:
> > > > On Tue, Sep 17, 2024, Denis Plotnikov wrote:
> > > > > It's helpful to know whether some other host activity affects a virtual
> > > > > machine to estimate virtual machine quality of sevice.
> > > > > The fact of virtual machine affection from the host side can be obtained
> > > > > by reading "preemption_reported" counter via kvm entries of sysfs, but
> > > > > the exact vcpu waiting time isn't reported to the host.
> > > > > This patch adds this reporting.
> > > > > 
> > > > > Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
> > > > > ---
> > > > >    arch/x86/include/asm/kvm_host.h |  1 +
> > > > >    arch/x86/kvm/debugfs.c          | 17 +++++++++++++++++
> > > > 
> > > > Using debugfs is undesirable, as it's (a) not ABI and (b) not guaranteed to be
> > > > present as KVM (correctly) ignores debugfs setup errors.
> > > > 
> > > > Using debugfs is also unnecessary.  The total steal time is available in guest
> > > > memory, and by definition that memory is shared with the host.  To query total
> > > > steal time from userspace, use MSR filtering to trap writes (and reflect writes
> > > > back into KVM) so that the GPA of the steal time structure is known, and then
> > > > simply read the actual steal time from guest memory as needed.
> > > Thanks for the reply!
> > > Just to clarify, by reading the actual steal time from guest memory do you
> > > mean by using some kind of new vcpu ioctl?
> > 
> > No, I mean by using the host userspace VMA to read the memory.
> 
> Oh, I think I got your idea. You mean
> using KVM_CAP_X86_MSR_FILTER which...
> 
> "In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to
> trap and emulate MSRs ..."
> 
> And then having guest's steal time struct valid address read the value from
> userspace VMM like qemu directly.

Yep, exactly!
Denis Plotnikov Nov. 5, 2024, 12:43 p.m. UTC | #6
> On 9/30/24 18:44, Sean Christopherson wrote:
>>> No, I mean by using the host userspace VMA to read the memory.
>>
>> Oh, I think I got your idea. You mean
>> using KVM_CAP_X86_MSR_FILTER which...
>>
>> "In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to
>> trap and emulate MSRs ..."
>>
>> And then having guest's steal time struct valid address read the value from
>> userspace VMM like qemu directly.
> 
> Yep, exactly!

By the way, what if we add "steal time" as a kvm statistics item?

Why I think it's a good idea?
* it is available via standard KVM_GET_STATS_FD
* it doesn't introduce any overhead
* it is quite easy to add with just three lines of code
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1596,6 +1596,7 @@ struct kvm_vcpu_stat {
         u64 preemption_other;
         u64 guest_mode;
         u64 notify_window_exits;
+       u64 steal_time;
  };

  struct x86_instruction_info;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 83fe0a78146fc..cd771aef1558a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -291,6 +291,7 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
         STATS_DESC_COUNTER(VCPU, preemption_other),
         STATS_DESC_IBOOLEAN(VCPU, guest_mode),
         STATS_DESC_COUNTER(VCPU, notify_window_exits),
+       STATS_DESC_TIME_NSEC(VCPU, steal_time),
  };

  const struct kvm_stats_header kvm_vcpu_stats_header = {
@@ -3763,6 +3764,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
         version += 1;
         unsafe_put_user(version, &st->version, out);

+       vcpu->stat.steal_time = steal;

The disadvantage of this approach is that it adds some kind of data 
duplication but it doesn't seem to be a problem - using shadowing and 
caching are common practices.

My concern about intercepting steal time MSR in user space is 
overcomplication - we need to add significant amount of userspace code 
to achieve what we can get in much easier and, in my opinion, cleaner 
way. I think it's a cleaner way because every userspace app (like QEMU) 
will get steal time without any modification via means provided by kvm. 
For example, QEMU will be able to get steal time via qmp with 
"query-stats" command which returns every statistics item provided by 
KVM_GET_STATS_FD.
diff mbox series

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4a68cb3eba78f..3d4bd3ca83593 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -899,6 +899,7 @@  struct kvm_vcpu_arch {
 		u64 msr_val;
 		u64 last_steal;
 		struct gfn_to_hva_cache cache;
+		u64 steal_total;
 	} st;
 
 	u64 l1_tsc_offset;
diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
index 999227fc7c665..e67136954c095 100644
--- a/arch/x86/kvm/debugfs.c
+++ b/arch/x86/kvm/debugfs.c
@@ -11,6 +11,7 @@ 
 #include "lapic.h"
 #include "mmu.h"
 #include "mmu/mmu_internal.h"
+#include "cpuid.h"
 
 static int vcpu_get_timer_advance_ns(void *data, u64 *val)
 {
@@ -56,6 +57,19 @@  static int vcpu_get_tsc_scaling_frac_bits(void *data, u64 *val)
 
 DEFINE_SIMPLE_ATTRIBUTE(vcpu_tsc_scaling_frac_fops, vcpu_get_tsc_scaling_frac_bits, NULL, "%llu\n");
 
+static int vcpu_get_steal_time(void *data, u64 *val)
+{
+	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
+
+	if (!guest_pv_has(vcpu, KVM_FEATURE_STEAL_TIME))
+		return 1;
+
+	*val = vcpu->arch.st.steal_total;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(vcpu_steal_time_fops, vcpu_get_steal_time, NULL, "%llu\n");
+
 void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry)
 {
 	debugfs_create_file("guest_mode", 0444, debugfs_dentry, vcpu,
@@ -76,6 +90,9 @@  void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_
 				    debugfs_dentry, vcpu,
 				    &vcpu_tsc_scaling_frac_fops);
 	}
+
+	debugfs_create_file("steal-time", 0444, debugfs_dentry, vcpu,
+			    &vcpu_steal_time_fops);
 }
 
 /*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 70219e4069874..ca5f21b930d4d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3735,6 +3735,7 @@  static void record_steal_time(struct kvm_vcpu *vcpu)
 	steal += current->sched_info.run_delay -
 		vcpu->arch.st.last_steal;
 	vcpu->arch.st.last_steal = current->sched_info.run_delay;
+	vcpu->arch.st.steal_total = steal;
 	unsafe_put_user(steal, &st->steal, out);
 
 	version += 1;