kvm/debugfs: add file to get vcpu steal time statistics

Message ID	20240917112028.278005-1-den-plotnikov@yandex-team.ru (mailing list archive)
State	New, archived
Headers	Received: from forwardcorp1a.mail.yandex.net (forwardcorp1a.mail.yandex.net [178.154.239.72]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5B1CD1E4A6; Tue, 17 Sep 2024 11:22:40 +0000 (UTC)
	Precedence: bulk
	From: Denis Plotnikov <den-plotnikov@yandex-team.ru>
	To: kvm@vger.kernel.org
	Cc: seanjc@google.com, pbonzini@redhat.com, yc-core@yandex-team.ru, linux-kernel@vger.kernel.org
	Subject: [PATCH] kvm/debugfs: add file to get vcpu steal time statistics
	Date: Tue, 17 Sep 2024 14:20:28 +0300
	Message-Id: <20240917112028.278005-1-den-plotnikov@yandex-team.ru>
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
Series	kvm/debugfs: add file to get vcpu steal time statistics

Denis Plotnikov Sept. 17, 2024, 11:20 a.m. UTC

It's helpful to know whether other host activity affects a virtual
machine, in order to estimate the virtual machine's quality of service.
The fact that a virtual machine was affected by the host can be obtained
by reading the "preemption_reported" counter via the kvm entries of sysfs, but
the exact vcpu waiting time isn't reported to the host.
This patch adds this reporting.

Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/debugfs.c          | 17 +++++++++++++++++
 arch/x86/kvm/x86.c              |  1 +
 3 files changed, 19 insertions(+)

Sean Christopherson Sept. 22, 2024, 8:04 a.m. UTC | #1

On Tue, Sep 17, 2024, Denis Plotnikov wrote:
> It's helpful to know whether other host activity affects a virtual
> machine, in order to estimate the virtual machine's quality of service.
> The fact that a virtual machine was affected by the host can be obtained
> by reading the "preemption_reported" counter via the kvm entries of sysfs, but
> the exact vcpu waiting time isn't reported to the host.
> This patch adds this reporting.
> 
> Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/debugfs.c          | 17 +++++++++++++++++

Using debugfs is undesirable, as it's (a) not ABI and (b) not guaranteed to be
present as KVM (correctly) ignores debugfs setup errors.

Using debugfs is also unnecessary.  The total steal time is available in guest
memory, and by definition that memory is shared with the host.  To query total
steal time from userspace, use MSR filtering to trap writes (and reflect writes
back into KVM) so that the GPA of the steal time structure is known, and then
simply read the actual steal time from guest memory as needed.
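
As a rough illustration of the "GPA of the steal time structure is known" step: the value the guest writes to MSR_KVM_STEAL_TIME carries an enable bit plus a 64-byte-aligned guest physical address. The sketch below decodes such a value as a VMM might when handling a trapped WRMSR. The helper and struct names are hypothetical, and the bit layout is an assumption taken from KVM's paravirt MSR documentation, not from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSR index and bit layout as described in KVM's paravirt MSR docs. */
#define MSR_KVM_STEAL_TIME   0x4b564d03u
#define STEAL_TIME_ENABLE    (1ULL << 0)     /* bit 0: enable */
#define STEAL_TIME_RESERVED  (0x1fULL << 1)  /* bits 5:1: reserved */
#define STEAL_TIME_GPA_MASK  (~0x3fULL)      /* bits 63:6: aligned GPA */

/* Hypothetical VMM-side record of what the guest programmed. */
struct steal_time_msr {
	bool enabled;
	uint64_t gpa;
};

/*
 * Decode a trapped guest write. Returns false if the MSR is not the
 * steal time MSR or if reserved bits are set. A real VMM would then
 * forward the same value to KVM (assumed here: via KVM_SET_MSRS) so
 * that KVM keeps updating the structure.
 */
static bool decode_steal_time_wrmsr(uint32_t msr, uint64_t data,
				    struct steal_time_msr *out)
{
	if (msr != MSR_KVM_STEAL_TIME || (data & STEAL_TIME_RESERVED))
		return false;

	out->enabled = (data & STEAL_TIME_ENABLE) != 0;
	out->gpa = data & STEAL_TIME_GPA_MASK;
	return true;
}
```

The decoded GPA is what the VMM would remember per vCPU; when the enable bit is clear, the address should not be used.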

Denis Plotnikov Sept. 23, 2024, 9:32 a.m. UTC | #2

On 9/22/24 11:04, Sean Christopherson wrote:
> Using debugfs is undesirable, as it's (a) not ABI and (b) not guaranteed to be
> present as KVM (correctly) ignores debugfs setup errors.
> 
> Using debugfs is also unnecessary.  The total steal time is available in guest
> memory, and by definition that memory is shared with the host.  To query total
> steal time from userspace, use MSR filtering to trap writes (and reflect writes
> back into KVM) so that the GPA of the steal time structure is known, and then
> simply read the actual steal time from guest memory as needed.
Thanks for the reply!
Just to clarify, by reading the actual steal time from guest memory do 
you mean by using some kind of new vcpu ioctl?


Best,
Denis

Sean Christopherson Sept. 23, 2024, 11:46 a.m. UTC | #3

On Mon, Sep 23, 2024, Denis Plotnikov wrote:
> Thanks for the reply!
> Just to clarify, by reading the actual steal time from guest memory do you
> mean by using some kind of new vcpu ioctl?

No, I mean by using the host userspace VMA to read the memory.
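
A minimal sketch of what that read could look like in a VMM, assuming the VMM keeps a table of guest RAM regions it has mmap()ed (the `guest_ram_region` type and both helpers are hypothetical). The record layout mirrors KVM's documented `struct kvm_steal_time`, and the version check assumes the usual convention that an odd version means an update is in progress. The `__atomic` builtins are GCC/Clang specific.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the guest-visible steal time record (64 bytes). */
struct steal_time_rec {
	uint64_t steal;     /* accumulated steal time, in nanoseconds */
	uint32_t version;   /* odd while KVM is updating the record */
	uint32_t flags;
	uint8_t  preempted;
	uint8_t  pad8[3];
	uint32_t pad[11];
};

/* Hypothetical VMM bookkeeping for one mmap()ed guest RAM region. */
struct guest_ram_region {
	uint64_t gpa_start;
	uint64_t size;
	uint8_t *hva;
};

/* Translate a GPA range to a host pointer, or NULL if not fully mapped. */
static void *gpa_to_hva(const struct guest_ram_region *r, size_t n,
			uint64_t gpa, uint64_t len)
{
	for (size_t i = 0; i < n; i++) {
		if (gpa >= r[i].gpa_start && len <= r[i].size &&
		    gpa - r[i].gpa_start <= r[i].size - len)
			return r[i].hva + (gpa - r[i].gpa_start);
	}
	return NULL;
}

/* Read a consistent steal value; gives up after a bounded number of tries. */
static bool read_steal_ns(struct steal_time_rec *st, uint64_t *steal_ns)
{
	for (int tries = 0; tries < 1000; tries++) {
		uint32_t v1 = __atomic_load_n(&st->version, __ATOMIC_ACQUIRE);
		if (v1 & 1)
			continue;	/* update in progress, retry */

		uint64_t steal = __atomic_load_n(&st->steal, __ATOMIC_RELAXED);
		__atomic_thread_fence(__ATOMIC_ACQUIRE);

		if (__atomic_load_n(&st->version, __ATOMIC_RELAXED) == v1) {
			*steal_ns = steal;
			return true;
		}
	}
	return false;
}
```

No new ioctl is involved: once the GPA is known, the value is an ordinary memory read through the VMM's existing mapping.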

Denis Plotnikov Sept. 30, 2024, 2:29 p.m. UTC | #4

On 9/23/24 14:46, Sean Christopherson wrote:
> No, I mean by using the host userspace VMA to read the memory.

Oh, I think I got your idea. You mean
using KVM_CAP_X86_MSR_FILTER, which...

"In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space
to trap and emulate MSRs ..."

And then, knowing the valid address of the guest's steal time struct,
reading the value directly from a userspace VMM like QEMU.

Thanks for the answers!

Best,
Denis

Sean Christopherson Sept. 30, 2024, 3:44 p.m. UTC | #5

On Mon, Sep 30, 2024, Denis Plotnikov wrote:
> Oh, I think I got your idea. You mean
> using KVM_CAP_X86_MSR_FILTER, which...
> 
> "In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to
> trap and emulate MSRs ..."
> 
> And then, knowing the valid address of the guest's steal time struct,
> reading the value directly from a userspace VMM like QEMU.

Yep, exactly!

Denis Plotnikov Nov. 5, 2024, 12:43 p.m. UTC | #6

> On 9/30/24 18:44, Sean Christopherson wrote:
>>> No, I mean by using the host userspace VMA to read the memory.
>>
>> Oh, I think I got your idea. You mean
>> using KVM_CAP_X86_MSR_FILTER, which...
>>
>> "In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to
>> trap and emulate MSRs ..."
>>
>> And then, knowing the valid address of the guest's steal time struct,
>> reading the value directly from a userspace VMM like QEMU.
> 
> Yep, exactly!

By the way, what if we add "steal time" as a KVM statistics item?

Why do I think it's a good idea?
* it is available via the standard KVM_GET_STATS_FD
* it doesn't introduce any overhead
* it is quite easy to add, with just three lines of code:

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1596,6 +1596,7 @@ struct kvm_vcpu_stat {
         u64 preemption_other;
         u64 guest_mode;
         u64 notify_window_exits;
+       u64 steal_time;
  };

  struct x86_instruction_info;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 83fe0a78146fc..cd771aef1558a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -291,6 +291,7 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
         STATS_DESC_COUNTER(VCPU, preemption_other),
         STATS_DESC_IBOOLEAN(VCPU, guest_mode),
         STATS_DESC_COUNTER(VCPU, notify_window_exits),
+       STATS_DESC_TIME_NSEC(VCPU, steal_time),
  };

  const struct kvm_stats_header kvm_vcpu_stats_header = {
@@ -3763,6 +3764,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
         version += 1;
         unsafe_put_user(version, &st->version, out);

+       vcpu->stat.steal_time = steal;

The disadvantage of this approach is that it duplicates some data, but
that doesn't seem to be a problem: shadowing and caching are common
practices.

My concern about intercepting the steal time MSR in userspace is
overcomplication: we would need to add a significant amount of userspace
code to achieve what we can get in a much easier and, in my opinion,
cleaner way. I think it's cleaner because every userspace app (like QEMU)
will get steal time without any modification, via means provided by KVM.
For example, QEMU will be able to get steal time via QMP with the
"query-stats" command, which returns every statistics item provided by
KVM_GET_STATS_FD.
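
To make the userspace side concrete, here is a minimal sketch of looking up one statistic by name in the binary stats format, assuming the whole stats file has already been read into a buffer (for example with pread() at offset 0). The structs are local mirrors of `struct kvm_stats_header` and `struct kvm_stats_desc` from <linux/kvm.h>; `find_stat` is a hypothetical helper, and a "steal_time" item exists only if the change proposed above were applied.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_stats_header. */
struct stats_header {
	uint32_t flags;
	uint32_t name_size;    /* bytes reserved for each descriptor name */
	uint32_t num_desc;
	uint32_t id_offset;
	uint32_t desc_offset;
	uint32_t data_offset;
};

/* Local mirror of struct kvm_stats_desc, without the trailing name. */
struct stats_desc {
	uint32_t flags;
	int16_t  exponent;
	uint16_t size;         /* number of u64 values for this stat */
	uint32_t offset;       /* relative to header.data_offset */
	uint32_t bucket_size;
};

/* Find a stat by name and copy out its first u64 value. */
static bool find_stat(const uint8_t *buf, size_t len, const char *name,
		      uint64_t *value)
{
	struct stats_header h;

	if (len < sizeof(h))
		return false;
	memcpy(&h, buf, sizeof(h));

	size_t stride = sizeof(struct stats_desc) + h.name_size;

	for (uint32_t i = 0; i < h.num_desc; i++) {
		size_t off = h.desc_offset + (size_t)i * stride;
		struct stats_desc d;

		if (off + stride > len)
			return false;
		memcpy(&d, buf + off, sizeof(d));

		const char *dname = (const char *)buf + off + sizeof(d);
		if (strncmp(dname, name, h.name_size) != 0)
			continue;

		size_t voff = (size_t)h.data_offset + d.offset;
		if (d.size < 1 || voff + sizeof(*value) > len)
			return false;
		memcpy(value, buf + voff, sizeof(*value));
		return true;
	}
	return false;
}
```

With the proposed descriptor in place, a call such as `find_stat(buf, len, "steal_time", &ns)` on a vCPU stats buffer would return the value in nanoseconds; without it, the lookup simply returns false.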
