[v3,1/2] kvm: support -dedicated cpu-pm=on|off

Message ID 20180615222855.44421-2-mst@redhat.com
State New

Commit Message

Michael S. Tsirkin June 15, 2018, 10:29 p.m. UTC
With this flag, kvm allows the guest to control the host CPU power state.
This increases latency for other processes using the same host CPU in an
unpredictable way, but it decreases idle entry/exit times for the
running VCPU, so it works best if you use a dedicated host cpu,
hence the name.

Follow-up patches will expose this capability to guest
(using mwait leaf).

Based on a patch by Wanpeng Li <kernellwp@gmail.com>.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/sysemu/sysemu.h |  1 +
 target/i386/kvm.c       | 23 +++++++++++++++++++++++
 vl.c                    | 32 +++++++++++++++++++++++++++++++-
 qemu-options.hx         | 18 ++++++++++++++++++
 4 files changed, 73 insertions(+), 1 deletion(-)
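
For readers following the target/i386/kvm.c hunk, here is a minimal
standalone sketch of the masking step it performs: the value reported for
KVM_CAP_X86_DISABLE_EXITS is treated as a bitmask of supported exits and is
restricted to the MWAIT, HLT and PAUSE bits before being passed back to the
kernel. The SKETCH_* constants and the function name are hypothetical
stand-ins introduced for illustration; the real bit definitions live in the
kernel's <linux/kvm.h>, and this is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the kernel's KVM_X86_DISABLE_EXITS_* bits.
 * The authoritative definitions come from <linux/kvm.h>.
 */
#define SKETCH_DISABLE_EXITS_MWAIT (1u << 0)
#define SKETCH_DISABLE_EXITS_HLT   (1u << 1)
#define SKETCH_DISABLE_EXITS_PAUSE (1u << 2)

/*
 * Given the bitmask of exits the host reports it can disable, return the
 * subset that should actually be requested: only bits that are both
 * supported by the host and wanted for cpu-pm=on survive the AND.
 */
uint32_t sketch_select_disable_exits(uint32_t supported)
{
    const uint32_t wanted = SKETCH_DISABLE_EXITS_MWAIT |
                            SKETCH_DISABLE_EXITS_HLT |
                            SKETCH_DISABLE_EXITS_PAUSE;

    return supported & wanted;
}
```

If the host reports no support at all, the result is 0; bits outside the
three wanted ones are dropped even if the host advertises them.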

Comments

Paolo Bonzini June 19, 2018, 3:17 p.m. UTC | #1
On 16/06/2018 00:29, Michael S. Tsirkin wrote:
>  
> +static QemuOptsList qemu_dedicated_opts = {
> +    .name = "dedicated",
> +    .head = QTAILQ_HEAD_INITIALIZER(qemu_dedicated_opts.head),
> +    .desc = {
> +        {
> +            .name = "mem-lock",
> +            .type = QEMU_OPT_BOOL,
> +        },
> +        {
> +            .name = "cpu-pm",
> +            .type = QEMU_OPT_BOOL,
> +        },
> +        { /* end of list */ }
> +    },
> +};
> +

Let the bikeshedding begin!

1) Should we deprecate -realtime?

2) Maybe -hostresource?

Paolo
Michael S. Tsirkin June 19, 2018, 8:43 p.m. UTC | #2
On Tue, Jun 19, 2018 at 05:17:45PM +0200, Paolo Bonzini wrote:
> On 16/06/2018 00:29, Michael S. Tsirkin wrote:
> >  
> > +static QemuOptsList qemu_dedicated_opts = {
> > +    .name = "dedicated",
> > +    .head = QTAILQ_HEAD_INITIALIZER(qemu_dedicated_opts.head),
> > +    .desc = {
> > +        {
> > +            .name = "mem-lock",
> > +            .type = QEMU_OPT_BOOL,
> > +        },
> > +        {
> > +            .name = "cpu-pm",
> > +            .type = QEMU_OPT_BOOL,
> > +        },
> > +        { /* end of list */ }
> > +    },
> > +};
> > +
> 
> Let the bikeshedding begin!
> 
> 1) Should we deprecate -realtime?

Can be a patch on top, by whoever cares.

> 2) Maybe -hostresource?
> 
> Paolo

Is ability to cause high latency for other threads really a resource?

The issues in question:
1. a malicious guest can cause high latency for others sharing the host cpu.
2. to the host scheduler, the cpu looks busier than it really is.

Both are avoided if you use a dedicated host cpu, and 2 will
help the scheduler get closer to giving you one.
Eric Blake June 19, 2018, 10:07 p.m. UTC | #3
On 06/19/2018 10:17 AM, Paolo Bonzini wrote:
> On 16/06/2018 00:29, Michael S. Tsirkin wrote:
>>   
>> +static QemuOptsList qemu_dedicated_opts = {
>> +    .name = "dedicated",
>> +    .head = QTAILQ_HEAD_INITIALIZER(qemu_dedicated_opts.head),
>> +    .desc = {
>> +        {
>> +            .name = "mem-lock",
>> +            .type = QEMU_OPT_BOOL,
>> +        },
>> +        {
>> +            .name = "cpu-pm",
>> +            .type = QEMU_OPT_BOOL,
>> +        },
>> +        { /* end of list */ }
>> +    },
>> +};
>> +
> 
> Let the bikeshedding begin!
> 
> 1) Should we deprecate -realtime?
> 
> 2) Maybe -hostresource?

What further things might we add in the future?

-dedicated sounds wrong (it is an adjective, while most of our options
are nouns - think -machine, -drive, -object, ...)

-hostresource at least sounds like a noun, but is long to type.  But at 
least '-hostresource cpu-pm=on' reads reasonably well.

About the only other noun I could think of would be '-feature cpu-pm=on'.
Michael S. Tsirkin June 20, 2018, 12:06 a.m. UTC | #4
On Tue, Jun 19, 2018 at 05:07:46PM -0500, Eric Blake wrote:
> On 06/19/2018 10:17 AM, Paolo Bonzini wrote:
> > On 16/06/2018 00:29, Michael S. Tsirkin wrote:
> > > +static QemuOptsList qemu_dedicated_opts = {
> > > +    .name = "dedicated",
> > > +    .head = QTAILQ_HEAD_INITIALIZER(qemu_dedicated_opts.head),
> > > +    .desc = {
> > > +        {
> > > +            .name = "mem-lock",
> > > +            .type = QEMU_OPT_BOOL,
> > > +        },
> > > +        {
> > > +            .name = "cpu-pm",
> > > +            .type = QEMU_OPT_BOOL,
> > > +        },
> > > +        { /* end of list */ }
> > > +    },
> > > +};
> > > +
> > 
> > Let the bikeshedding begin!
> > 
> > 1) Should we deprecate -realtime?
> > 
> > 2) Maybe -hostresource?
> 
> What further things might we add in the future?
> 
> -dedicated sounds wrong (it is an adjective, while most of our options are
> nouns - thing -machine, -drive, -object, ...)
> 
> -hostresource at least sounds like a noun, but is long to type.  But at
> least '-hostresource cpu-pm=on' reads reasonably well.

Yes but host resource what? I feel it says nothing at all about what
one can expect to find in this flag.

> About the only other noun I could think of would be '-feature cpu-pm=on'.

If we have nothing at all to say about what is grouping these things,
we don't need a new flag. We can make it a machine property.

It's the user's hint that some host resource is dedicated to a VM.


> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
Wanpeng Li June 20, 2018, 12:46 a.m. UTC | #5
On Wed, 20 Jun 2018 at 08:07, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jun 19, 2018 at 05:07:46PM -0500, Eric Blake wrote:
> > On 06/19/2018 10:17 AM, Paolo Bonzini wrote:
> > > On 16/06/2018 00:29, Michael S. Tsirkin wrote:
> > > > +static QemuOptsList qemu_dedicated_opts = {
> > > > +    .name = "dedicated",
> > > > +    .head = QTAILQ_HEAD_INITIALIZER(qemu_dedicated_opts.head),
> > > > +    .desc = {
> > > > +        {
> > > > +            .name = "mem-lock",
> > > > +            .type = QEMU_OPT_BOOL,
> > > > +        },
> > > > +        {
> > > > +            .name = "cpu-pm",
> > > > +            .type = QEMU_OPT_BOOL,
> > > > +        },
> > > > +        { /* end of list */ }
> > > > +    },
> > > > +};
> > > > +
> > >
> > > Let the bikeshedding begin!
> > >
> > > 1) Should we deprecate -realtime?
> > >
> > > 2) Maybe -hostresource?
> >
> > What further things might we add in the future?
> >
> > -dedicated sounds wrong (it is an adjective, while most of our options are
> > nouns - thing -machine, -drive, -object, ...)
> >
> > -hostresource at least sounds like a noun, but is long to type.  But at
> > least '-hostresource cpu-pm=on' reads reasonably well.
>
> Yes but host resource what? I feel it says nothing at all about what
> one can expect to find in this flag.
>
> > About the only other noun I could think of would be '-feature cpu-pm=on'.
>
> If we have nothing at all to say about what is grouping these things,
> we don't need a new flag. We can make it a machine property.
>
> It's user's hint that some host resource is dedicated to a VM.

I think commit 633711e82 (kvm: rename KVM_HINTS_DEDICATED to
KVM_HINTS_REALTIME) should be reverted, given the discussion in
several threads.

Regards,
Wanpeng Li
Michael S. Tsirkin June 20, 2018, 2:41 a.m. UTC | #6
On Wed, Jun 20, 2018 at 08:46:10AM +0800, Wanpeng Li wrote:
> On Wed, 20 Jun 2018 at 08:07, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Jun 19, 2018 at 05:07:46PM -0500, Eric Blake wrote:
> > > On 06/19/2018 10:17 AM, Paolo Bonzini wrote:
> > > > On 16/06/2018 00:29, Michael S. Tsirkin wrote:
> > > > > +static QemuOptsList qemu_dedicated_opts = {
> > > > > +    .name = "dedicated",
> > > > > +    .head = QTAILQ_HEAD_INITIALIZER(qemu_dedicated_opts.head),
> > > > > +    .desc = {
> > > > > +        {
> > > > > +            .name = "mem-lock",
> > > > > +            .type = QEMU_OPT_BOOL,
> > > > > +        },
> > > > > +        {
> > > > > +            .name = "cpu-pm",
> > > > > +            .type = QEMU_OPT_BOOL,
> > > > > +        },
> > > > > +        { /* end of list */ }
> > > > > +    },
> > > > > +};
> > > > > +
> > > >
> > > > Let the bikeshedding begin!
> > > >
> > > > 1) Should we deprecate -realtime?
> > > >
> > > > 2) Maybe -hostresource?
> > >
> > > What further things might we add in the future?
> > >
> > > -dedicated sounds wrong (it is an adjective, while most of our options are
> > > nouns - thing -machine, -drive, -object, ...)
> > >
> > > -hostresource at least sounds like a noun, but is long to type.  But at
> > > least '-hostresource cpu-pm=on' reads reasonably well.
> >
> > Yes but host resource what? I feel it says nothing at all about what
> > one can expect to find in this flag.
> >
> > > About the only other noun I could think of would be '-feature cpu-pm=on'.
> >
> > If we have nothing at all to say about what is grouping these things,
> > we don't need a new flag. We can make it a machine property.
> >
> > It's user's hint that some host resource is dedicated to a VM.
> 
> The commit 633711e82 (kvm: rename KVM_HINTS_DEDICATED to
> KVM_HINTS_REALTIME) should be reverted according to several threads
> discussion I think.
> 
> Regards,
> Wanpeng Li

IMHO that is unrelated - these KVM hints are hints to *guest*.

In this thread we are talking about hints to QEMU that are only
necessary because QEMU is separate from the host scheduler/memory
management.
Paolo Bonzini June 20, 2018, 2:20 p.m. UTC | #7
On 19/06/2018 22:43, Michael S. Tsirkin wrote:
> 
>> 2) Maybe -hostresource?
>
> Is ability to cause high latency for other threads really a resource?

The "resource" here is host CPU time.  In general, a vCPU with
KVM_CAP_X86_DISABLE_EXITS will use more host CPU time and block
overcommitting, just like mlock does for memory.

Paolo

> The issues in question:
> 1. a malicious guest can cause high latency for others sharing the host cpu.
> 2. to host scheduler cpu looks busier than it really is.
Michael S. Tsirkin June 20, 2018, 2:29 p.m. UTC | #8
On Wed, Jun 20, 2018 at 04:20:40PM +0200, Paolo Bonzini wrote:
> On 19/06/2018 22:43, Michael S. Tsirkin wrote:
> > 
> >> 2) Maybe -hostresource?
> >
> > Is ability to cause high latency for other threads really a resource?
> 
> The "resource" here is host CPU time.

Right but then everything we do is a host resource in that sense.
Host network, host disk ...

> In general, a vCPU with
> KVM_CPU_X86_DISABLE_EXITS will use more host CPU time and block
> overcommitting, just like mlock does for memory.

What bothers me is that it does not block overcommit as such.
It has a side effect that if something does end up
running on the same CPU, that something will get bad
latency jitter. 

> 
> Paolo

I agree there's similarity here around overcommit.

That's why I suggested -dedicated as an antonym to -overcommit.

But I'm fine with -disable-overcommit or -dedicated-host-resource too.

Or, how about

-locked

?


> > The issues in question:
> > 1. a malicious guest can cause high latency for others sharing the host cpu.
> > 2. to host scheduler cpu looks busier than it really is.
Paolo Bonzini June 20, 2018, 2:45 p.m. UTC | #9
On 20/06/2018 16:29, Michael S. Tsirkin wrote:
> On Wed, Jun 20, 2018 at 04:20:40PM +0200, Paolo Bonzini wrote:
>> On 19/06/2018 22:43, Michael S. Tsirkin wrote:
>>>
>>>> 2) Maybe -hostresource?
>>>
>>> Is ability to cause high latency for other threads really a resource?
>>
>> The "resource" here is host CPU time.
> 
> Right but then everything we do is a host resource in that sense.
> Host network, host disk ...

Yes of course.  These options control how (and how much) QEMU uses those
resources.

>> In general, a vCPU with
>> KVM_CPU_X86_DISABLE_EXITS will use more host CPU time and block
>> overcommitting, just like mlock does for memory.
> 
> What bothers me is that it does not block overcommit as such.
> It has a side effect that if something does end up
> running on the same CPU, that something will get bad
> latency jitter. 
>
> I agree there's similarity here around overcommit.
> 
> That's why I suggested -dedicated as an antonym to -overcommit.
> 
> But I'm fine with -disable-overcommit or -dedicated-host-resource too.

Both of those are quite a mouthful.  I somewhat prefer "-overcommit" to
"-dedicated", though "-hostresource" is still my favorite, mostly
because it's the most future-proof.

Paolo
Wanpeng Li July 5, 2018, 5:52 a.m. UTC | #10
On Wed, 20 Jun 2018 at 10:41, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jun 20, 2018 at 08:46:10AM +0800, Wanpeng Li wrote:
> > On Wed, 20 Jun 2018 at 08:07, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Jun 19, 2018 at 05:07:46PM -0500, Eric Blake wrote:
> > > > On 06/19/2018 10:17 AM, Paolo Bonzini wrote:
> > > > > On 16/06/2018 00:29, Michael S. Tsirkin wrote:
> > > > > > +static QemuOptsList qemu_dedicated_opts = {
> > > > > > +    .name = "dedicated",
> > > > > > +    .head = QTAILQ_HEAD_INITIALIZER(qemu_dedicated_opts.head),
> > > > > > +    .desc = {
> > > > > > +        {
> > > > > > +            .name = "mem-lock",
> > > > > > +            .type = QEMU_OPT_BOOL,
> > > > > > +        },
> > > > > > +        {
> > > > > > +            .name = "cpu-pm",
> > > > > > +            .type = QEMU_OPT_BOOL,
> > > > > > +        },
> > > > > > +        { /* end of list */ }
> > > > > > +    },
> > > > > > +};
> > > > > > +
> > > > >
> > > > > Let the bikeshedding begin!
> > > > >
> > > > > 1) Should we deprecate -realtime?
> > > > >
> > > > > 2) Maybe -hostresource?
> > > >
> > > > What further things might we add in the future?
> > > >
> > > > -dedicated sounds wrong (it is an adjective, while most of our options are
> > > > nouns - thing -machine, -drive, -object, ...)
> > > >
> > > > -hostresource at least sounds like a noun, but is long to type.  But at
> > > > least '-hostresource cpu-pm=on' reads reasonably well.
> > >
> > > Yes but host resource what? I feel it says nothing at all about what
> > > one can expect to find in this flag.
> > >
> > > > About the only other noun I could think of would be '-feature cpu-pm=on'.
> > >
> > > If we have nothing at all to say about what is grouping these things,
> > > we don't need a new flag. We can make it a machine property.
> > >
> > > It's user's hint that some host resource is dedicated to a VM.
> >
> > The commit 633711e82 (kvm: rename KVM_HINTS_DEDICATED to
> > KVM_HINTS_REALTIME) should be reverted according to several threads
> > discussion I think.
> >
> > Regards,
> > Wanpeng Li
>
> IMHO that is unrelated - these KVM hints are hints to *guest*.

Actually I really don't like the KVM_HINTS_REALTIME renaming. Public
cloud environments offer dedicated instances for reasons of security or
performance. Financial customers may prefer dedicated pCPUs for
security, and gaming customers may prefer dedicated pCPUs for
performance. So "realtime" is not a suitable name.

Regards,
Wanpeng Li

Patch

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index e893f72f3b..b921c6f3b7 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -128,6 +128,7 @@  extern bool boot_strict;
 extern uint8_t *boot_splash_filedata;
 extern size_t boot_splash_filedata_size;
 extern bool enable_mlock;
+extern bool enable_cpu_pm;
 extern uint8_t qemu_extra_params_fw[2];
 extern QEMUClockType rtc_clock;
 extern const char *mem_path;
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 44f70733e7..cf9107be4b 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -1357,6 +1357,29 @@  int kvm_arch_init(MachineState *ms, KVMState *s)
         smram_machine_done.notify = register_smram_listener;
         qemu_add_machine_init_done_notifier(&smram_machine_done);
     }
+
+    if (enable_cpu_pm) {
+        int disable_exits = kvm_check_extension(s, KVM_CAP_X86_DISABLE_EXITS);
+        int ret;
+
+/* Work around for kernel header with a typo. TODO: fix header and drop. */
+#if defined(KVM_X86_DISABLE_EXITS_HTL) && !defined(KVM_X86_DISABLE_EXITS_HLT)
+#define KVM_X86_DISABLE_EXITS_HLT KVM_X86_DISABLE_EXITS_HTL
+#endif
+        if (disable_exits) {
+            disable_exits &= (KVM_X86_DISABLE_EXITS_MWAIT |
+                              KVM_X86_DISABLE_EXITS_HLT |
+                              KVM_X86_DISABLE_EXITS_PAUSE);
+        }
+
+        ret = kvm_vm_enable_cap(s, KVM_CAP_X86_DISABLE_EXITS, 0,
+                                disable_exits);
+        if (ret < 0) {
+            error_report("kvm: guest stopping CPU not supported: %s",
+                         strerror(-ret));
+        }
+    }
+
     return 0;
 }
 
diff --git a/vl.c b/vl.c
index 06031715ac..d53a9abcde 100644
--- a/vl.c
+++ b/vl.c
@@ -142,6 +142,7 @@  ram_addr_t ram_size;
 const char *mem_path = NULL;
 int mem_prealloc = 0; /* force preallocation of physical target memory */
 bool enable_mlock = false;
+bool enable_cpu_pm = false;
 int nb_nics;
 NICInfo nd_table[MAX_NICS];
 int autostart;
@@ -390,6 +391,22 @@  static QemuOptsList qemu_realtime_opts = {
     },
 };
 
+static QemuOptsList qemu_dedicated_opts = {
+    .name = "dedicated",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_dedicated_opts.head),
+    .desc = {
+        {
+            .name = "mem-lock",
+            .type = QEMU_OPT_BOOL,
+        },
+        {
+            .name = "cpu-pm",
+            .type = QEMU_OPT_BOOL,
+        },
+        { /* end of list */ }
+    },
+};
+
 static QemuOptsList qemu_msg_opts = {
     .name = "msg",
     .head = QTAILQ_HEAD_INITIALIZER(qemu_msg_opts.head),
@@ -3903,7 +3920,20 @@  int main(int argc, char **argv, char **envp)
                 if (!opts) {
                     exit(1);
                 }
-                enable_mlock = qemu_opt_get_bool(opts, "mlock", true);
+                /* Don't override the -dedicated option if set */
+                enable_mlock = enable_mlock ||
+                    qemu_opt_get_bool(opts, "mlock", true);
+                break;
+            case QEMU_OPTION_dedicated:
+                opts = qemu_opts_parse_noisily(qemu_find_opts("dedicated"),
+                                               optarg, false);
+                if (!opts) {
+                    exit(1);
+                }
+                /* Don't override the -realtime option if set */
+                enable_mlock = enable_mlock ||
+                    qemu_opt_get_bool(opts, "mem-lock", false);
+                enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", false);
                 break;
             case QEMU_OPTION_msg:
                 opts = qemu_opts_parse_noisily(qemu_find_opts("msg"), optarg,
diff --git a/qemu-options.hx b/qemu-options.hx
index c0d3951e9f..ddedb7eb92 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3337,6 +3337,24 @@  mlocking qemu and guest memory can be enabled via @option{mlock=on}
 (enabled by default).
 ETEXI
 
+DEF("dedicated", HAS_ARG, QEMU_OPTION_dedicated,
+    "-dedicated [mem-lock=on|off][cpu-pm=on|off]\n"
+    "                run qemu with realtime features\n"
+    "                mem-lock=on|off controls memory lock support (default: off)\n"
+    "                cpu-pm=on|off controls cpu power management (default: off)\n",
+    QEMU_ARCH_ALL)
+STEXI
+@item -dedicated mem-lock=on|off
+@item -dedicated cpu-pm=on|off
+@findex -dedicated
+Run qemu using dedicated host resources.
+Locking qemu and guest memory can be enabled via @option{mem-lock=on}
+(disabled by default). This is equivalent to @option{realtime}.
+Guest ability to manage power state of host cpus (increasing latency for other
+processes on the same host cpu, but decreasing latency for guest)
+can be enabled via @option{cpu-pm=on} (disabled by default).
+ETEXI
+
 DEF("gdb", HAS_ARG, QEMU_OPTION_gdb, \
     "-gdb dev        wait for gdb connection on 'dev'\n", QEMU_ARCH_ALL)
 STEXI
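
As a reading aid for the vl.c hunk above, the following standalone sketch
models how the two options combine under the logic in this patch. Memory
locking is OR-ed across -realtime and -dedicated, so once either option
turns it on a later option cannot turn it off, while cpu-pm simply takes the
value from -dedicated. The struct and function names are hypothetical and
exist only for illustration; the real code uses the enable_mlock and
enable_cpu_pm globals and qemu_opt_get_bool().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the two globals the patch touches. */
struct sketch_host_opts {
    bool mlock;   /* models enable_mlock */
    bool cpu_pm;  /* models enable_cpu_pm */
};

/* Models "-realtime mlock=<mlock>": never clears an earlier request. */
void sketch_apply_realtime(struct sketch_host_opts *o, bool mlock)
{
    o->mlock = o->mlock || mlock;
}

/* Models "-dedicated mem-lock=<mem_lock>,cpu-pm=<cpu_pm>". */
void sketch_apply_dedicated(struct sketch_host_opts *o,
                            bool mem_lock, bool cpu_pm)
{
    o->mlock = o->mlock || mem_lock;  /* sticky, like -realtime */
    o->cpu_pm = cpu_pm;               /* last -dedicated wins */
}
```

One consequence worth noting for review: with this logic,
"-dedicated mem-lock=on -realtime mlock=off" still leaves memory locked,
because the OR never resets the flag.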