
xen: preserve native TSC speed during migration between identical hosts

Message ID 20170524142505.11460-1-olaf@aepfle.de (mailing list archive)
State New, archived

Commit Message

Olaf Hering May 24, 2017, 2:25 p.m. UTC
After migrating a domU to another identical host, a performance drop can
be observed. One reason is that before migration the TSC was accessed at
native speed, whereas after migration the TSC has to be emulated. This
happens because the measured CPU frequency is not accurate; the values
differ even between reboots.

To avoid the emulation, a tolerance range can be specified during boot
with "vtsc-tolerance=N". If the frequency expected by the domU is
within the range, TSC access from the domU will remain native. If the
domU is migrated to another machine type, the TSC might still be
emulated.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
---
 docs/man/xen-tscmode.pod.7          | 11 +++++++----
 docs/misc/xen-command-line.markdown | 10 ++++++++++
 xen/arch/x86/time.c                 | 14 +++++++++++++-
 3 files changed, 30 insertions(+), 5 deletions(-)

Comments

Konrad Rzeszutek Wilk May 24, 2017, 3:09 p.m. UTC | #1
On Wed, May 24, 2017 at 04:25:05PM +0200, Olaf Hering wrote:
> After migrating a domU to another identical host a performance drop can
> be observed. One reason is that before migration TSC was accessed at
> native speed, after migration TSC has to be emulated. This happens
> because the measured CPU frequency is not accurate, the values differ
> even between reboots.
> 
> To avoid the emulation a tolerance range can be specified during boot
> with "vtsc-tolerance=N".  If the frequency expected by the domU is
> within the range, TSC access from the domU will remain native. If the

How can that be determined? As in how can the guest (domU) be within
the range? Is there some way to determine that? Is there some
matrix of the various OS-es that can tolerate this?
Olaf Hering May 24, 2017, 3:25 p.m. UTC | #2
On Wed, May 24, Konrad Rzeszutek Wilk wrote:

> How can that be determined? As in how can the guest (domU) be within
> the range? Is there some way to determine that? Is there some
> matrix of the various OS-es that can tolerate this?

Just cycle through all dom0s and look for the cpu_khz values:
 # xl dmesg | grep -w MHz
(XEN) Detected 2494.018 MHz processor.

What I have seen are ranges of up to 200 kHz, even if /proc/cpuinfo says
"2.50GHz". Some dom0s calibrated themselves to exactly 2500.000 MHz;
they do not need a tolerance.

The expected frequency of a given domU can be seen in 'dump softtsc
stats' (s). How various guest kernels deal with the slightly different
frequency, no idea.


Olaf
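
The survey described above can be turned into a number with a small
helper. The following is only a sketch: it assumes the per-host
frequencies have already been collected from "xl dmesg" and that the
option is given in kHz, as the patch documents. The function names are
hypothetical and nothing here is part of Xen.

```c
#include <stddef.h>

/* Convert a frequency in MHz, as printed by "xl dmesg", to kHz (rounded). */
static unsigned long mhz_to_khz(double mhz)
{
    return (unsigned long)(mhz * 1000.0 + 0.5);
}

/*
 * Return the spread (max - min) in kHz over all measured hosts.
 * A vtsc-tolerance of at least this value would cover every pair of
 * hosts in the list. Hypothetical helper, not part of the patch.
 */
static unsigned long required_tolerance_khz(const double *mhz, size_t n)
{
    unsigned long lo, hi;
    size_t i;

    if ( n == 0 )
        return 0;

    lo = hi = mhz_to_khz(mhz[0]);
    for ( i = 1; i < n; i++ )
    {
        unsigned long khz = mhz_to_khz(mhz[i]);

        if ( khz < lo )
            lo = khz;
        if ( khz > hi )
            hi = khz;
    }

    return hi - lo;
}
```

For scale, a spread of 200 kHz on a 2.5 GHz part is 0.008% of the
nominal frequency.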
Konrad Rzeszutek Wilk May 24, 2017, 3:33 p.m. UTC | #3
On Wed, May 24, 2017 at 05:25:03PM +0200, Olaf Hering wrote:
> On Wed, May 24, Konrad Rzeszutek Wilk wrote:
> 
> > How can that be determined? As in how can the guest (domU) be within
> > the range? Is there some way to determine that? Is there some
> > matrix of the various OS-es that can tolerate this?
> 
> Just cycle through all dom0s and look for the cpu_khz values:
>  # xl dmesg | grep -w MHz
> (XEN) Detected 2494.018 MHz processor.
> 
> What I have seen are ranges up to 200khz, even if /proc/cpuinfo says
> "2.50GHz" . Some dom0s calibrated themselves to exactly 2500.000 MHz,
> they do not need a tolerance.
> 
> The expected frequency of a given domU can be seen in 'dump softtsc
> stats' (s). How various guest kernels deal with the slightly different
> frequency, no idea.

Right, so that answers how one would find the values (which I think
should be in the docs?).

But it does not help customers to figure out if this is OK for them?

As in, how can customers be assured that 1% jitter is OK for their
kernel? That time won't go backwards?

Is there some form of tests that they can run to verify and test
that this is safe? Or perhaps this is something that is based on the kernel
versions? Like 4.11 are safe, but 3.18 is not?

> 
> 
> Olaf
Olaf Hering May 24, 2017, 3:44 p.m. UTC | #4
On Wed, 24 May 2017 11:33:15 -0400,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> But it does not help customers to figure out if this is OK for them?
> As in, how can customers be assured that 1% jitter is OK for their
> kernel? That time won't go backwards?

Well, that would be some new documentation.
I think the tsc_mode=native part already lacks that info.

In my testing time does not go backwards. The actual value of tsc
is different anyway on each host. Querying an ntp server shows no
drift, the offset remains within the +- 0.5 sec range.

> Is there some form of tests that they can run to verify and test
> that this is safe? Or perhaps this is something that is based on the kernel
> versions? Like 4.11 are safe, but 3.18 is not?

I'm not aware of any tests to measure the results of such jitter
with tsc_mode=native.

Olaf
Jan Beulich May 29, 2017, 2:45 p.m. UTC | #5
>>> On 24.05.17 at 17:33, <konrad.wilk@oracle.com> wrote:
> On Wed, May 24, 2017 at 05:25:03PM +0200, Olaf Hering wrote:
>> On Wed, May 24, Konrad Rzeszutek Wilk wrote:
>> 
>> > How can that be determined? As in how can the guest (domU) be within
>> > the range? Is there some way to determine that? Is there some
>> > matrix of the various OS-es that can tolerate this?
>> 
>> Just cycle through all dom0s and look for the cpu_khz values:
>>  # xl dmesg | grep -w MHz
>> (XEN) Detected 2494.018 MHz processor.
>> 
>> What I have seen are ranges up to 200khz, even if /proc/cpuinfo says
>> "2.50GHz" . Some dom0s calibrated themselves to exactly 2500.000 MHz,
>> they do not need a tolerance.
>> 
>> The expected frequency of a given domU can be seen in 'dump softtsc
>> stats' (s). How various guest kernels deal with the slightly different
>> frequency, no idea.
> 
> Right, so that answers how one would find the values (which I think
> should be in the docs?).
> 
> But it does not help customers to figure out if this is OK for them?
> 
> As in, how can customers be assured that 1% jitter is OK for their
> kernel? That time won't go backwards?
> 
> Is there some form of tests that they can run to verify and test
> that this is safe? Or perhaps this is something that is based on the kernel
> versions? Like 4.11 are safe, but 3.18 is not?

Well, no, what jitter may be acceptable depends on the
applications running inside the guest. I.e. you can only know
for yourself or ask the application vendor(s). I think such an
option, if we really want to have it, would need to be
prominently documented as unsupported - after all we can't
help it if people use it and then find their applications break.

Jan
Jan Beulich May 29, 2017, 2:50 p.m. UTC | #6
>>> On 24.05.17 at 16:25, <olaf@aepfle.de> wrote:
> @@ -2024,6 +2029,13 @@ void tsc_set_info(struct domain *d,
>          d->arch.vtsc_offset = get_s_time() - elapsed_nsec;
>          d->arch.tsc_khz = gtsc_khz ?: cpu_khz;
>          set_time_scale(&d->arch.vtsc_to_ns, d->arch.tsc_khz * 1000);
> +        if (!opt_vtsc_tolerance) {
> +            tolerated = d->arch.tsc_khz == cpu_khz;
> +        } else {

Leaving aside the question of whether we want anything like this,
there are multiple coding style issues here (braces on their own
lines, blanks inside the parentheses of control statements). 

> +            khz_diff = cpu_khz > d->arch.tsc_khz ?
> +                       cpu_khz - d->arch.tsc_khz : d->arch.tsc_khz - cpu_khz;
> +            tolerated = khz_diff <= opt_vtsc_tolerance;
> +        }

These assignments to tolerated also suggest it wants to be bool.

Finally I don't think a host wide option will do. If it is to be of use
(considering that it may break applications if used wrongly), it needs
to be per-domain, and its value needs to be migrated (perhaps
in the form of a low/high pair of TSC frequency values).

Jan
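
To make the two variants discussed in this review concrete, here is a
minimal standalone sketch. It assumes nothing about how a per-domain
range would actually be stored or migrated; "struct tsc_khz_range" and
both function names are hypothetical and do not exist in Xen. The first
function mirrors the symmetric check from the patch but returns bool.
The second shows the low/high pair form.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-domain record of acceptable host TSC frequencies. */
struct tsc_khz_range {
    uint32_t low_khz;   /* lowest host frequency the guest accepts */
    uint32_t high_khz;  /* highest host frequency the guest accepts */
};

/*
 * Symmetric tolerance check, as in the patch. With tolerance_khz == 0
 * this reduces to an exact match, so no separate branch is needed.
 */
static bool khz_within_tolerance(uint32_t host_khz, uint32_t guest_khz,
                                 uint32_t tolerance_khz)
{
    uint32_t diff = host_khz > guest_khz ? host_khz - guest_khz
                                         : guest_khz - host_khz;

    return diff <= tolerance_khz;
}

/* Range check against a low/high pair carried with the domain. */
static bool khz_within_range(const struct tsc_khz_range *r,
                             uint32_t host_khz)
{
    return host_khz >= r->low_khz && host_khz <= r->high_khz;
}
```

With a range carried in the migration stream, the receiving host would
only need its own cpu_khz to decide between native and emulated TSC.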
Olaf Hering May 30, 2017, 6:36 a.m. UTC | #7
On Mon, May 29, Jan Beulich wrote:

> Well, no, what jitter may be acceptable depends on the
> applications running inside the guest. I.e. you can only know
> for yourself or ask the application vendor(s). I think such an
> option, if we really want to have it, would need to be
> prominently documented as unsupported - after all we can't
> help it if people use it and then find their applications break.

The very same is true for tsc_mode=native, no such warning exists AFAIK.

Olaf
Olaf Hering May 30, 2017, 6:41 a.m. UTC | #8
On Mon, May 29, Jan Beulich wrote:

> Finally I don't think a host wide option will do. If it is to be of use
> (considering that it may break applications if used wrongly), it needs
> to be per-domain, and its value needs to be migrated (perhaps
> in the form of a low/high pair of TSC frequency values).

How is it supposed to be propagated from one host to another?

With the global option one can unconditionally receive a domU and
preserve "native performance" in case of tsc_mode=default. With a
per-domU option one has to know upfront.


Olaf
Jan Beulich May 30, 2017, 7:25 a.m. UTC | #9
>>> On 30.05.17 at 08:36, <olaf@aepfle.de> wrote:
> On Mon, May 29, Jan Beulich wrote:
> 
>> Well, no, what jitter may be acceptable depends on the
>> applications running inside the guest. I.e. you can only know
>> for yourself or ask the application vendor(s). I think such an
>> option, if we really want to have it, would need to be
>> prominently documented as unsupported - after all we can't
>> help it if people use it and then find their applications break.
> 
> The very same is true for tsc_mode=native, no such warning exists AFAIK.

I fully agree.

Jan
Jan Beulich May 30, 2017, 7:27 a.m. UTC | #10
>>> On 30.05.17 at 08:41, <olaf@aepfle.de> wrote:
> On Mon, May 29, Jan Beulich wrote:
> 
>> Finally I don't think a host wide option will do. If it is to be of use
>> (considering that it may break applications if used wrongly), it needs
>> to be per-domain, and its value needs to be migrated (perhaps
>> in the form of a low/high pair of TSC frequency values).
> 
> How is it supposed to be propagated from one host to another?
> 
> With the global option one can unconditionally receive a domU and
> preserve "native performance" in case of tsc_mode=default. With a
> per-domU option one has to know upfront.

I don't understand: If the incoming stream tells you the acceptable
clock range, what else do you need to know (upfront or later)?

Jan
Olaf Hering May 30, 2017, 9:27 a.m. UTC | #11
On Wed, May 24, Konrad Rzeszutek Wilk wrote:

> Is there some form of tests that they can run to verify and test
> that this is safe? Or perhaps this is something that is based on the kernel
> versions? Like 4.11 are safe, but 3.18 is not?

I'm not sure why you are asking for specific kernel versions.


But the test is simply to boot a HVM domU with 'tsc_mode=native' and
migrate it to a similar dom0. This fails with 3.12 and 4.4 based domU
kernels, with xen-4.7 as dom0. This pretty much ruins the plan.

So far I have not investigated why the domU on the remote side does not
get any cpu time, or why it makes no progress. But in the end I
think only a hardware upgrade to TSC-scaling capable CPUs will avoid the
performance penalty due to the TSC emulation.
I do not have access to such hardware to verify it.


Olaf
Konrad Rzeszutek Wilk May 30, 2017, 2:28 p.m. UTC | #12
On Tue, May 30, 2017 at 11:27:05AM +0200, Olaf Hering wrote:
> On Wed, May 24, Konrad Rzeszutek Wilk wrote:
> 
> > Is there some form of tests that they can run to verify and test
> > that this is safe? Or perhaps this is something that is based on the kernel
> > versions? Like 4.11 are safe, but 3.18 is not?
> 
> I'm not sure why you are asking for specific kernel versions.

Because a blanket statement saying: "Use that for everything" (including
Windows OSes) makes me nervous that some other OSes may not be
comfortable with this or would not work.

But it is unrealistic to ask you to test _every_ single OS or kernel
to verify this. And hence providing a test-case for folks to run/test
can help them with evaluating this.

> 
> 
> But the test is simply to boot a HVM domU with 'tsc_mode=native' and
> migrate it to a similar dom0. This fails with 3.12 and 4.4 based domU
> kernels, with xen-4.7 as dom0. This pretty much ruins the plan.
> 
> So far I have not investigated why the domU on the remote side does not
> get any cpu time, or why it makes no progress. But in the end I
> think only a hardware upgrade to TSC-scaling capable CPUs will avoid the
> performance penalty due to the TSC emulation.
> I do not have access to such hardware to verify it.
> 
> 
> Olaf
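
No test case was posted in this thread. As a rough idea of what a
guest-side check for "time won't go backwards" might look like, the
sketch below polls the POSIX monotonic clock and counts any backward
steps. It is an assumption about how one could probe this, not an
established test: it covers a single thread and only the clock source
the guest kernel exposes through clock_gettime(), so a clean run does
not prove that a given tolerance is safe. The function names are
hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() */
#endif
#include <stdint.h>
#include <time.h>

/* Read CLOCK_MONOTONIC and return it as nanoseconds. */
static uint64_t monotonic_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Sample the clock 'iterations' times and count how often a reading
 * is smaller than the previous one.
 */
static unsigned long count_backward_steps(unsigned long iterations)
{
    unsigned long i, backward = 0;
    uint64_t prev = monotonic_ns();

    for ( i = 0; i < iterations; i++ )
    {
        uint64_t now = monotonic_ns();

        if ( now < prev )
            backward++;
        prev = now;
    }

    return backward;
}
```

To be meaningful, such a loop would have to keep running inside the domU
before, during and after a migration, ideally on several vCPUs at once.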

Patch

diff --git a/docs/man/xen-tscmode.pod.7 b/docs/man/xen-tscmode.pod.7
index 3bbc96f201..e04af70855 100644
--- a/docs/man/xen-tscmode.pod.7
+++ b/docs/man/xen-tscmode.pod.7
@@ -206,10 +206,13 @@  TSC-safe, rdtsc will execute at hardware speed; if it is not, rdtsc
 will be emulated.  Once a virtual machine is save/restored or migrated,
 however, there are two possibilities: TSC remains native IF the source
 physical machine and target physical machine have the same TSC frequency
-(or, for HVM/PVH guests, if TSC scaling support is available); else TSC
-is emulated.  Note that, though emulated, the "apparent" TSC frequency
-will be the TSC frequency of the initial physical machine, even after
-migration.
+(or, for HVM/PVH guests, if TSC scaling support is available); else TSC is
+emulated. The Xen cmdline option "vtsc-tolerance" allows host admins to
+specify a tolerance range in case the measured frequency of supposedly
+identical machines differs slightly. If the frequency on the target machine
+is within the range the tsc_mode remains native.
+Note that, though emulated, the "apparent" TSC frequency will be the TSC
+frequency of the initial physical machine, even after migration.
 
 For environments where both TSC-safeness AND highest performance
 even across migration is a requirement, application code can be specially
diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 44d99852aa..ff92975a15 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1723,6 +1723,16 @@  Note that if **watchdog** option is also specified vpmu will be turned off.
 As the virtualisation is not 100% safe, don't use the vpmu flag on
 production systems (see http://xenbits.xen.org/xsa/advisory-163.html)!
 
+### vtsc-tolerance
+> `= <integer>`
+
+> Default: `0`
+
+Specify the tolerated difference of pCPUs clock frequency in kHz.
+This option affects only domUs which have tsc\_mode=default enabled.
+If the frequency expected by a domU is within the tolerance range tsc
+will remain native. Otherwise tsc emulation will be used for the domU.
+
 ### vwfi
 > `= trap | native
 
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 655af33cb3..b290e8aca9 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -41,6 +41,9 @@ 
 static char __initdata opt_clocksource[10];
 string_param("clocksource", opt_clocksource);
 
+static unsigned int __read_mostly opt_vtsc_tolerance;
+integer_param("vtsc-tolerance", opt_vtsc_tolerance);
+
 unsigned long __read_mostly cpu_khz;  /* CPU clock frequency in kHz. */
 DEFINE_SPINLOCK(rtc_lock);
 unsigned long pit0_ticks;
@@ -2009,6 +2012,8 @@  void tsc_set_info(struct domain *d,
                   uint32_t tsc_mode, uint64_t elapsed_nsec,
                   uint32_t gtsc_khz, uint32_t incarnation)
 {
+    uint32_t khz_diff, tolerated;
+
     if ( is_idle_domain(d) || is_hardware_domain(d) )
     {
         d->arch.vtsc = 0;
@@ -2024,6 +2029,13 @@  void tsc_set_info(struct domain *d,
         d->arch.vtsc_offset = get_s_time() - elapsed_nsec;
         d->arch.tsc_khz = gtsc_khz ?: cpu_khz;
         set_time_scale(&d->arch.vtsc_to_ns, d->arch.tsc_khz * 1000);
+        if (!opt_vtsc_tolerance) {
+            tolerated = d->arch.tsc_khz == cpu_khz;
+        } else {
+            khz_diff = cpu_khz > d->arch.tsc_khz ?
+                       cpu_khz - d->arch.tsc_khz : d->arch.tsc_khz - cpu_khz;
+            tolerated = khz_diff <= opt_vtsc_tolerance;
+        }
 
         /*
          * In default mode use native TSC if the host has safe TSC and
@@ -2033,7 +2045,7 @@  void tsc_set_info(struct domain *d,
          * d->arch.tsc_khz == cpu_khz. Thus no need to check incarnation.
          */
         if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() &&
-             (d->arch.tsc_khz == cpu_khz ||
+             (tolerated ||
               (is_hvm_domain(d) &&
                hvm_get_tsc_scaling_ratio(d->arch.tsc_khz))) )
         {