kvm-unit-test failures

Message ID	54072CBF.7080108@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <kvm-owner@kernel.org> Message-ID: <54072CBF.7080108@redhat.com> Date: Wed, 03 Sep 2014 16:59:11 +0200 From: Paolo Bonzini <pbonzini@redhat.com> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: Chris J Arges <chris.j.arges@canonical.com> CC: kvm@vger.kernel.org Subject: Re: kvm-unit-test failures References: <53FDDD3F.8050102@redhat.com> <1409150751-22763-1-git-send-email-chris.j.arges@canonical.com> <53FDF09F.4060007@redhat.com> <53FE4C7E.90201@canonical.com> <1018860798.37183509.1409177149937.JavaMail.zimbra@redhat.com> <5400BA22.9080704@canonical.com> <5400EB01.1080106@canonical.com> <540347AD.20308@redhat.com> <54062146.9040609@canonical.com> In-Reply-To: <54062146.9040609@canonical.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: kvm-owner@vger.kernel.org Precedence: bulk

Paolo Bonzini Sept. 3, 2014, 2:59 p.m. UTC

Il 02/09/2014 21:57, Chris J Arges ha scritto:
> Seconds get from host:     1409687073
> Seconds get from kvmclock: 1409333034
> Offset:                    -354039
> offset too large!
> Check the stability of raw cycle ...
> Worst warp -354462672821748
> Total vcpus: 2
> Test  loops: 10000000
> Total warps:  1
> Total stalls: 0
> Worst warp:   -354462672821748
> Raw cycle is not stable
> Monotonic cycle test:
> Worst warp -354455286691490

Looks like one CPU is not being initialized correctly:

- The "next" correction in the trace is 18445647546048704244,
  and (next-2^64) / -354039 is about 3.1*10^9.  This is a pretty
  plausible value of the TSC frequency.  As a comparison, on my machine
  I have next=18446366988261784997 and an uptime of 29:12 hours, and
  the two match nicely with the CPU clock:

  -(18446366988261784997-2^64) / (29.2 * 3600 * 10^9) = 3.587

  $ grep -m1 model.name /proc/cpuinfo
  model name	: Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz

- The offset in seconds * 10^9 is pretty close to the warp in nanoseconds.

Can you: 1) try this patch 2) gather a new trace 3) include uptime and
cpuinfo in your report?  All this without clocksource=hpet of course.

Thanks,

Paolo


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris J Arges Sept. 3, 2014, 4:23 p.m. UTC | #1

On 09/03/2014 09:59 AM, Paolo Bonzini wrote:
> Il 02/09/2014 21:57, Chris J Arges ha scritto:
>> Seconds get from host:     1409687073
>> Seconds get from kvmclock: 1409333034
>> Offset:                    -354039
>> offset too large!
>> Check the stability of raw cycle ...
>> Worst warp -354462672821748
>> Total vcpus: 2
>> Test  loops: 10000000
>> Total warps:  1
>> Total stalls: 0
>> Worst warp:   -354462672821748
>> Raw cycle is not stable
>> Monotonic cycle test:
>> Worst warp -354455286691490
> 
> Looks like one CPU is not being initialized correctly:
> 
> - The "next" correction in the trace is 18445647546048704244,
>   and (next-2^64) / -354039 is about 3.1*10^9.  This is a pretty
>   plausible value of the TSC frequency.  As a comparison, on my machine
>   I have next=18446366988261784997 and an uptime of 29:12 hours, and
>   the two match nicely with the CPU clock:
> 
>   -(18446366988261784997-2^64) / (29.2 * 3600 * 10^9) = 3.587
> 
>   $ grep -m1 model.name /proc/cpuinfo
>   model name	: Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz
> 
> - The offset in seconds * 10^9 is pretty close to the warp in nanoseconds.
> 
> Can you: 1) try this patch 2) gather a new trace 3) include uptime and
> cpuinfo in your report?  All this without clocksource=hpet of course.
> 
> Thanks,
> 
> Paolo
> 
> diff --git a/x86/kvmclock_test.c b/x86/kvmclock_test.c
> index 52a43fb..f68881c 100644
> --- a/x86/kvmclock_test.c
> +++ b/x86/kvmclock_test.c
> @@ -7,6 +7,9 @@
>  #define DEFAULT_TEST_LOOPS 100000000L
>  #define DEFAULT_THRESHOLD  5L
>  
> +long threshold = DEFAULT_THRESHOLD;
> +int nerr = 0;
> +
>  struct test_info {
>          struct spinlock lock;
>          long loops;               /* test loops */
> @@ -20,8 +23,9 @@ struct test_info {
>  
>  struct test_info ti[4];
>  
> -static int wallclock_test(long sec, long threshold)
> +static void wallclock_test(void *p_sec)
>  {
> +	long sec = *(long *)p_sec;
>          long ksec, offset;
>          struct timespec ts;
>  
> @@ -36,10 +40,8 @@ static int wallclock_test(long sec, long threshold)
>  
>          if (offset > threshold || offset < -threshold) {
>                  printf("offset too large!\n");
> -                return 1;
> +                nerr++;
>          }
> -
> -        return 0;
>  }
>  
>  static void kvm_clock_test(void *data)
> @@ -116,10 +118,9 @@ static int cycle_test(int ncpus, long loops, int check, struct test_info *ti)
>  int main(int ac, char **av)
>  {
>          int ncpus;
> -        int nerr = 0, i;
> +        int i;
>          long loops = DEFAULT_TEST_LOOPS;
>          long sec = 0;
> -        long threshold = DEFAULT_THRESHOLD;
>  
>          if (ac > 1)
>                  loops = atol(av[1]);
> @@ -137,7 +138,8 @@ int main(int ac, char **av)
>                  on_cpu(i, kvm_clock_init, (void *)0);
>  
>          if (ac > 2)
> -                nerr += wallclock_test(sec, threshold);
> +	        for (i = 0; i < ncpus; ++i)
> +	                on_cpu(i, wallclock_test, &sec);
>  
>          printf("Check the stability of raw cycle ...\n");
>          pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT
> 

Here are the results of the trace as you requested:
http://people.canonical.com/~arges/kvm/trace-2.dat.xz

$ uptime
 16:18:31 up 53 min,  1 user,  load average: 1.16, 0.39, 0.17

$ grep -m1 model.name /proc/cpuinfo
model name      : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz

Here is the output of the command:
qemu-system-x86_64 -enable-kvm -device pc-testdev -device
isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio
-device pci-testdev -kernel x86/kvmclock_test.flat -smp 2 --append
10000000 1409761075
enabling apic
enabling apic
kvm-clock: cpu 0, msr 0x:44e520
kvm-clock: cpu 0, msr 0x:44e520
Wallclock test, threshold 5
Seconds get from host:     1409761075
Seconds get from kvmclock: 1409757927
Offset:                    -3148
offset too large!
Wallclock test, threshold 5
Seconds get from host:     1409761075
Seconds get from kvmclock: 1409757927
Offset:                    -3148
offset too large!
Check the stability of raw cycle ...
Worst warp -3147762665310
Total vcpus: 2
Test  loops: 10000000
Total warps:  1
Total stalls: 0
Worst warp:   -3147762665310
Raw cycle is not stable
Monotonic cycle test:
Worst warp -3142929472775
Total vcpus: 2
Test  loops: 10000000
Total warps:  1
Total stalls: 0
Worst warp:   -3142929472775
Measure the performance of raw cycle ...
Total vcpus: 2
Test  loops: 10000000
TSC cycles:  1242044050
Measure the performance of adjusted cycle ...
Total vcpus: 2
Test  loops: 10000000
TSC cycles:  1242665486
Return value from qemu: 3

--chris
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Paolo Bonzini Sept. 3, 2014, 5:52 p.m. UTC | #2

Il 03/09/2014 18:23, Chris J Arges ha scritto:
> $ uptime
>  16:18:31 up 53 min,  1 user,  load average: 1.16, 0.39, 0.17
> 
> $ grep -m1 model.name /proc/cpuinfo
> model name      : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
> 
> Here is the output of the command:
> qemu-system-x86_64 -enable-kvm -device pc-testdev -device
> isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio
> -device pci-testdev -kernel x86/kvmclock_test.flat -smp 2 --append
> 10000000 1409761075
> enabling apic
> enabling apic
> kvm-clock: cpu 0, msr 0x:44e520
> kvm-clock: cpu 0, msr 0x:44e520
> Wallclock test, threshold 5
> Seconds get from host:     1409761075
> Seconds get from kvmclock: 1409757927
> Offset:                    -3148
> offset too large!
> Wallclock test, threshold 5
> Seconds get from host:     1409761075
> Seconds get from kvmclock: 1409757927
> Offset:                    -3148
> offset too large!
> Check the stability of raw cycle ...
> Worst warp -3147762665310

I'm not sure about the reason for the warp, but indeed the offset and
uptime match (I'll check them against the trace tomorrow) so it's "just"
that the VM's TSC base is not taken into account correctly.

Can you gather another trace with the problematic patch reverted?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris J Arges Sept. 3, 2014, 6:25 p.m. UTC | #3

<snip>
> I'm not sure about the reason for the warp, but indeed the offset and
> uptime match (I'll check them against the trace tomorrow) so it's "just"
> that the VM's TSC base is not taken into account correctly.
> 
> Can you gather another trace with the problematic patch reverted?
> 
> Paolo
> 

Here is the third trace running with 0d3da0d2 reverted from the latest
kvm queue branch 11cc9ea3:

http://people.canonical.com/~arges/kvm/trace-3.dat.xz

$ uptime
 18:25:13 up 5 min,  1 user,  load average: 0.21, 0.74, 0.44

qemu-system-x86_64 -enable-kvm -device pc-testdev -device
isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio
-device pci-testdev -kernel x86/kvmclock_test.flat -smp 2 --append
10000000 1409768537
enabling apic
enabling apic
kvm-clock: cpu 0, msr 0x:44e520
kvm-clock: cpu 0, msr 0x:44e520
Wallclock test, threshold 5
Seconds get from host:     1409768537
Seconds get from kvmclock: 1409768538
Offset:                    1
Wallclock test, threshold 5
Seconds get from host:     1409768537
Seconds get from kvmclock: 1409768538
Offset:                    1
Check the stability of raw cycle ...
Total vcpus: 2
Test  loops: 10000000
Total warps:  0
Total stalls: 0
Worst warp:   0
Raw cycle is stable
Monotonic cycle test:
Total vcpus: 2
Test  loops: 10000000
Total warps:  0
Total stalls: 0
Worst warp:   0
Measure the performance of raw cycle ...
Total vcpus: 2
Test  loops: 10000000
TSC cycles:  1241970306
Measure the performance of adjusted cycle ...
Total vcpus: 2
Test  loops: 10000000
TSC cycles:  3266701026
Return value from qemu: 1
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Paolo Bonzini Sept. 4, 2014, 9:53 a.m. UTC | #4

Il 03/09/2014 20:25, Chris J Arges ha scritto:
> <snip>
>> I'm not sure about the reason for the warp, but indeed the offset and
>> uptime match (I'll check them against the trace tomorrow) so it's "just"
>> that the VM's TSC base is not taken into account correctly.
>>
>> Can you gather another trace with the problematic patch reverted?
>>
>> Paolo
>>
> 
> Here is the third trace running with 0d3da0d2 reverted from the latest
> kvm queue branch 11cc9ea3:
> 
> http://people.canonical.com/~arges/kvm/trace-3.dat.xz

Thanks!  And---yay!---I reproduced it on another machine.

Paolo

> $ uptime
>  18:25:13 up 5 min,  1 user,  load average: 0.21, 0.74, 0.44
> 
> qemu-system-x86_64 -enable-kvm -device pc-testdev -device
> isa-debug-exit,iobase=0xf4,iosize=0x4 -display none -serial stdio
> -device pci-testdev -kernel x86/kvmclock_test.flat -smp 2 --append
> 10000000 1409768537
> enabling apic
> enabling apic
> kvm-clock: cpu 0, msr 0x:44e520
> kvm-clock: cpu 0, msr 0x:44e520
> Wallclock test, threshold 5
> Seconds get from host:     1409768537
> Seconds get from kvmclock: 1409768538
> Offset:                    1
> Wallclock test, threshold 5
> Seconds get from host:     1409768537
> Seconds get from kvmclock: 1409768538
> Offset:                    1
> Check the stability of raw cycle ...
> Total vcpus: 2
> Test  loops: 10000000
> Total warps:  0
> Total stalls: 0
> Worst warp:   0
> Raw cycle is stable
> Monotonic cycle test:
> Total vcpus: 2
> Test  loops: 10000000
> Total warps:  0
> Total stalls: 0
> Worst warp:   0
> Measure the performance of raw cycle ...
> Total vcpus: 2
> Test  loops: 10000000
> TSC cycles:  1241970306
> Measure the performance of adjusted cycle ...
> Total vcpus: 2
> Test  loops: 10000000
> TSC cycles:  3266701026
> Return value from qemu: 1
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Paolo Bonzini Sept. 4, 2014, 11:33 a.m. UTC | #5

Il 04/09/2014 11:53, Paolo Bonzini ha scritto:
> Il 03/09/2014 20:25, Chris J Arges ha scritto:
>> <snip>
>>> I'm not sure about the reason for the warp, but indeed the offset and
>>> uptime match (I'll check them against the trace tomorrow) so it's "just"
>>> that the VM's TSC base is not taken into account correctly.
>>>
>>> Can you gather another trace with the problematic patch reverted?
>>>
>>> Paolo
>>>
>>
>> Here is the third trace running with 0d3da0d2 reverted from the latest
>> kvm queue branch 11cc9ea3:
>>
>> http://people.canonical.com/~arges/kvm/trace-3.dat.xz
> 
> Thanks!  And---yay!---I reproduced it on another machine.

And my bisection landed on the merge of the timer branch (commit
e7fda6c4c3c1a7d6996dd75fd84670fa0b5d448f).  Here is the log:

$ git bisect bad origin/master
$ git bisect good v3.16
$ git bisect good kvm-3.17-1 # 42cbc04fd3b5e3f9b011bf9fa3ce0b3d1e10b58b

good    [ 6929.863545] loaded kvm module (v3.17-rc1-158-g451fd72219dd)
bad     [ 6971.932790] loaded kvm module (for-linus)
bad     [ 7216.073128] loaded kvm module (v3.16-6426-gae045e245542)
good    [ 7286.198948] loaded kvm module (v3.16-3283-g53ee983378ff)
good    [ 7350.534060] loaded kvm module (v3.16-rc7-1668-gaef4f5b6db65)
good    [ 7439.037038] loaded kvm module (v3.16-4006-g91c2ff7708d4)
good    [ 7481.188637] loaded kvm module (v3.16-rc6-450-g7ba3c21c17d0)
bad     [ 7535.292730] loaded kvm module (v3.16-4635-ge7fda6c4c3c1)
good    [ 7589.722691] loaded kvm module (v3.16-rc5-110-g9b0fd802e8c0)
good    [ 7630.286418] loaded kvm module (v3.16-4467-ged5c41d30ef2)
good    [ 7712.470986] loaded kvm module (v3.16-rc1-35-g885d078bfe92)
good    [ 7763.443626] loaded kvm module (v3.16-rc1-381-g1b0733837a9b)
good    [ 7825.497414] loaded kvm module (v3.16-rc5-116-g7806f60e1d20)
good    [ 7893.174056] loaded kvm module (v3.16-rc1-384-gc6f1224573c3)

This means that:

- Tomasz's patch (commit 0d3da0d26e3c3515997c99451ce3b0ad1a69a36c) is
fine, it just enables the (wrong) master clock more often

- the failure is within that branch.

I then cherry-picked Tomasz's patch during a new bisection, and landed
on one of my original suspects:

commit cbcf2dd3b3d4d990610259e8d878fc8dc1f17d80
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Jul 16 21:04:54 2014 +0000

    x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based

    Convert the relevant base data right away to nanoseconds instead of
    doing the conversion on every readout. Reduces text size by 160
    bytes.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: Gleb Natapov <gleb@kernel.org>
    Cc: kvm@vger.kernel.org
    Acked-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: John Stultz <john.stultz@linaro.org>

Again, here is the log:

$ git bisect bad 953dec21aed4038464fec02f96a2f1b8701a5bce
$ git bisect good 1af447bd8cbfb808a320885d214555fb2d32e6e6

good    [ 8384.334892] loaded kvm module (v3.16-rc5-81-g68f6783d2831)
bad     [ 8525.975170] loaded kvm module (v3.16-rc5-99-gf519b1a2e08c)
good    [ 8562.204988] loaded kvm module (v3.16-rc5-90-g41fa4215f8e8)
bad     [ 8629.133287] loaded kvm module (v3.16-rc5-94-g48f18fd6addc)
bad     [ 8772.846612] loaded kvm module (v3.16-rc5-92-gcbcf2dd3b3d4)
good    [ 8836.509602] loaded kvm module (v3.16-rc5-91-gbb0b58127c5a)

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Wanpeng Li Sept. 4, 2014, 12:24 p.m. UTC | #6

On Thu, Sep 04, 2014 at 01:33:10PM +0200, Paolo Bonzini wrote:
>Il 04/09/2014 11:53, Paolo Bonzini ha scritto:
>> Il 03/09/2014 20:25, Chris J Arges ha scritto:
>>> <snip>
>>>> I'm not sure about the reason for the warp, but indeed the offset and
>>>> uptime match (I'll check them against the trace tomorrow) so it's "just"
>>>> that the VM's TSC base is not taken into account correctly.
>>>>
>>>> Can you gather another trace with the problematic patch reverted?
>>>>
>>>> Paolo
>>>>
>>>
>>> Here is the third trace running with 0d3da0d2 reverted from the latest
>>> kvm queue branch 11cc9ea3:
>>>
>>> http://people.canonical.com/~arges/kvm/trace-3.dat.xz
>> 
>> Thanks!  And---yay!---I reproduced it on another machine.
>
>And my bisection landed on the merge of the timer branch (commit
>e7fda6c4c3c1a7d6996dd75fd84670fa0b5d448f).  Here is the log:
>
>$ git bisect bad origin/master
>$ git bisect good v3.16
>$ git bisect good kvm-3.17-1 # 42cbc04fd3b5e3f9b011bf9fa3ce0b3d1e10b58b
>
>good    [ 6929.863545] loaded kvm module (v3.17-rc1-158-g451fd72219dd)
>bad     [ 6971.932790] loaded kvm module (for-linus)
>bad     [ 7216.073128] loaded kvm module (v3.16-6426-gae045e245542)
>good    [ 7286.198948] loaded kvm module (v3.16-3283-g53ee983378ff)
>good    [ 7350.534060] loaded kvm module (v3.16-rc7-1668-gaef4f5b6db65)
>good    [ 7439.037038] loaded kvm module (v3.16-4006-g91c2ff7708d4)
>good    [ 7481.188637] loaded kvm module (v3.16-rc6-450-g7ba3c21c17d0)
>bad     [ 7535.292730] loaded kvm module (v3.16-4635-ge7fda6c4c3c1)
>good    [ 7589.722691] loaded kvm module (v3.16-rc5-110-g9b0fd802e8c0)
>good    [ 7630.286418] loaded kvm module (v3.16-4467-ged5c41d30ef2)
>good    [ 7712.470986] loaded kvm module (v3.16-rc1-35-g885d078bfe92)
>good    [ 7763.443626] loaded kvm module (v3.16-rc1-381-g1b0733837a9b)
>good    [ 7825.497414] loaded kvm module (v3.16-rc5-116-g7806f60e1d20)
>good    [ 7893.174056] loaded kvm module (v3.16-rc1-384-gc6f1224573c3)
>
>This means that:
>
>- Tomasz's patch (commit 0d3da0d26e3c3515997c99451ce3b0ad1a69a36c) is
>fine, it just enables the (wrong) master clock more often
>
>- the failure is within that branch.
>
>I then cherry-picked Tomasz's patch during a new bisection, and landed
>on one of my original suspects:
>
>commit cbcf2dd3b3d4d990610259e8d878fc8dc1f17d80
>Author: Thomas Gleixner <tglx@linutronix.de>
>Date:   Wed Jul 16 21:04:54 2014 +0000
>
>    x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based
>
>    Convert the relevant base data right away to nanoseconds instead of
>    doing the conversion on every readout. Reduces text size by 160
>    bytes.
>
>    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>    Cc: Gleb Natapov <gleb@kernel.org>
>    Cc: kvm@vger.kernel.org
>    Acked-by: Paolo Bonzini <pbonzini@redhat.com>
>    Signed-off-by: John Stultz <john.stultz@linaro.org>
>

Yes, I also look into this bad commit recently which lead to guest hang
after live migration or after local save/restore. 

Regards,
Wanpeng Li 

>Again, here is the log:
>
>$ git bisect bad 953dec21aed4038464fec02f96a2f1b8701a5bce
>$ git bisect good 1af447bd8cbfb808a320885d214555fb2d32e6e6
>
>good    [ 8384.334892] loaded kvm module (v3.16-rc5-81-g68f6783d2831)
>bad     [ 8525.975170] loaded kvm module (v3.16-rc5-99-gf519b1a2e08c)
>good    [ 8562.204988] loaded kvm module (v3.16-rc5-90-g41fa4215f8e8)
>bad     [ 8629.133287] loaded kvm module (v3.16-rc5-94-g48f18fd6addc)
>bad     [ 8772.846612] loaded kvm module (v3.16-rc5-92-gcbcf2dd3b3d4)
>good    [ 8836.509602] loaded kvm module (v3.16-rc5-91-gbb0b58127c5a)
>
>Paolo
>--
>To unsubscribe from this list: send the line "unsubscribe kvm" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Paolo Bonzini Sept. 4, 2014, 1:11 p.m. UTC | #7

Il 04/09/2014 14:24, Wanpeng Li ha scritto:
> On Thu, Sep 04, 2014 at 01:33:10PM +0200, Paolo Bonzini wrote:
>> Il 04/09/2014 11:53, Paolo Bonzini ha scritto:
>>> Il 03/09/2014 20:25, Chris J Arges ha scritto:
>>>> <snip>
>>>>> I'm not sure about the reason for the warp, but indeed the offset and
>>>>> uptime match (I'll check them against the trace tomorrow) so it's "just"
>>>>> that the VM's TSC base is not taken into account correctly.
>>>>>
>>>>> Can you gather another trace with the problematic patch reverted?
>>>>>
>>>>> Paolo
>>>>>
>>>>
>>>> Here is the third trace running with 0d3da0d2 reverted from the latest
>>>> kvm queue branch 11cc9ea3:
>>>>
>>>> http://people.canonical.com/~arges/kvm/trace-3.dat.xz
>>>
>>> Thanks!  And---yay!---I reproduced it on another machine.
>>
>> And my bisection landed on the merge of the timer branch (commit
>> e7fda6c4c3c1a7d6996dd75fd84670fa0b5d448f).  Here is the log:
>>
>> $ git bisect bad origin/master
>> $ git bisect good v3.16
>> $ git bisect good kvm-3.17-1 # 42cbc04fd3b5e3f9b011bf9fa3ce0b3d1e10b58b
>>
>> good    [ 6929.863545] loaded kvm module (v3.17-rc1-158-g451fd72219dd)
>> bad     [ 6971.932790] loaded kvm module (for-linus)
>> bad     [ 7216.073128] loaded kvm module (v3.16-6426-gae045e245542)
>> good    [ 7286.198948] loaded kvm module (v3.16-3283-g53ee983378ff)
>> good    [ 7350.534060] loaded kvm module (v3.16-rc7-1668-gaef4f5b6db65)
>> good    [ 7439.037038] loaded kvm module (v3.16-4006-g91c2ff7708d4)
>> good    [ 7481.188637] loaded kvm module (v3.16-rc6-450-g7ba3c21c17d0)
>> bad     [ 7535.292730] loaded kvm module (v3.16-4635-ge7fda6c4c3c1)
>> good    [ 7589.722691] loaded kvm module (v3.16-rc5-110-g9b0fd802e8c0)
>> good    [ 7630.286418] loaded kvm module (v3.16-4467-ged5c41d30ef2)
>> good    [ 7712.470986] loaded kvm module (v3.16-rc1-35-g885d078bfe92)
>> good    [ 7763.443626] loaded kvm module (v3.16-rc1-381-g1b0733837a9b)
>> good    [ 7825.497414] loaded kvm module (v3.16-rc5-116-g7806f60e1d20)
>> good    [ 7893.174056] loaded kvm module (v3.16-rc1-384-gc6f1224573c3)
>>
>> This means that:
>>
>> - Tomasz's patch (commit 0d3da0d26e3c3515997c99451ce3b0ad1a69a36c) is
>> fine, it just enables the (wrong) master clock more often
>>
>> - the failure is within that branch.
>>
>> I then cherry-picked Tomasz's patch during a new bisection, and landed
>> on one of my original suspects:
>>
>> commit cbcf2dd3b3d4d990610259e8d878fc8dc1f17d80
>> Author: Thomas Gleixner <tglx@linutronix.de>
>> Date:   Wed Jul 16 21:04:54 2014 +0000
>>
>>    x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based
>>
>>    Convert the relevant base data right away to nanoseconds instead of
>>    doing the conversion on every readout. Reduces text size by 160
>>    bytes.
>>
>>    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>    Cc: Gleb Natapov <gleb@kernel.org>
>>    Cc: kvm@vger.kernel.org
>>    Acked-by: Paolo Bonzini <pbonzini@redhat.com>
>>    Signed-off-by: John Stultz <john.stultz@linaro.org>
>>
> 
> Yes, I also look into this bad commit recently which lead to guest hang
> after live migration or after local save/restore. 

Thanks for the report!

Wanpeng, can you test and/or review the patch I just posted ("[PATCH]
KVM: x86: fix kvmclock breakage from timers branch merge")?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

kvm-unit-test failures

Commit Message

Comments

Patch