diff mbox

[kvm-unit-tests,2/3] arm64: timer: Fix test on APM X-Gene

Message ID 20170713192009.10069-3-cdall@linaro.org (mailing list archive)
State New, archived
Headers show

Commit Message

Christoffer Dall July 13, 2017, 7:20 p.m. UTC
When running the vtimer test on an APM X-Gene, setting the timer value
to (2^64 - 1) apparently results in the timer always firing, even
thought the counter is mich lower than the cval.

Since the idea of the code is to set everything up and schedule the
timer for some time very far in the future, take a pragmatic approach
and just add 10s worth of delay instead.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 arm/timer.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Marc Zyngier July 14, 2017, 8:04 a.m. UTC | #1
On 13/07/17 20:20, Christoffer Dall wrote:
> When running the vtimer test on an APM X-Gene, setting the timer value
> to (2^64 - 1) apparently results in the timer always firing, even
> thought the counter is mich lower than the cval.

Note that the system counter is only guaranteed to be at least 56 bit
wide (see DDI0487B.a G5.1.2), and I seem to remember that X-Gene only
has the minimum. This could explain why setting the comparator to a
value greater than (2^56 - 1) leads to a firing timer (the comparator
appears to be in the past).

Thanks,

	M.
Christoffer Dall July 14, 2017, 3:45 p.m. UTC | #2
On Fri, Jul 14, 2017 at 09:04:10AM +0100, Marc Zyngier wrote:
> On 13/07/17 20:20, Christoffer Dall wrote:
> > When running the vtimer test on an APM X-Gene, setting the timer value
> > to (2^64 - 1) apparently results in the timer always firing, even
> > thought the counter is mich lower than the cval.
> 
> Note that the system counter is only guaranteed to be at least 56 bit
> wide (see DDI0487B.a G5.1.2), and I seem to remember that X-Gene only
> has the minimum. This could explain why setting the comparator to a
> value greater than (2^56 - 1) leads to a firing timer (the comparator
> appears to be in the past).

Thanks for pointing that out, that makes good sense.  So then we should
definitely fix the test.

We could either set it to 2^56 - 1 instead, or just keep the 10s as used
in this patch, because the whole test times out after 2s anyway.

Thanks,
-Christoffer
Andrew Jones July 18, 2017, 10:05 a.m. UTC | #3
On Fri, Jul 14, 2017 at 08:45:12AM -0700, Christoffer Dall wrote:
> On Fri, Jul 14, 2017 at 09:04:10AM +0100, Marc Zyngier wrote:
> > On 13/07/17 20:20, Christoffer Dall wrote:
> > > When running the vtimer test on an APM X-Gene, setting the timer value
> > > to (2^64 - 1) apparently results in the timer always firing, even
> > > thought the counter is mich lower than the cval.
> > 
> > Note that the system counter is only guaranteed to be at least 56 bit
> > wide (see DDI0487B.a G5.1.2), and I seem to remember that X-Gene only
> > has the minimum. This could explain why setting the comparator to a
> > value greater than (2^56 - 1) leads to a firing timer (the comparator
> > appears to be in the past).

Ah, that explains why when I tried setting it to ~0 on my mustang, and
then reading it back, it was always 2^56 - 1 instead. However my mustang
still also requires me to clear bit 31, otherwise the vcpu hangs.

> 
> Thanks for pointing that out, that makes good sense.  So then we should
> definitely fix the test.
> 
> We could either set it to 2^56 - 1 instead, or just keep the 10s as used
> in this patch, because the whole test times out after 2s anyway.

With the 10s version the test runs and passes on my mustang, so on one
hand I prefer it. OTOH, testing to the spec, by using 2^56 - 1, seems
more correct and allows one to find issues like the one I have on my
mustang, i.e. a vcpu hang when bit 31 isn't clear.

I guess I lean more towards testings to the spec, but not enough to
ask for a v2 of the patch. It's up to you.

Thanks,
drew
Christoffer Dall July 18, 2017, 10:35 a.m. UTC | #4
On Tue, Jul 18, 2017 at 12:05:03PM +0200, Andrew Jones wrote:
> On Fri, Jul 14, 2017 at 08:45:12AM -0700, Christoffer Dall wrote:
> > On Fri, Jul 14, 2017 at 09:04:10AM +0100, Marc Zyngier wrote:
> > > On 13/07/17 20:20, Christoffer Dall wrote:
> > > > When running the vtimer test on an APM X-Gene, setting the timer value
> > > > to (2^64 - 1) apparently results in the timer always firing, even
> > > > thought the counter is mich lower than the cval.
> > > 
> > > Note that the system counter is only guaranteed to be at least 56 bit
> > > wide (see DDI0487B.a G5.1.2), and I seem to remember that X-Gene only
> > > has the minimum. This could explain why setting the comparator to a
> > > value greater than (2^56 - 1) leads to a firing timer (the comparator
> > > appears to be in the past).
> 
> Ah, that explains why when I tried setting it to ~0 on my mustang, and
> then reading it back, it was always 2^56 - 1 instead. However my mustang
> still also requires me to clear bit 31, otherwise the vcpu hangs.
> 
> > 
> > Thanks for pointing that out, that makes good sense.  So then we should
> > definitely fix the test.
> > 
> > We could either set it to 2^56 - 1 instead, or just keep the 10s as used
> > in this patch, because the whole test times out after 2s anyway.
> 
> With the 10s version the test runs and passes on my mustang, so on one
> hand I prefer it. OTOH, testing to the spec, by using 2^56 - 1, seems
> more correct and allows one to find issues like the one I have on my
> mustang, i.e. a vcpu hang when bit 31 isn't clear.
> 
> I guess I lean more towards testings to the spec, but not enough to
> ask for a v2 of the patch. It's up to you.

I think we should have something that tests KVM on the platforms we have
and that are available for people's use.  I don't think we should verify
the architecture as much.  People use the m400 (basically
Mustang) in CloudLab, for example, which is we I keep caring about that.

I think this test was designed to test "if I program a timer to some
time in the future it shouldn't fire right away", which is still what
we test with this patch.

If we want to add a "the platform provides a timer with 56 valid bits in
the counter and compare register", then I think it should be a separate
test, and the the user can see that "basic stuff works", "architecture
compliance not so much" and shrug accordingly.

Thanks,
-Christoffer
Andrew Jones July 18, 2017, 12:15 p.m. UTC | #5
On Tue, Jul 18, 2017 at 12:35:12PM +0200, Christoffer Dall wrote:
> On Tue, Jul 18, 2017 at 12:05:03PM +0200, Andrew Jones wrote:
> > On Fri, Jul 14, 2017 at 08:45:12AM -0700, Christoffer Dall wrote:
> > > On Fri, Jul 14, 2017 at 09:04:10AM +0100, Marc Zyngier wrote:
> > > > On 13/07/17 20:20, Christoffer Dall wrote:
> > > > > When running the vtimer test on an APM X-Gene, setting the timer value
> > > > > to (2^64 - 1) apparently results in the timer always firing, even
> > > > > thought the counter is mich lower than the cval.
> > > > 
> > > > Note that the system counter is only guaranteed to be at least 56 bit
> > > > wide (see DDI0487B.a G5.1.2), and I seem to remember that X-Gene only
> > > > has the minimum. This could explain why setting the comparator to a
> > > > value greater than (2^56 - 1) leads to a firing timer (the comparator
> > > > appears to be in the past).
> > 
> > Ah, that explains why when I tried setting it to ~0 on my mustang, and
> > then reading it back, it was always 2^56 - 1 instead. However my mustang
> > still also requires me to clear bit 31, otherwise the vcpu hangs.
> > 
> > > 
> > > Thanks for pointing that out, that makes good sense.  So then we should
> > > definitely fix the test.
> > > 
> > > We could either set it to 2^56 - 1 instead, or just keep the 10s as used
> > > in this patch, because the whole test times out after 2s anyway.
> > 
> > With the 10s version the test runs and passes on my mustang, so on one
> > hand I prefer it. OTOH, testing to the spec, by using 2^56 - 1, seems
> > more correct and allows one to find issues like the one I have on my
> > mustang, i.e. a vcpu hang when bit 31 isn't clear.
> > 
> > I guess I lean more towards testings to the spec, but not enough to
> > ask for a v2 of the patch. It's up to you.
> 
> I think we should have something that tests KVM on the platforms we have
> and that are available for people's use.  I don't think we should verify
> the architecture as much.  People use the m400 (basically
> Mustang) in CloudLab, for example, which is we I keep caring about that.
> 
> I think this test was designed to test "if I program a timer to some
> time in the future it shouldn't fire right away", which is still what
> we test with this patch.
> 
> If we want to add a "the platform provides a timer with 56 valid bits in
> the counter and compare register", then I think it should be a separate
> test, and the the user can see that "basic stuff works", "architecture
> compliance not so much" and shrug accordingly.

Two separate tests sounds good to me. Although, if the value of 'now' is
large enough that now + 10s will set bit 31, then a mustang run (at least
mine) will fail - and most likely cause a lot of confusion, since it
normally does not. We should probably attempt to address that known issue
in some way as well.

Thanks,
drew
Paolo Bonzini July 24, 2017, 5:13 p.m. UTC | #6
On 18/07/2017 14:15, Andrew Jones wrote:
>>
>> If we want to add a "the platform provides a timer with 56 valid bits in
>> the counter and compare register", then I think it should be a separate
>> test, and the the user can see that "basic stuff works", "architecture
>> compliance not so much" and shrug accordingly.
> Two separate tests sounds good to me. Although, if the value of 'now' is
> large enough that now + 10s will set bit 31, then a mustang run (at least
> mine) will fail - and most likely cause a lot of confusion, since it
> normally does not. We should probably attempt to address that known issue
> in some way as well.

Is the value relative to vm startup or host startup?

Paolo
Christoffer Dall July 24, 2017, 9:25 p.m. UTC | #7
On Mon, Jul 24, 2017 at 7:13 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 18/07/2017 14:15, Andrew Jones wrote:
>>>
>>> If we want to add a "the platform provides a timer with 56 valid bits in
>>> the counter and compare register", then I think it should be a separate
>>> test, and the the user can see that "basic stuff works", "architecture
>>> compliance not so much" and shrug accordingly.
>> Two separate tests sounds good to me. Although, if the value of 'now' is
>> large enough that now + 10s will set bit 31, then a mustang run (at least
>> mine) will fail - and most likely cause a lot of confusion, since it
>> normally does not. We should probably attempt to address that known issue
>> in some way as well.
>
> Is the value relative to vm startup or host startup?
>
For the virtual timer, KVM currently resets the counter to zero (so VM
startup) and for the physical timer it's since host startup.

I confirmed the behavior Drew reported on my mustang too, but it only
appears to apply for the vtimer when writing cval with bit 31 set, not
for the ptimer.  I was thinking that this may be a problem with KVM,
some sort of signed bit extension problem, but since the problem
doesn't show up on other platforms, it may just be a problem there.
Shrug.

Anyway, since it works with the ptimer, and we should have at least 56
bits of counter space at ~50 MHz, we should be fine here.

Thanks,
-Christoffer
Christoffer Dall July 26, 2017, 11:38 a.m. UTC | #8
On Mon, Jul 24, 2017 at 11:25:50PM +0200, Christoffer Dall wrote:
> On Mon, Jul 24, 2017 at 7:13 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > On 18/07/2017 14:15, Andrew Jones wrote:
> >>>
> >>> If we want to add a "the platform provides a timer with 56 valid bits in
> >>> the counter and compare register", then I think it should be a separate
> >>> test, and the the user can see that "basic stuff works", "architecture
> >>> compliance not so much" and shrug accordingly.
> >> Two separate tests sounds good to me. Although, if the value of 'now' is
> >> large enough that now + 10s will set bit 31, then a mustang run (at least
> >> mine) will fail - and most likely cause a lot of confusion, since it
> >> normally does not. We should probably attempt to address that known issue
> >> in some way as well.
> >
> > Is the value relative to vm startup or host startup?
> >
> For the virtual timer, KVM currently resets the counter to zero (so VM
> startup) and for the physical timer it's since host startup.
> 
> I confirmed the behavior Drew reported on my mustang too, but it only
> appears to apply for the vtimer when writing cval with bit 31 set, not
> for the ptimer.  I was thinking that this may be a problem with KVM,
> some sort of signed bit extension problem, but since the problem
> doesn't show up on other platforms, it may just be a problem there.

ok, so I did some digging here.  The problem doesn't seem to be with bit
31 specifically, but rather that the Mustang cannot handle a larger
difference between cval and cnt, than (1 << 31) - 1.

I verified this by writing to tval instead and increasing its value more
and more.

Furthermore, I also added a delay loop, that waits until the counter
reaches 0xffffffff and then program cval with something later, which
works just fine.

I also looked at the hardware registers and observe things like:

cval: 0x100000000 cnt_ctl: 0x5 cnt: 0x225a6e

Which means that the timer sets ISTATUS even though it clearly shouldn't
fire.

The explanation why Linux always (seems to) work on the platform is
likely that Linux never programs a timer further out than ~80 seconds.

Broken hardware, oh well.

Thanks,
-Christoffer
diff mbox

Patch

diff --git a/arm/timer.c b/arm/timer.c
index 02d9e0a..77d257c 100644
--- a/arm/timer.c
+++ b/arm/timer.c
@@ -70,10 +70,14 @@  static bool test_cval_10msec(void)
 
 static void test_vtimer(void)
 {
+	u64 now = read_sysreg(cntvct_el0);
+	u64 time_10s = read_sysreg(cntfrq_el0) * 10;
+	u64 later = now + time_10s;
+
 	report_prefix_push("vtimer-busy-loop");
 
-	/* Enable the timer */
-	write_sysreg(~0, cntv_cval_el0);
+	/* Enable the timer, but schedule it for much later*/
+	write_sysreg(later, cntv_cval_el0);
 	isb();
 	write_sysreg(CNTV_CTL_ENABLE, cntv_ctl_el0);