Message ID | 20170713192009.10069-3-cdall@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 13/07/17 20:20, Christoffer Dall wrote: > When running the vtimer test on an APM X-Gene, setting the timer value > to (2^64 - 1) apparently results in the timer always firing, even > thought the counter is mich lower than the cval. Note that the system counter is only guaranteed to be at least 56 bit wide (see DDI0487B.a G5.1.2), and I seem to remember that X-Gene only has the minimum. This could explain why setting the comparator to a value greater than (2^56 - 1) leads to a firing timer (the comparator appears to be in the past). Thanks, M.
On Fri, Jul 14, 2017 at 09:04:10AM +0100, Marc Zyngier wrote: > On 13/07/17 20:20, Christoffer Dall wrote: > > When running the vtimer test on an APM X-Gene, setting the timer value > > to (2^64 - 1) apparently results in the timer always firing, even > > thought the counter is mich lower than the cval. > > Note that the system counter is only guaranteed to be at least 56 bit > wide (see DDI0487B.a G5.1.2), and I seem to remember that X-Gene only > has the minimum. This could explain why setting the comparator to a > value greater than (2^56 - 1) leads to a firing timer (the comparator > appears to be in the past). Thanks for pointing that out, that makes good sense. So then we should definitely fix the test. We could either set it to 2^56 - 1 instead, or just keep the 10s as used in this patch, because the whole test times out after 2s anyway. Thanks, -Christoffer
On Fri, Jul 14, 2017 at 08:45:12AM -0700, Christoffer Dall wrote: > On Fri, Jul 14, 2017 at 09:04:10AM +0100, Marc Zyngier wrote: > > On 13/07/17 20:20, Christoffer Dall wrote: > > > When running the vtimer test on an APM X-Gene, setting the timer value > > > to (2^64 - 1) apparently results in the timer always firing, even > > > thought the counter is mich lower than the cval. > > > > Note that the system counter is only guaranteed to be at least 56 bit > > wide (see DDI0487B.a G5.1.2), and I seem to remember that X-Gene only > > has the minimum. This could explain why setting the comparator to a > > value greater than (2^56 - 1) leads to a firing timer (the comparator > > appears to be in the past). Ah, that explains why when I tried setting it to ~0 on my mustang, and then reading it back, it was always 2^56 - 1 instead. However my mustang still also requires me to clear bit 31, otherwise the vcpu hangs. > > Thanks for pointing that out, that makes good sense. So then we should > definitely fix the test. > > We could either set it to 2^56 - 1 instead, or just keep the 10s as used > in this patch, because the whole test times out after 2s anyway. With the 10s version the test runs and passes on my mustang, so on one hand I prefer it. OTOH, testing to the spec, by using 2^56 - 1, seems more correct and allows one to find issues like the one I have on my mustang, i.e. a vcpu hang when bit 31 isn't clear. I guess I lean more towards testings to the spec, but not enough to ask for a v2 of the patch. It's up to you. Thanks, drew
On Tue, Jul 18, 2017 at 12:05:03PM +0200, Andrew Jones wrote: > On Fri, Jul 14, 2017 at 08:45:12AM -0700, Christoffer Dall wrote: > > On Fri, Jul 14, 2017 at 09:04:10AM +0100, Marc Zyngier wrote: > > > On 13/07/17 20:20, Christoffer Dall wrote: > > > > When running the vtimer test on an APM X-Gene, setting the timer value > > > > to (2^64 - 1) apparently results in the timer always firing, even > > > > thought the counter is mich lower than the cval. > > > > > > Note that the system counter is only guaranteed to be at least 56 bit > > > wide (see DDI0487B.a G5.1.2), and I seem to remember that X-Gene only > > > has the minimum. This could explain why setting the comparator to a > > > value greater than (2^56 - 1) leads to a firing timer (the comparator > > > appears to be in the past). > > Ah, that explains why when I tried setting it to ~0 on my mustang, and > then reading it back, it was always 2^56 - 1 instead. However my mustang > still also requires me to clear bit 31, otherwise the vcpu hangs. > > > > > Thanks for pointing that out, that makes good sense. So then we should > > definitely fix the test. > > > > We could either set it to 2^56 - 1 instead, or just keep the 10s as used > > in this patch, because the whole test times out after 2s anyway. > > With the 10s version the test runs and passes on my mustang, so on one > hand I prefer it. OTOH, testing to the spec, by using 2^56 - 1, seems > more correct and allows one to find issues like the one I have on my > mustang, i.e. a vcpu hang when bit 31 isn't clear. > > I guess I lean more towards testings to the spec, but not enough to > ask for a v2 of the patch. It's up to you. I think we should have something that tests KVM on the platforms we have and that are available for people's use. I don't think we should verify the architecture as much. People use the m400 (basically Mustang) in CloudLab, for example, which is we I keep caring about that. I think this test was designed to test "if I program a timer to some time in the future it shouldn't fire right away", which is still what we test with this patch. If we want to add a "the platform provides a timer with 56 valid bits in the counter and compare register", then I think it should be a separate test, and the the user can see that "basic stuff works", "architecture compliance not so much" and shrug accordingly. Thanks, -Christoffer
On Tue, Jul 18, 2017 at 12:35:12PM +0200, Christoffer Dall wrote: > On Tue, Jul 18, 2017 at 12:05:03PM +0200, Andrew Jones wrote: > > On Fri, Jul 14, 2017 at 08:45:12AM -0700, Christoffer Dall wrote: > > > On Fri, Jul 14, 2017 at 09:04:10AM +0100, Marc Zyngier wrote: > > > > On 13/07/17 20:20, Christoffer Dall wrote: > > > > > When running the vtimer test on an APM X-Gene, setting the timer value > > > > > to (2^64 - 1) apparently results in the timer always firing, even > > > > > thought the counter is mich lower than the cval. > > > > > > > > Note that the system counter is only guaranteed to be at least 56 bit > > > > wide (see DDI0487B.a G5.1.2), and I seem to remember that X-Gene only > > > > has the minimum. This could explain why setting the comparator to a > > > > value greater than (2^56 - 1) leads to a firing timer (the comparator > > > > appears to be in the past). > > > > Ah, that explains why when I tried setting it to ~0 on my mustang, and > > then reading it back, it was always 2^56 - 1 instead. However my mustang > > still also requires me to clear bit 31, otherwise the vcpu hangs. > > > > > > > > Thanks for pointing that out, that makes good sense. So then we should > > > definitely fix the test. > > > > > > We could either set it to 2^56 - 1 instead, or just keep the 10s as used > > > in this patch, because the whole test times out after 2s anyway. > > > > With the 10s version the test runs and passes on my mustang, so on one > > hand I prefer it. OTOH, testing to the spec, by using 2^56 - 1, seems > > more correct and allows one to find issues like the one I have on my > > mustang, i.e. a vcpu hang when bit 31 isn't clear. > > > > I guess I lean more towards testings to the spec, but not enough to > > ask for a v2 of the patch. It's up to you. > > I think we should have something that tests KVM on the platforms we have > and that are available for people's use. I don't think we should verify > the architecture as much. People use the m400 (basically > Mustang) in CloudLab, for example, which is we I keep caring about that. > > I think this test was designed to test "if I program a timer to some > time in the future it shouldn't fire right away", which is still what > we test with this patch. > > If we want to add a "the platform provides a timer with 56 valid bits in > the counter and compare register", then I think it should be a separate > test, and the the user can see that "basic stuff works", "architecture > compliance not so much" and shrug accordingly. Two separate tests sounds good to me. Although, if the value of 'now' is large enough that now + 10s will set bit 31, then a mustang run (at least mine) will fail - and most likely cause a lot of confusion, since it normally does not. We should probably attempt to address that known issue in some way as well. Thanks, drew
On 18/07/2017 14:15, Andrew Jones wrote: >> >> If we want to add a "the platform provides a timer with 56 valid bits in >> the counter and compare register", then I think it should be a separate >> test, and the the user can see that "basic stuff works", "architecture >> compliance not so much" and shrug accordingly. > Two separate tests sounds good to me. Although, if the value of 'now' is > large enough that now + 10s will set bit 31, then a mustang run (at least > mine) will fail - and most likely cause a lot of confusion, since it > normally does not. We should probably attempt to address that known issue > in some way as well. Is the value relative to vm startup or host startup? Paolo
On Mon, Jul 24, 2017 at 7:13 PM, Paolo Bonzini <pbonzini@redhat.com> wrote: > On 18/07/2017 14:15, Andrew Jones wrote: >>> >>> If we want to add a "the platform provides a timer with 56 valid bits in >>> the counter and compare register", then I think it should be a separate >>> test, and the the user can see that "basic stuff works", "architecture >>> compliance not so much" and shrug accordingly. >> Two separate tests sounds good to me. Although, if the value of 'now' is >> large enough that now + 10s will set bit 31, then a mustang run (at least >> mine) will fail - and most likely cause a lot of confusion, since it >> normally does not. We should probably attempt to address that known issue >> in some way as well. > > Is the value relative to vm startup or host startup? > For the virtual timer, KVM currently resets the counter to zero (so VM startup) and for the physical timer it's since host startup. I confirmed the behavior Drew reported on my mustang too, but it only appears to apply for the vtimer when writing cval with bit 31 set, not for the ptimer. I was thinking that this may be a problem with KVM, some sort of signed bit extension problem, but since the problem doesn't show up on other platforms, it may just be a problem there. Shrug. Anyway, since it works with the ptimer, and we should have at least 56 bits of counter space at ~50 MHz, we should be fine here. Thanks, -Christoffer
On Mon, Jul 24, 2017 at 11:25:50PM +0200, Christoffer Dall wrote: > On Mon, Jul 24, 2017 at 7:13 PM, Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 18/07/2017 14:15, Andrew Jones wrote: > >>> > >>> If we want to add a "the platform provides a timer with 56 valid bits in > >>> the counter and compare register", then I think it should be a separate > >>> test, and the the user can see that "basic stuff works", "architecture > >>> compliance not so much" and shrug accordingly. > >> Two separate tests sounds good to me. Although, if the value of 'now' is > >> large enough that now + 10s will set bit 31, then a mustang run (at least > >> mine) will fail - and most likely cause a lot of confusion, since it > >> normally does not. We should probably attempt to address that known issue > >> in some way as well. > > > > Is the value relative to vm startup or host startup? > > > For the virtual timer, KVM currently resets the counter to zero (so VM > startup) and for the physical timer it's since host startup. > > I confirmed the behavior Drew reported on my mustang too, but it only > appears to apply for the vtimer when writing cval with bit 31 set, not > for the ptimer. I was thinking that this may be a problem with KVM, > some sort of signed bit extension problem, but since the problem > doesn't show up on other platforms, it may just be a problem there. ok, so I did some digging here. The problem doesn't seem to be with bit 31 specifically, but rather that the Mustang cannot handle a larger difference between cval and cnt, than (1 << 31) - 1. I verified this by writing to tval instead and increasing its value more and more. Furthermore, I also added a delay loop, that waits until the counter reaches 0xffffffff and then program cval with something later, which works just fine. I also looked at the hardware registers and observe things like: cval: 0x100000000 cnt_ctl: 0x5 cnt: 0x225a6e Which means that the timer sets ISTATUS even though it clearly shouldn't fire. The explanation why Linux always (seems to) work on the platform is likely that Linux never programs a timer further out than ~80 seconds. Broken hardware, oh well. Thanks, -Christoffer
diff --git a/arm/timer.c b/arm/timer.c index 02d9e0a..77d257c 100644 --- a/arm/timer.c +++ b/arm/timer.c @@ -70,10 +70,14 @@ static bool test_cval_10msec(void) static void test_vtimer(void) { + u64 now = read_sysreg(cntvct_el0); + u64 time_10s = read_sysreg(cntfrq_el0) * 10; + u64 later = now + time_10s; + report_prefix_push("vtimer-busy-loop"); - /* Enable the timer */ - write_sysreg(~0, cntv_cval_el0); + /* Enable the timer, but schedule it for much later*/ + write_sysreg(later, cntv_cval_el0); isb(); write_sysreg(CNTV_CTL_ENABLE, cntv_ctl_el0);
When running the vtimer test on an APM X-Gene, setting the timer value to (2^64 - 1) apparently results in the timer always firing, even thought the counter is mich lower than the cval. Since the idea of the code is to set everything up and schedule the timer for some time very far in the future, take a pragmatic approach and just add 10s worth of delay instead. Signed-off-by: Christoffer Dall <cdall@linaro.org> --- arm/timer.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)