diff mbox

tsc: use kvmclock for calibration

Message ID 5024C1F3.80103@redhat.com (mailing list archive)
State New, archived

Commit Message

Gerd Hoffmann Aug. 10, 2012, 8:10 a.m. UTC
Hi,

>>>   (1) Use this patch (with alignment issue fixed of course).
>>>   (2) Do a full kvmclock implementation.  Feels a bit like overkill.
>>>   (3) SeaBIOS can fallback to the PIT for timing on machines which
>>>       have no TSC.  We could do that too in case we detect kvm ...
>>
>> What sort of timeouts are these?  If seconds, maybe the rtc would be best.
> 
> I vote for 3 so nobody has to maintain kvmclock code in SeaBIOS and Gerd
> can fix the in-kernel PIT issues with GRUB (see Michael's message) while testing.

(2) turned out to be not too bad when taking a shortcut: Go through an
enable/disable cycle each time we read the clock, then just grab
system_time.  Not that efficient, but should be ok for seabios.  Usually
it checks the clock when sitting around idle, waiting for something to
happen.  And it simplifies the implementation a lot as we can just skip
all the tsc frequency & delta calculations.

Draft patch attached.  Comments?

cheers,
  Gerd
From e42d62e90ae4b8a00413a0665d4022069154a516 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@redhat.com>
Date: Thu, 9 Aug 2012 13:26:18 +0200
Subject: [PATCH] kvmclock clocksource

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 Makefile       |    4 +-
 src/clock.c    |   13 +++++++++++
 src/paravirt.c |   65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/paravirt.h |    3 ++
 4 files changed, 83 insertions(+), 2 deletions(-)

Comments

Marcelo Tosatti Aug. 10, 2012, 9:26 p.m. UTC | #1
On Fri, Aug 10, 2012 at 10:10:27AM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> >>>   (1) Use this patch (with alignment issue fixed of course).
> >>>   (2) Do a full kvmclock implementation.  Feels a bit like overkill.
> >>>   (3) SeaBIOS can fallback to the PIT for timing on machines which
> >>>       have no TSC.  We could do that too in case we detect kvm ...
> >>
> >> What sort of timeouts are these?  If seconds, maybe the rtc would be best.
> > 
> > I vote for 3 so nobody has to maintain kvmclock code in SeaBIOS and Gerd
> > can fix the in-kernel PIT issues with GRUB (see Michael's message) while testing.
> 
> (2) turned out to be not too bad when taking a shortcut: Go through an
> enable/disable cycle each time we read the clock, then just grab
> system_time.  Not that efficient, but should be ok for seabios.  Usually
> it checks the clock when sitting around idle, waiting for something to
> happen.  And it simplifies the implementation a lot as we can just skip
> all the tsc frequency & delta calculations.
> 
> Draft patch attached.  Comments?

Given the history of problems with kvmclock, would rather see it not
being used for delays, if possible. Your shortcut gets rid of a class of
problems, but there might be others (...).

Isn't pmtimer ioport usable? 14MHz.

Error handling in kvmclock_init is awkward.
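
Not sure which part of the error handling is meant, but one way to make the detection step easier to follow would be to split the MSR selection into a small helper that returns 0 when kvmclock is not offered. A rough sketch, assuming the KVM_FEATURE_* values are bit numbers in CPUID 0x40000001 EAX and so need to be turned into masks before testing. The helper name kvmclock_pick_msr is made up for illustration and is not part of the patch.

```c
#include <stdint.h>

/* Values as defined in the draft patch; the feature values are bit numbers. */
#define KVM_FEATURE_CLOCKSOURCE            0
#define KVM_FEATURE_CLOCKSOURCE2           3
#define MSR_KVM_SYSTEM_TIME             0x12
#define MSR_KVM_SYSTEM_TIME_NEW   0x4b564d01

/*
 * Hypothetical helper: map the feature word from CPUID 0x40000001 (EAX)
 * to the system-time MSR to use, or 0 if kvmclock is not advertised.
 */
static inline uint32_t kvmclock_pick_msr(uint32_t features)
{
    /* Prefer the newer MSR when both interfaces are advertised. */
    if (features & (1u << KVM_FEATURE_CLOCKSOURCE2))
        return MSR_KVM_SYSTEM_TIME_NEW;
    if (features & (1u << KVM_FEATURE_CLOCKSOURCE))
        return MSR_KVM_SYSTEM_TIME;
    return 0;
}
```

kvmclock_init could then bail out early on a zero return and keep the fetch/validate step separate from feature detection.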

Thanks

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity Aug. 12, 2012, 9 a.m. UTC | #2
On 08/10/2012 11:10 AM, Gerd Hoffmann wrote:
>   Hi,
> 
> >>>   (1) Use this patch (with alignment issue fixed of course).
> >>>   (2) Do a full kvmclock implementation.  Feels a bit like overkill.
> >>>   (3) SeaBIOS can fallback to the PIT for timing on machines which
> >>>       have no TSC.  We could do that too in case we detect kvm ...
> >>
> >> What sort of timeouts are these?  If seconds, maybe the rtc would be best.
> > 
> > I vote for 3 so nobody has to maintain kvmclock code in SeaBIOS and Gerd
> > can fix the in-kernel PIT issues with GRUB (see Michael's message) while testing.
> 
> (2) turned out to be not too bad when taking a shortcut: Go through an
> enable/disable cycle each time we read the clock, then just grab
> system_time.  Not that efficient, but should be ok for seabios.  Usually
> it checks the clock when sitting around idle, waiting for something to
> happen.  And it simplifies the implementation a lot as we can just skip
> all the tsc frequency & delta calculations.
> 
> Draft patch attached.  Comments?
> 
> +
> +static void kvmclock_fetch(struct pvclock_vcpu_time_info *time)
> +{
> +    u32 addr = (u32)MAKE_FLATPTR(GET_SEG(SS), time);
> +    u32 msr = GET_GLOBAL(kvm_systime_msr);
> +
> +    memset(time, 0, sizeof(*time));
> +    wrmsr(msr, addr | 1);

I'd put the time calculations in here.  We don't specify what happens to
the data area after disabling kvmclock; it could be in the middle of an
update.

> +    wrmsr(msr, 0);
> +}
> +
> +u64 kvmclock_get(void)
> +{
> +    struct pvclock_vcpu_time_info time;
> +
> +    kvmclock_fetch(&time);
> +    return time.system_time;

That's just a random number.  You have to do the full calculation.
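
For reference, a rough sketch of what the full calculation usually involves, going by the pvclock ABI as I understand it: retry while the version field is odd or changes underneath the reader, then add the scaled TSC delta to system_time. The helper names (pvclock_scale_delta, pvclock_read) and the tsc_now parameter are made up for illustration. Real code would read the TSC inside the retry loop, and nothing here is taken from the draft patch beyond the struct layout. The packed attribute and the barrier are GCC/Clang specific.

```c
#include <stdint.h>

/* Same layout as struct pvclock_vcpu_time_info in the draft patch. */
struct pvclock_vcpu_time_info {
    uint32_t version;
    uint32_t pad0;
    uint64_t tsc_timestamp;
    uint64_t system_time;
    uint32_t tsc_to_system_mul;
    int8_t   tsc_shift;
    uint8_t  flags;
    uint8_t  pad[2];
} __attribute__((packed));

/* Compiler barrier so the field reads stay between the two version reads. */
#define pv_barrier() __asm__ __volatile__("" ::: "memory")

/* Scale a TSC delta to nanoseconds: shift first, then multiply by a
 * 32.32 fixed-point factor, i.e. (delta * mul) >> 32. */
static inline uint64_t pvclock_scale_delta(uint64_t delta, uint32_t mul,
                                           int8_t shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    /* Split delta so the product never needs more than 64 bits per term. */
    uint64_t hi = (delta >> 32) * mul;
    uint64_t lo = ((delta & 0xffffffffULL) * mul) >> 32;
    return hi + lo;
}

/* Read a consistent snapshot and convert it to nanoseconds.
 * tsc_now stands in for an rdtsc taken by the caller (an assumption
 * made here to keep the sketch free of inline assembly). */
static inline uint64_t pvclock_read(
    const volatile struct pvclock_vcpu_time_info *ti, uint64_t tsc_now)
{
    uint32_t version;
    uint64_t ns;

    do {
        version = ti->version;
        pv_barrier();
        ns = ti->system_time +
             pvclock_scale_delta(tsc_now - ti->tsc_timestamp,
                                 ti->tsc_to_system_mul, ti->tsc_shift);
        pv_barrier();
        /* Odd version: update in progress. Changed version: torn read. */
    } while ((version & 1) || version != ti->version);

    return ns;
}
```

With tsc_shift = 1 and tsc_to_system_mul = 0x80000000 (0.5 in 32.32 fixed point), a TSC delta of 100 scales to 100 ns, which is then added on top of system_time.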
Gerd Hoffmann Aug. 13, 2012, 10:37 a.m. UTC | #3
Hi,

> Isn't pmtimer ioport usable? 14MHz.

Can give it a try.  14 MHz looks wrong though, acpi.h says:

/* PM Timer ticks per second (HZ) */
#define PM_TIMER_FREQUENCY  3579545

Is this fixed?  Or hardware specific?

cheers,
  Gerd

Gleb Natapov Aug. 13, 2012, 10:46 a.m. UTC | #4
On Mon, Aug 13, 2012 at 12:37:11PM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> > Isn't pmtimer ioport usable? 14MHz.
> 
> Can give it a try.  14 MHz looks wrong though, acpi.h says:
> 
> /* PM Timer ticks per second (HZ) */
> #define PM_TIMER_FREQUENCY  3579545
> 
> Is this fixed?  Or hardware specific?
> 
3.579545 MHz clock required by ACPI spec.
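
With a fixed rate the tick/time conversions become simple integer arithmetic. 3.579545 MHz is a quarter of the 14.31818 MHz reference clock, which may be where the 14MHz figure came from. A minimal sketch, assuming the 24-bit counter width (the FADT can also advertise a 32-bit counter). The helper names and PM_TIMER_MASK are made up for illustration and do not come from SeaBIOS.

```c
#include <stdint.h>

/* PM Timer ticks per second (HZ), fixed by the ACPI spec. */
#define PM_TIMER_FREQUENCY  3579545u
/* Assumption: 24-bit counter; use 0xffffffff for the 32-bit variant. */
#define PM_TIMER_MASK       0x00ffffffu

/* Ticks elapsed between two raw counter reads, tolerating one wraparound. */
static inline uint32_t pmtimer_delta(uint32_t start, uint32_t end)
{
    return (end - start) & PM_TIMER_MASK;
}

/* Convert a tick count to microseconds (rounds down). */
static inline uint64_t pmtimer_ticks_to_usec(uint32_t ticks)
{
    return ((uint64_t)ticks * 1000000u) / PM_TIMER_FREQUENCY;
}

/* Convert microseconds to ticks, rounding up so delays are never short. */
static inline uint32_t pmtimer_usec_to_ticks(uint32_t usec)
{
    return (uint32_t)(((uint64_t)usec * PM_TIMER_FREQUENCY + 999999u)
                      / 1000000u);
}
```

A 24-bit counter at this rate wraps roughly every 4.7 seconds, so a polling loop has to sample it more often than that for the wraparound handling to stay valid.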

--
			Gleb.
Fred . Aug. 13, 2012, 12:55 p.m. UTC | #5
Add a comment about it in the source code.

-#define PM_TIMER_FREQUENCY  3579545
+#define PM_TIMER_FREQUENCY  3579545 // 3.579545 MHz clock required by ACPI spec.

On Mon, Aug 13, 2012 at 12:46 PM, Gleb Natapov <gleb@redhat.com> wrote:
> On Mon, Aug 13, 2012 at 12:37:11PM +0200, Gerd Hoffmann wrote:
>>   Hi,
>>
>> > Isn't pmtimer ioport usable? 14MHz.
>>
>> Can give it a try.  14 MHz looks wrong though, acpi.h says:
>>
>> /* PM Timer ticks per second (HZ) */
>> #define PM_TIMER_FREQUENCY  3579545
>>
>> Is this fixed?  Or hardware specific?
>>
> 3.579545 MHz clock required by ACPI spec.
>
> --
>                         Gleb.
>
> _______________________________________________
> SeaBIOS mailing list
> SeaBIOS@seabios.org
> http://www.seabios.org/mailman/listinfo/seabios

Patch

diff --git a/Makefile b/Makefile
index 72ee152..b692a96 100644
--- a/Makefile
+++ b/Makefile
@@ -13,11 +13,11 @@  SRCBOTH=misc.c stacks.c pmm.c output.c util.c block.c floppy.c ata.c mouse.c \
     pnpbios.c pirtable.c vgahooks.c ramdisk.c pcibios.c blockcmd.c \
     usb.c usb-uhci.c usb-ohci.c usb-ehci.c usb-hid.c usb-msc.c \
     virtio-ring.c virtio-pci.c virtio-blk.c virtio-scsi.c apm.c ahci.c \
-    usb-uas.c lsi-scsi.c esp-scsi.c
+    usb-uas.c lsi-scsi.c esp-scsi.c paravirt.c
 SRC16=$(SRCBOTH) system.c disk.c font.c
 SRC32FLAT=$(SRCBOTH) post.c shadow.c memmap.c coreboot.c boot.c \
     acpi.c smm.c mptable.c smbios.c pciinit.c optionroms.c mtrr.c \
-    lzmadecode.c bootsplash.c jpeg.c usb-hub.c paravirt.c \
+    lzmadecode.c bootsplash.c jpeg.c usb-hub.c \
     biostables.c xen.c bmp.c romfile.c
 SRC32SEG=util.c output.c pci.c pcibios.c apm.c stacks.c
 
diff --git a/src/clock.c b/src/clock.c
index 69e9f17..15921fa 100644
--- a/src/clock.c
+++ b/src/clock.c
@@ -13,6 +13,7 @@ 
 #include "bregs.h" // struct bregs
 #include "biosvar.h" // GET_GLOBAL
 #include "usb-hid.h" // usb_check_event
+#include "paravirt.h" // kvm clock
 
 // RTC register flags
 #define RTC_A_UIP 0x80
@@ -64,6 +65,7 @@ 
 
 u32 cpu_khz VAR16VISIBLE;
 u8 no_tsc VAR16VISIBLE;
+u8 use_kvmclock VAR16VISIBLE;
 
 static void
 calibrate_tsc(void)
@@ -80,6 +82,15 @@  calibrate_tsc(void)
         return;
     }
 
+    if (kvm_para_available()) {
+        u32 hz = kvmclock_init();
+        if (hz != 0) {
+            SET_GLOBAL(use_kvmclock, 1);
+            SET_GLOBAL(cpu_khz, hz / 1000);
+            return;
+        }
+    }
+
     // Setup "timer2"
     u8 orig = inb(PORT_PS2_CTRLB);
     outb((orig & ~PPCB_SPKR) | PPCB_T2GATE, PORT_PS2_CTRLB);
@@ -134,6 +145,8 @@  get_tsc(void)
 {
     if (unlikely(GET_GLOBAL(no_tsc)))
         return emulate_tsc();
+    if (unlikely(GET_GLOBAL(use_kvmclock)))
+        return kvmclock_get();
     return rdtscll();
 }
 
diff --git a/src/paravirt.c b/src/paravirt.c
index 2a98d53..07aa926 100644
--- a/src/paravirt.c
+++ b/src/paravirt.c
@@ -12,6 +12,7 @@ 
 #include "ioport.h" // outw
 #include "paravirt.h" // qemu_cfg_port_probe
 #include "smbios.h" // struct smbios_structure_header
+#include "biosvar.h" // GET_GLOBAL
 
 int qemu_cfg_present;
 
@@ -346,3 +347,67 @@  void qemu_cfg_romfile_setup(void)
         dprintf(3, "Found fw_cfg file: %s (size=%d)\n", file->name, file->size);
     }
 }
+
+#define KVM_CPUID_SIGNATURE       0x40000000
+#define KVM_CPUID_FEATURES        0x40000001
+#define KVM_FEATURE_CLOCKSOURCE            0
+#define KVM_FEATURE_CLOCKSOURCE2           3
+#define MSR_KVM_SYSTEM_TIME             0x12
+#define MSR_KVM_SYSTEM_TIME_NEW   0x4b564d01
+
+struct pvclock_vcpu_time_info {
+	u32   version;
+	u32   pad0;
+	u64   tsc_timestamp;
+	u64   system_time;
+	u32   tsc_to_system_mul;
+	s8    tsc_shift;
+	u8    flags;
+	u8    pad[2];
+} PACKED;
+
+/* kvmclock system time counts in nanoseconds */
+#define KVM_SYSTIME_HZ   1000000000
+
+u32 kvm_systime_msr VAR16VISIBLE;
+
+static void kvmclock_fetch(struct pvclock_vcpu_time_info *time)
+{
+    u32 addr = (u32)MAKE_FLATPTR(GET_SEG(SS), time);
+    u32 msr = GET_GLOBAL(kvm_systime_msr);
+
+    memset(time, 0, sizeof(*time));
+    wrmsr(msr, addr | 1);
+    wrmsr(msr, 0);
+}
+
+u64 kvmclock_get(void)
+{
+    struct pvclock_vcpu_time_info time;
+
+    kvmclock_fetch(&time);
+    return time.system_time;
+}
+
+u32 kvmclock_init(void)
+{
+    u32 eax, ebx, ecx, edx;
+    struct pvclock_vcpu_time_info time;
+
+    cpuid(KVM_CPUID_FEATURES, &eax, &ebx, &ecx, &edx);
+    if (eax & (1 << KVM_FEATURE_CLOCKSOURCE2)) {
+        SET_GLOBAL(kvm_systime_msr, MSR_KVM_SYSTEM_TIME_NEW);
+    } else if (eax & (1 << KVM_FEATURE_CLOCKSOURCE)) {
+        SET_GLOBAL(kvm_systime_msr, MSR_KVM_SYSTEM_TIME);
+    } else {
+        return 0;
+    }
+
+    kvmclock_fetch(&time);
+    if (time.version < 2 || time.tsc_to_system_mul == 0)
+        return 0;
+
+    dprintf(1, "Using kvmclock, msr 0x%x\n",
+            GET_GLOBAL(kvm_systime_msr));
+    return KVM_SYSTIME_HZ;
+}
diff --git a/src/paravirt.h b/src/paravirt.h
index a284c41..64ed3d8 100644
--- a/src/paravirt.h
+++ b/src/paravirt.h
@@ -28,6 +28,9 @@  static inline int kvm_para_available(void)
     return 0;
 }
 
+extern u64 kvmclock_get(void);
+extern u32 kvmclock_init(void);
+
 #define QEMU_CFG_SIGNATURE              0x00
 #define QEMU_CFG_ID                     0x01
 #define QEMU_CFG_UUID                   0x02