
[v7,0/3] hw/acpi: Add vmclock device

Message ID 20250116140315.2455143-1-dwmw2@infradead.org
Series hw/acpi: Add vmclock device

Message

David Woodhouse Jan. 16, 2025, 1:59 p.m. UTC
(Posting one last time with the header commits split out).

The vmclock device addresses the problem of live migration with
precision clocks. The tolerances of a hardware counter (e.g. TSC) are
typically around ±50PPM. A guest will use NTP/PTP/PPS to discipline that
counter against an external source of 'real' time, and track the precise
frequency of the counter as it changes with environmental conditions.

When a guest is live migrated, anything it knows about the frequency of
the underlying counter becomes invalid. It may move from a host where
the counter runs at -50PPM of its nominal frequency to a host where
it runs at +50PPM. There will also be a step change in the value of the
counter, as the correctness of its absolute value at migration is
limited by the accuracy of the source and destination hosts' time
synchronization.
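
To put the ±50PPM figures above in perspective, here is a small worked calculation in C. It is a sketch with illustrative numbers, not code from this series. It assumes the guest keeps applying the frequency correction it learned on a -50PPM source host after landing on a +50PPM destination, and it ignores the separate step change in the counter value. The helper names `residual_ppm()` and `drift_ns()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Frequency error (PPM) the guest does not know about after migration:
 * the difference between the destination and source counter offsets.
 * Illustrative helper only; not part of the vmclock series.
 */
int64_t residual_ppm(int64_t src_ppm, int64_t dst_ppm)
{
    return dst_ppm - src_ppm;
}

/*
 * Time error accumulated over elapsed_s seconds at a given PPM error.
 * 1 PPM is 1 microsecond (1000 ns) of error per second.
 */
int64_t drift_ns(int64_t ppm, int64_t elapsed_s)
{
    assert(elapsed_s >= 0);
    return ppm * elapsed_s * 1000;
}
```

With these numbers, `residual_ppm(-50, 50)` is 100, so `drift_ns(100, 1)` is 100,000 ns (100 µs per second) and `drift_ns(100, 86400)` is 8,640,000,000 ns, roughly 8.64 s per day, until the guest notices and re-disciplines its clock.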

The device exposes a shared memory region to guests, which can be mapped
all the way to userspace. In the first phase, this merely advertises a
'disruption_marker', which indicates that the guest should throw away any
NTP synchronization it thinks it has, and start again.
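
A guest-side consumer of the disruption_marker might look roughly like the following. This is a hedged sketch: `struct demo_vmclock_page` is a hypothetical stand-in that models only the marker, not the real layout in vmclock-abi.h, and the `demo_*` functions and sync state are invented for illustration. The idea is to remember the marker value when synchronization is established and to discard that synchronization as soon as the marker changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared page; NOT the vmclock-abi.h layout. */
struct demo_vmclock_page {
    volatile uint64_t disruption_marker; /* changed by the host on disruption */
};

/* Hypothetical NTP-style state kept by the guest. */
struct demo_sync_state {
    bool synchronized;
    uint64_t marker_at_sync;     /* marker value when sync was established */
    double freq_correction_ppm;  /* learned frequency offset of the counter */
};

/* Record a successful synchronization together with the current marker. */
void demo_sync_acquired(struct demo_sync_state *s,
                        const struct demo_vmclock_page *p, double ppm)
{
    assert(s && p);
    s->marker_at_sync = p->disruption_marker;
    s->freq_correction_ppm = ppm;
    s->synchronized = true;
}

/*
 * If the marker has moved since sync was established, throw away what we
 * think we know and start again. Returns true if state was discarded.
 */
bool demo_check_disruption(struct demo_sync_state *s,
                           const struct demo_vmclock_page *p)
{
    assert(s && p);
    if (s->synchronized && p->disruption_marker != s->marker_at_sync) {
        s->synchronized = false;
        s->freq_correction_ppm = 0.0;
        return true;
    }
    return false;
}
```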

Because the region can be exposed all the way to userspace, applications
can still use time from a fast vDSO 'system call', and check the
disruption marker to be sure that their timestamp is indeed truthful.
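
The userspace pattern described above could be sketched as follows, assuming the shared region has already been mapped into the process (how it gets mapped is outside this sketch). It uses a hypothetical single-field `struct demo_vmclock_page`, takes the timestamp with the ordinary `clock_gettime()` call (served by the vDSO on Linux), and accepts it only if the marker is unchanged across the call and still matches the value seen when the application last knew its clock was good. `demo_trusted_realtime()` is a made-up name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the mapped page; NOT the vmclock-abi.h layout. */
struct demo_vmclock_page {
    volatile uint64_t disruption_marker;
};

/*
 * Read CLOCK_REALTIME and report whether the result can be trusted.
 * A production version would use proper acquire/release ordering
 * (or a sequence counter) rather than plain volatile loads.
 */
bool demo_trusted_realtime(const struct demo_vmclock_page *page,
                           uint64_t known_good_marker, struct timespec *ts)
{
    assert(page && ts);
    uint64_t before = page->disruption_marker;
    if (clock_gettime(CLOCK_REALTIME, ts) != 0)
        return false;
    uint64_t after = page->disruption_marker;
    /* Reject if a disruption happened during or before the read. */
    return before == after && after == known_good_marker;
}
```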

The structure also allows for the precise time, as known by the host, to
be exposed directly to guests so that they don't have to wait for NTP to
resync from scratch.
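
Exposing the host's precise time generally means publishing a snapshot that pairs a counter reading with the corresponding time, plus the counter's period, so the guest can extrapolate from its own counter reads. The sketch below shows that arithmetic with a hypothetical `struct demo_time_snapshot` that uses a 32.32 fixed-point period in nanoseconds; the real vmclock-abi.h fields, units and scaling differ, so treat this only as an illustration of the technique. It relies on the GCC/Clang `unsigned __int128` extension for the wide multiply.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot published by the host; NOT the vmclock-abi.h layout. */
struct demo_time_snapshot {
    uint64_t counter_value; /* counter reading when the snapshot was taken */
    uint64_t time_ns;       /* host time (ns) corresponding to counter_value */
    uint64_t period_ns_q32; /* counter period in ns, 32.32 fixed point */
};

/* Convert a fresh counter reading into time using the host's snapshot. */
uint64_t demo_counter_to_ns(const struct demo_time_snapshot *s,
                            uint64_t counter_now)
{
    assert(s);
    /* Unsigned subtraction stays correct across 64-bit counter wraparound. */
    uint64_t delta = counter_now - s->counter_value;
    /* Widen before multiplying so large deltas do not overflow. */
    unsigned __int128 scaled = (unsigned __int128)delta * s->period_ns_q32;
    return s->time_ns + (uint64_t)(scaled >> 32);
}
```

For a 1 GHz counter the period is exactly 1 ns (`1ULL << 32`), so 500 ticks after the snapshot the result is `time_ns + 500`; for a 2 GHz counter the period is 0.5 ns (`1ULL << 31`).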

The values and fields are based on the nascent virtio-rtc specification,
and the intent is that a version (hopefully precisely this version) of
this structure will be included as an optional part of that spec. In the
meantime, a simple ACPI device along the lines of VMGENID is perfectly
sufficient and is compatible with what's being shipped in certain
commercial hypervisors.

Linux guest support was merged into the 6.13-rc1 kernel:
https://git.kernel.org/torvalds/c/205032724226

---
v7:
 • Split update-linux-headers.sh and the addition of the new header
   file into separate commits, add MAINTAINERS entry.

v6:
 • Rebase for DEFINE_PROP_END_OF_LIST removal and sysemu→system
   rename.

v5:
 • Trivial simplification to AML generation.
 • Import vmclock-abi.h from Linux now that the guest support is merged.

v4:
 • Trivial checkpatch fixes and comment improvements.

v3:
 • Add comment that vmclock-abi.h will come from the Linux kernel
   headers once it gets merged there.

v2:
 • Change esterror/maxerror fields to nanoseconds.
 • Change to officially assigned AMZNC10C ACPI HID.
 • Fix little-endian handling of fields in update.

David Woodhouse (3):
      linux-headers: Add vmclock-abi.h
      linux-headers: Update to Linux 6.13-rc7
      hw/acpi: Add vmclock device

 MAINTAINERS                                  |   5 +
 hw/acpi/Kconfig                              |   5 +
 hw/acpi/meson.build                          |   1 +
 hw/acpi/vmclock.c                            | 179 ++++++++++++++++++++++++++
 hw/i386/Kconfig                              |   1 +
 hw/i386/acpi-build.c                         |  10 +-
 include/hw/acpi/vmclock.h                    |  34 +++++
 include/standard-headers/linux/vmclock-abi.h | 182 +++++++++++++++++++++++++++
 linux-headers/linux/iommufd.h                |  31 +++--
 linux-headers/linux/stddef.h                 |  13 +-
 scripts/update-linux-headers.sh              |   1 +
 11 files changed, 447 insertions(+), 15 deletions(-)

Comments

Michael S. Tsirkin Jan. 16, 2025, 2:44 p.m. UTC | #1
On Thu, Jan 16, 2025 at 01:59:40PM +0000, David Woodhouse wrote:

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

feel free to merge.

David Woodhouse Jan. 16, 2025, 2:54 p.m. UTC | #2
On Thu, 2025-01-16 at 09:44 -0500, Michael S. Tsirkin wrote:
> 
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> 
> feel free to merge.

Thanks. I've added your R-b to all three (replacing your previous
Acked-by), and will post the PR tomorrow to give others a chance to
comment on the header bits.