Message ID | 20210716212629.2232756-1-oupton@google.com (mailing list archive) |
---|---|
Series | KVM: Add idempotent controls for migrating system counter state |
On Fri, Jul 16, 2021 at 2:26 PM Oliver Upton <oupton@google.com> wrote:
>
> KVM's current means of saving/restoring system counters is plagued with
> temporal issues. At least on ARM64 and x86, we migrate the guest's
> system counter by-value through the respective guest system register
> values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
> brittle as the state is not idempotent: the host system counter is still
> oscillating between the attempted save and restore. Furthermore, VMMs
> may wish to transparently live migrate guest VMs, meaning that they
> include the elapsed time due to live migration blackout in the guest
> system counter view. The VMM thread could be preempted for any number of
> reasons (scheduler, L0 hypervisor under nested) between the time that
> it calculates the desired guest counter value and when KVM actually sets
> this counter state.
>
> Despite the value-based interface that we present to userspace, KVM
> actually has idempotent guest controls by way of system counter offsets.
> We can avoid all of the issues associated with a value-based interface
> by abstracting these offset controls in new ioctls. This series
> introduces new vCPU device attributes to provide userspace access to the
> vCPU's system counter offset.
>
> Patch 1 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK
> ioctls to provide userspace with a (host_tsc, realtime) instant. This is
> essential for a VMM to perform precise migration of the guest's system
> counters.
>
> Patches 2-3 add support for x86 by shoehorning the new controls into the
> pre-existing synchronization heuristics.
>
> Patches 4-5 implement a test for the new additions to
> KVM_{GET,SET}_CLOCK.
>
> Patches 6-7 implement a test for the tsc offset attribute introduced in
> patch 3.
>
> Patch 8 adds a device attribute for the arm64 virtual counter-timer
> offset.
>
> Patch 9 extends the test from patch 7 to cover the arm64 virtual
> counter-timer offset.
>
> Patch 10 adds a device attribute for the arm64 physical counter-timer
> offset. Currently, this is implemented as a synthetic register, forcing
> the guest to trap to the host and emulating the offset in the fast exit
> path. Later down the line we will have hardware with FEAT_ECV, which
> allows the hypervisor to perform physical counter-timer offsetting in
> hardware (CNTPOFF_EL2).
>
> Patch 11 extends the test from patch 7 to cover the arm64 physical
> counter-timer offset.
>
> Patch 12 introduces a benchmark to measure the overhead of emulation in
> patch 10.
>
> Physical counter benchmark
> --------------------------
>
> The following data was collected by running 10000 iterations of the
> benchmark test from Patch 12 on an Ampere Mt. Jade reference server, a 2S
> machine with 2 80-core Ampere Altra SoCs. Measurements were collected
> for both VHE and nVHE operation using the `kvm-arm.mode=` command-line
> parameter.
>
> nVHE
> ----
>
> +--------------------+--------+---------+
> | Metric             | Native | Trapped |
> +--------------------+--------+---------+
> | Average            | 54ns   | 148ns   |
> | Standard Deviation | 124ns  | 122ns   |
> | 95th Percentile    | 258ns  | 348ns   |
> +--------------------+--------+---------+
>
> VHE
> ---
>
> +--------------------+--------+---------+
> | Metric             | Native | Trapped |
> +--------------------+--------+---------+
> | Average            | 53ns   | 152ns   |
> | Standard Deviation | 92ns   | 94ns    |
> | 95th Percentile    | 204ns  | 307ns   |
> +--------------------+--------+---------+
>
> This series applies cleanly to the following commit:
>
> 1889228d80fe ("KVM: selftests: smm_test: Test SMM enter from L2")

v1: https://lore.kernel.org/kvm/20210608214742.1897483-1-oupton@google.com/

> v1 -> v2:
>   - Reimplemented as vCPU device attributes instead of a distinct ioctl.
>   - Added the (realtime, host_tsc) instant support to
>     KVM_{GET,SET}_CLOCK
>   - Changed the arm64 implementation to broadcast counter offset values
>     to all vCPUs in a guest. This upholds the architectural expectations
>     of a consistent counter-timer across CPUs.
>   - Fixed a bug with traps in VHE mode. We now configure traps on every
>     transition into a guest to handle differing VMs (trapped, emulated).
>
> Oliver Upton (12):
>   KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
>   KVM: x86: Refactor tsc synchronization code
>   KVM: x86: Expose TSC offset controls to userspace
>   tools: arch: x86: pull in pvclock headers
>   selftests: KVM: Add test for KVM_{GET,SET}_CLOCK
>   selftests: KVM: Add helpers for vCPU device attributes
>   selftests: KVM: Introduce system counter offset test
>   KVM: arm64: Allow userspace to configure a vCPU's virtual offset
>   selftests: KVM: Add support for aarch64 to system_counter_offset_test
>   KVM: arm64: Provide userspace access to the physical counter offset
>   selftests: KVM: Test physical counter offsetting
>   selftests: KVM: Add counter emulation benchmark
>
>  Documentation/virt/kvm/api.rst                |  42 +-
>  Documentation/virt/kvm/locking.rst            |  11 +
>  arch/arm64/include/asm/kvm_host.h             |   1 +
>  arch/arm64/include/asm/kvm_hyp.h              |   2 -
>  arch/arm64/include/asm/sysreg.h               |   1 +
>  arch/arm64/include/uapi/asm/kvm.h             |   2 +
>  arch/arm64/kvm/arch_timer.c                   | 118 ++++-
>  arch/arm64/kvm/arm.c                          |   4 +-
>  arch/arm64/kvm/hyp/include/hyp/switch.h       |  23 +
>  arch/arm64/kvm/hyp/include/hyp/timer-sr.h     |  26 ++
>  arch/arm64/kvm/hyp/nvhe/switch.c              |   2 -
>  arch/arm64/kvm/hyp/nvhe/timer-sr.c            |  21 +-
>  arch/arm64/kvm/hyp/vhe/timer-sr.c             |  27 ++
>  arch/x86/include/asm/kvm_host.h               |   4 +
>  arch/x86/include/uapi/asm/kvm.h               |   4 +
>  arch/x86/kvm/x86.c                            | 421 ++++++++++++++----
>  include/kvm/arm_arch_timer.h                  |   2 -
>  include/uapi/linux/kvm.h                      |   7 +-
>  tools/arch/x86/include/asm/pvclock-abi.h      |  48 ++
>  tools/arch/x86/include/asm/pvclock.h          | 103 +++++
>  tools/testing/selftests/kvm/.gitignore        |   3 +
>  tools/testing/selftests/kvm/Makefile          |   4 +
>  .../kvm/aarch64/counter_emulation_benchmark.c | 215 +++++++++
>  .../selftests/kvm/include/aarch64/processor.h |  24 +
>  .../testing/selftests/kvm/include/kvm_util.h  |  11 +
>  tools/testing/selftests/kvm/lib/kvm_util.c    |  38 ++
>  .../kvm/system_counter_offset_test.c          | 206 +++++++++
>  .../selftests/kvm/x86_64/kvm_clock_test.c     | 210 +++++++++
>  28 files changed, 1447 insertions(+), 133 deletions(-)
>  create mode 100644 arch/arm64/kvm/hyp/include/hyp/timer-sr.h
>  create mode 100644 tools/arch/x86/include/asm/pvclock-abi.h
>  create mode 100644 tools/arch/x86/include/asm/pvclock.h
>  create mode 100644 tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c
>  create mode 100644 tools/testing/selftests/kvm/system_counter_offset_test.c
>  create mode 100644 tools/testing/selftests/kvm/x86_64/kvm_clock_test.c
>
> --
> 2.32.0.402.g57bb445576-goog
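
The offset-based restore described in the cover letter comes down to a few lines of arithmetic. The following is only a rough sketch, not code from the series: `struct counter_instant`, `guest_counter()`, `dest_offset()` and the `freq_hz` parameter are hypothetical names chosen for illustration. It assumes that the guest counter equals the host counter plus an offset, that the VMM holds one (host_tsc, realtime) instant from each host, that the two hosts' realtime clocks are synchronized, and that the guest counter runs at the same frequency (below roughly 18 GHz, so the intermediate product fits in 64 bits) on both sides.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical snapshot of a host's counter state; not a KVM structure. */
struct counter_instant {
	uint64_t host_tsc;    /* host counter reading at the instant */
	uint64_t realtime_ns; /* wall-clock time of the same instant, in ns */
};

/* Guest view of the counter: host counter plus offset, modulo 2^64. */
static uint64_t guest_counter(uint64_t host_tsc, uint64_t offset)
{
	return host_tsc + offset;
}

/*
 * Offset to program on the destination so that the guest counter
 * continues from its source value, advanced by the wall-clock time
 * that elapsed between the two instants (the migration blackout).
 */
static uint64_t dest_offset(struct counter_instant src, uint64_t src_offset,
			    struct counter_instant dst, uint64_t freq_hz)
{
	uint64_t elapsed_ns = dst.realtime_ns - src.realtime_ns;

	/* Split into whole seconds and remainder to avoid overflow. */
	uint64_t elapsed_ticks =
		(elapsed_ns / NSEC_PER_SEC) * freq_hz +
		(elapsed_ns % NSEC_PER_SEC) * freq_hz / NSEC_PER_SEC;

	/* Guest counter value at the destination instant. */
	uint64_t guest_at_dst =
		guest_counter(src.host_tsc, src_offset) + elapsed_ticks;

	/* Unsigned wraparound makes a "negative" offset work as well. */
	return guest_at_dst - dst.host_tsc;
}
```

For example, with a source instant of (host_tsc = 1000, realtime = 5 s), a source offset of 500, a destination instant of (host_tsc = 100, realtime = 7 s) and a 1 GHz counter, `dest_offset()` yields 2000001400. The result depends only on the two recorded instants, so the VMM thread being preempted before it writes the offset does not change the guest's view, unlike writing a precomputed counter value.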
On Fri, Jul 16, 2021 at 09:26:17PM +0000, Oliver Upton wrote:
> [...]

Oops, I see there's a v3 of this series. I'll switch to reviewing that. I
think my comments / r-b's apply to that version as well though.

Thanks,
drew
On Wed, Jul 21, 2021 at 8:28 AM Andrew Jones <drjones@redhat.com> wrote:
>
> On Fri, Jul 16, 2021 at 09:26:17PM +0000, Oliver Upton wrote:
> > [...]
>
> Oops, I see there's a v3 of this series. I'll switch to reviewing that. I
> think my comments / r-b's apply to that version as well though.

Hey Drew,

Thanks for the review. I'll address your comments from both v2 and v3
in the next series.

--
Thanks,
Oliver