Message ID | 20181128144527.44710-1-steven.price@arm.com (mailing list archive) |
---|---|
Series | arm64: Paravirtualized time support |
On Wed, Nov 28, 2018 at 02:45:15PM +0000, Steven Price wrote:
> This series adds support for paravirtualized time for Arm64 guests and
> KVM hosts following the specification in Arm's document DEN 0057A:
>
> https://developer.arm.com/docs/den0057/a

Hi Steven,

As that specification is still a draft, then I guess this series is an
RFC. I just wanted to point that out, as I believe that tag should be
used in future postings until the spec is approved.

Regarding the spec, my understanding from KVM Forum was that there was
still a need to explain why kvmclock, or an extension to kvmclock, is
insufficient.

If there hasn't been anything written up about that yet, would you mind
doing so?

>
> It implements support for Live Physical Time (LPT) which provides the
> guest with a method to derive a stable counter of time during which the
> guest is executing even when the guest is being migrated between hosts
> with different physical counter frequencies.

Intel has TSC scaling. Is there any reason Arm is proposing a PV
solution instead of adding a similar virt extension?

Thanks,
drew

>
> It also implements support for stolen time, allowing the guest to
> identify time when it is forcibly not executing.
>
> Patch 1 provides some documentation
> Patches 2-4, 8 and 11 provide some refactoring of existing code
> Patch 5 implements the new PV_FEATURES discovery mechanism
> Patches 6-7 implement live physical time
> Patches 9-10 implement stolen time
> Patch 12 adds the 'PV_TIME' device for user space to enable the features
>
> Christoffer Dall (2):
>   KVM: arm/arm64: Factor out hypercall handling from PSCI code
>   KVM: Export mark_page_dirty_in_slot
>
> Steven Price (10):
>   KVM: arm64: Document PV-time interface
>   arm/arm64: Provide a wrapper for SMCCC 1.1 calls
>   arm/arm64: Make use of the SMCCC 1.1 wrapper
>   KVM: arm64: Implement PV_FEATURES call
>   KVM: arm64: Support Live Physical Time reporting
>   clocksource: arm_arch_timer: Use paravirtualized LPT
>   KVM: arm64: Support stolen time reporting via shared page
>   arm64: Retrieve stolen time as paravirtualized guest
>   KVM: Allow kvm_device_ops to be const
>   KVM: arm64: Provide a PV_TIME device to user space
>
>  Documentation/virtual/kvm/arm/pvtime.txt | 169 ++++++++++++++
>  arch/arm/kvm/Makefile                    |   2 +-
>  arch/arm/kvm/handle_exit.c               |   2 +-
>  arch/arm/mm/proc-v7-bugs.c               |  46 ++--
>  arch/arm64/include/asm/arch_timer.h      |  32 ++-
>  arch/arm64/include/asm/kvm_host.h        |  16 ++
>  arch/arm64/include/asm/kvm_mmu.h         |   2 +
>  arch/arm64/include/asm/pvclock-abi.h     |  32 +++
>  arch/arm64/include/uapi/asm/kvm.h        |   8 +
>  arch/arm64/kernel/Makefile               |   1 +
>  arch/arm64/kernel/cpu_errata.c           |  47 +---
>  arch/arm64/kernel/cpuinfo.c              |   2 +-
>  arch/arm64/kernel/kvm.c                  | 156 +++++++++++++
>  arch/arm64/kvm/Kconfig                   |   1 +
>  arch/arm64/kvm/Makefile                  |   2 +
>  arch/arm64/kvm/handle_exit.c             |   4 +-
>  drivers/clocksource/arm_arch_timer.c     | 176 ++++++++++++++-
>  include/kvm/arm_arch_timer.h             |   2 +
>  include/kvm/arm_hypercalls.h             |  44 ++++
>  include/kvm/arm_psci.h                   |   2 +-
>  include/kvm/arm_pv.h                     |  28 +++
>  include/linux/arm-smccc.h                |  45 ++++
>  include/linux/cpuhotplug.h               |   1 +
>  include/linux/kvm_host.h                 |   5 +-
>  include/linux/kvm_types.h                |   2 +
>  include/uapi/linux/kvm.h                 |   2 +
>  virt/kvm/arm/arm.c                       |  25 +-
>  virt/kvm/arm/hypercalls.c                | 276 +++++++++++++++++++++++
>  virt/kvm/arm/mmu.c                       |  44 ++++
>  virt/kvm/arm/psci.c                      |  76 +------
>  virt/kvm/arm/pvtime.c                    | 243 ++++++++++++++++++++
>  virt/kvm/kvm_main.c                      |  12 +-
>  32 files changed, 1348 insertions(+), 157 deletions(-)
>  create mode 100644 Documentation/virtual/kvm/arm/pvtime.txt
>  create mode 100644 arch/arm64/include/asm/pvclock-abi.h
>  create mode 100644 arch/arm64/kernel/kvm.c
>  create mode 100644 include/kvm/arm_hypercalls.h
>  create mode 100644 include/kvm/arm_pv.h
>  create mode 100644 virt/kvm/arm/hypercalls.c
>  create mode 100644 virt/kvm/arm/pvtime.c
>
> --
> 2.19.2
>
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
On 03/12/2018 13:25, Andrew Jones wrote:
> On Wed, Nov 28, 2018 at 02:45:15PM +0000, Steven Price wrote:

[...]

>> It implements support for Live Physical Time (LPT) which provides the
>> guest with a method to derive a stable counter of time during which the
>> guest is executing even when the guest is being migrated between hosts
>> with different physical counter frequencies.
>
> Intel has TSC scaling. Is there any reason Arm is proposing a PV
> solution instead of adding a similar virt extension?

Even if we were to add such an extension tomorrow, we need something
that works for today's system. Not to mention that "tomorrow in the
architecture" usually translates as "in silicon in 3 years"...

Thanks,

M.
On 03/12/2018 13:25, Andrew Jones wrote:
> On Wed, Nov 28, 2018 at 02:45:15PM +0000, Steven Price wrote:
>> This series adds support for paravirtualized time for Arm64 guests and
>> KVM hosts following the specification in Arm's document DEN 0057A:
>>
>> https://developer.arm.com/docs/den0057/a
>
> Hi Steven,
>
> As that specification is still a draft, then I guess this series is an
> RFC. I just wanted to point that out, as I believe that tag should be
> used in future postings until the spec is approved.

Hi,

Yes, sorry, I should have included the RFC tag as we want feedback
before the specification is finalised.

> Regarding the spec, my understanding from KVM Forum was that there was
> still a need to explain why kvmclock, or an extension to kvmclock, is
> insufficient.
>
> If there hasn't been anything written up about that yet, would you mind
> doing so?

There are obviously similarities to kvmclock, but there are a few
differences that enable some optimisations:

* The coefficient for scaling from native frequency to PV frequency is
  a 64 bit integer that can be efficiently multiplied on arm64
  hardware. kvmclock provides us with a 32 bit multiplier.

* Rather than providing an offset in the structure (via
  tsc_timestamp/system_time) the CPU's CNTVOFF support is used to
  provide a virtual counter that simply needs scaling (no offset is
  applied by the guest).

Given this, it seemed sensible not to confuse matters by (ab)using the
existing kvmclock implementation.

>>
>> It implements support for Live Physical Time (LPT) which provides the
>> guest with a method to derive a stable counter of time during which the
>> guest is executing even when the guest is being migrated between hosts
>> with different physical counter frequencies.
>
> Intel has TSC scaling. Is there any reason Arm is proposing a PV
> solution instead of adding a similar virt extension?

As Marc has already pointed out, hardware changes will take a while to
happen; this works for today's systems.

Thanks,

Steve
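
The two differences from kvmclock listed in the reply above (a 64-bit coefficient, and no offset applied by the guest) can be illustrated with a minimal C sketch. This is not code from the series. The kvmclock-style helper follows the x86 pvclock conversion, which shifts the counter delta, multiplies it by a 32-bit value and adds `system_time`. The LPT-style helper is an assumption made for illustration: the names `lpt_scale`, `scale_mult` and `rshift` are hypothetical, and the fixed-point layout is not taken from DEN 0057A. `unsigned __int128` is a GCC/Clang extension.

```c
#include <stdint.h>

/*
 * kvmclock-style conversion: the guest subtracts tsc_timestamp, applies a
 * shift and a 32-bit multiplier (32.32 fixed point), then adds system_time.
 */
static uint64_t kvmclock_style_ns(uint64_t tsc, uint64_t tsc_timestamp,
				  uint64_t system_time,
				  uint32_t tsc_to_system_mul, int8_t tsc_shift)
{
	uint64_t delta = tsc - tsc_timestamp;

	if (tsc_shift >= 0)
		delta <<= tsc_shift;
	else
		delta >>= -tsc_shift;

	return system_time +
	       (uint64_t)(((unsigned __int128)delta * tsc_to_system_mul) >> 32);
}

/*
 * LPT-style scaling (hypothetical layout): the virtual counter already has
 * the offset removed by CNTVOFF, so the guest only multiplies by a 64-bit
 * coefficient and shifts. rshift must be smaller than 128.
 */
static uint64_t lpt_scale(uint64_t virt_cnt, uint64_t scale_mult,
			  unsigned int rshift)
{
	return (uint64_t)(((unsigned __int128)virt_cnt * scale_mult) >> rshift);
}
```

With a right shift of 64, the second helper reduces to taking the high half of a 64x64-bit product, which arm64 provides as a single `UMULH` instruction; that is presumably what "efficiently multiplied" refers to, although the thread does not say so explicitly.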
On Wed, Nov 28, 2018 at 02:45:15PM +0000, Steven Price wrote:
> This series adds support for paravirtualized time for Arm64 guests and
> KVM hosts following the specification in Arm's document DEN 0057A:
>
> https://developer.arm.com/docs/den0057/a
>
> It implements support for Live Physical Time (LPT) which provides the
> guest with a method to derive a stable counter of time during which the
> guest is executing even when the guest is being migrated between hosts
> with different physical counter frequencies.
>
> It also implements support for stolen time, allowing the guest to
> identify time when it is forcibly not executing.

I know that stolen time reporting is important, and I think that we
definitely want to pick up that part of the spec (once it is published
in some non-draft form).

However, I am very concerned with the pv-freq part of LPT, and I'd like
to avoid that if at all possible. I say that because:

* By design, it breaks architectural guarantees from the PoV of SW in
  the guest.

  A VM may host multiple SW agents serially (e.g. when booting, or
  across kexec), or concurrently (e.g. Linux w/ EFI runtime services),
  and the host has no way to tell whether all software in the guest will
  function correctly. Due to this, it's not possible to have a guest
  opt-in to the architecturally-broken timekeeping.

  Existing guests will not work correctly once pv-freq is in use, and if
  configured without pv-freq (or if the guest fails to discover pv-freq
  for any reason), the administrator may encounter anything between
  subtle breakage and fatally incorrect timekeeping.

  There are plenty of SW agents other than Linux which run in a guest,
  which would need to be updated to handle pv-freq, e.g. GRUB, *BSD,
  iPXE.

  Given this, I think that this is going to lead to subtle breakage in
  real-world scenarios.

* It is (necessarily) invasive to the low-level arch timer code. This is
  unfortunate, and I strongly suspect this is going to be an area with
  long-term subtle breakage.

* It's not clear to me how strongly people need this. My understanding
  is that datacenters would run largely homogeneous platforms. I suspect
  large datacenters which would use migration are in a position to
  mandate a standard timer frequency from their OEMs or SiPs.

  I strongly believe that an architectural fix (e.g. in-hw scaling)
  would be the better solution.

I understand that LPT is supposed to account for time lost during the
migration. Can we account for this without pv-freq? e.g. is it possible
to account for this in the same way as stolen time?

Thanks,
Mark.
On 10/12/2018 11:40, Mark Rutland wrote:
> On Wed, Nov 28, 2018 at 02:45:15PM +0000, Steven Price wrote:
>> This series adds support for paravirtualized time for Arm64 guests and
>> KVM hosts following the specification in Arm's document DEN 0057A:
>>
>> https://developer.arm.com/docs/den0057/a
>>
>> It implements support for Live Physical Time (LPT) which provides the
>> guest with a method to derive a stable counter of time during which the
>> guest is executing even when the guest is being migrated between hosts
>> with different physical counter frequencies.
>>
>> It also implements support for stolen time, allowing the guest to
>> identify time when it is forcibly not executing.
>
> I know that stolen time reporting is important, and I think that we
> definitely want to pick up that part of the spec (once it is published
> in some non-draft form).
>
> However, I am very concerned with the pv-freq part of LPT, and I'd like
> to avoid that if at all possible. I say that because:
>
> * By design, it breaks architectural guarantees from the PoV of SW in
>   the guest.
>
>   A VM may host multiple SW agents serially (e.g. when booting, or
>   across kexec), or concurrently (e.g. Linux w/ EFI runtime services),
>   and the host has no way to tell whether all software in the guest will
>   function correctly. Due to this, it's not possible to have a guest
>   opt-in to the architecturally-broken timekeeping.
>
>   Existing guests will not work correctly once pv-freq is in use, and if
>   configured without pv-freq (or if the guest fails to discover pv-freq
>   for any reason), the administrator may encounter anything between
>   subtle breakage and fatally incorrect timekeeping.
>
>   There are plenty of SW agents other than Linux which run in a guest,
>   which would need to be updated to handle pv-freq, e.g. GRUB, *BSD,
>   iPXE.
>
>   Given this, I think that this is going to lead to subtle breakage in
>   real-world scenarios.

LPT only changes things on migration. Up until migration the
(architectural) clocks still behave perfectly normally. A guest which
opts in to LPT can derive a clock with a different frequency, but the
underlying clock doesn't change.

When migration happens it's a different story. If the frequency of the
new host matches the old host then again the clocks behave 'normally':
CNTVOFF is used to hide the change in offset such that the guest at
worst sees time pause during the actual migration.

But the whole point of LPT is to deal with the situation where the
clock frequency has changed. A guest (or SW agent) which doesn't know
about PV will experience one of two things:

* Without LPT: the clock frequency will suddenly change without
  warning, but the virtual counter is monotonically increasing.

* With LPT: the clock frequency will suddenly change and the virtual
  counter will jump (it won't be monotonically increasing).

So I agree the situation with LPT is worse (we lose the monotonicity),
but any guest/agent which doesn't understand the migration is in
trouble if it cares about time.

> * It is (necessarily) invasive to the low-level arch timer code. This is
>   unfortunate, and I strongly suspect this is going to be an area with
>   long-term subtle breakage.

I can't argue against that - I've tried to limit how invasive the code
changes are, but ultimately we're changing the interpretation of
low-level timers.

> * It's not clear to me how strongly people need this. My understanding
>   is that datacenters would run largely homogeneous platforms. I suspect
>   large datacenters which would use migration are in a position to
>   mandate a standard timer frequency from their OEMs or SiPs.
>
>   I strongly believe that an architectural fix (e.g. in-hw scaling)
>   would be the better solution.

An architectural fix in hardware is clearly the best solution. The
question is whether we want to support the use-case with today's
hardware.

While mandating a particular 'standard' timer frequency is a good idea,
there's currently no standard. Large datacenters might be able to
mandate that, and maybe there'll be sufficient consensus that this
doesn't matter. But I seem to have misplaced my crystal ball...

> I understand that LPT is supposed to account for time lost during the
> migration. Can we account for this without pv-freq? e.g. is it possible
> to account for this in the same way as stolen time?

LPT isn't really about accounting for the time lost (to some extent
this is already done by saving/restoring the "KVM_REG_ARM_TIMER_CNT"
register) but about ensuring that the guest can derive a monotonically
increasing counter which maintains a stable frequency when migrated.

I'm going to respin the series with the LPT parts split out to the end,
that way we can (hopefully) agree on the stolen time parts and can
defer the LPT part if necessary.

Thanks,

Steve
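
The "with LPT" case described in the reply above can be worked through numerically. The sketch below is illustrative only and is not code from the series: the struct, the helper names and the 32.32 fixed-point layout are hypothetical, and the frequencies (50 MHz source host, 100 MHz destination host, 1 GHz PV frequency) are made-up example values. It shows the raw virtual counter jumping at migration while the counter derived by an LPT-aware guest stays continuous; a guest that keeps using the old coefficient sees the jump.

```c
#include <stdint.h>

/* Hypothetical per-host scaling parameters (32.32 fixed point). */
struct lpt_params {
	uint64_t mult;
	unsigned int shift;
};

/* Coefficient that converts ticks at native_hz into ticks at pv_hz. */
static struct lpt_params lpt_params_for(uint64_t native_hz, uint64_t pv_hz)
{
	struct lpt_params p = {
		.mult = (uint64_t)(((unsigned __int128)pv_hz << 32) / native_hz),
		.shift = 32,
	};

	return p;
}

/* Guest side: scale the raw virtual counter, no offset is applied. */
static uint64_t lpt_derive(uint64_t virt_cnt, struct lpt_params p)
{
	return (uint64_t)(((unsigned __int128)virt_cnt * p.mult) >> p.shift);
}

/*
 * Host side (assumed behaviour): pick the raw virtual counter value on the
 * destination so that it represents the same elapsed time at the new
 * frequency. This is where the raw counter "jumps".
 */
static uint64_t lpt_rebase_virt_cnt(uint64_t old_cnt, uint64_t old_hz,
				    uint64_t new_hz)
{
	return (uint64_t)(((unsigned __int128)old_cnt * new_hz) / old_hz);
}
```

For example, 10 seconds on the 50 MHz host is a raw count of 500,000,000, which derives to 10,000,000,000 at the 1 GHz PV frequency. After rebasing for the 100 MHz host the raw count becomes 1,000,000,000, and with the destination's coefficient it derives to the same 10,000,000,000.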
On Mon, Dec 10, 2018 at 11:40:47AM +0000, Mark Rutland wrote:
> On Wed, Nov 28, 2018 at 02:45:15PM +0000, Steven Price wrote:
> > This series adds support for paravirtualized time for Arm64 guests and
> > KVM hosts following the specification in Arm's document DEN 0057A:
> >
> > https://developer.arm.com/docs/den0057/a
> >
> > It implements support for Live Physical Time (LPT) which provides the
> > guest with a method to derive a stable counter of time during which the
> > guest is executing even when the guest is being migrated between hosts
> > with different physical counter frequencies.
> >
> > It also implements support for stolen time, allowing the guest to
> > identify time when it is forcibly not executing.
>
> I know that stolen time reporting is important, and I think that we
> definitely want to pick up that part of the spec (once it is published
> in some non-draft form).
>
> However, I am very concerned with the pv-freq part of LPT, and I'd like
> to avoid that if at all possible. I say that because:
>
> * By design, it breaks architectural guarantees from the PoV of SW in
>   the guest.
>
>   A VM may host multiple SW agents serially (e.g. when booting, or
>   across kexec), or concurrently (e.g. Linux w/ EFI runtime services),
>   and the host has no way to tell whether all software in the guest will
>   function correctly. Due to this, it's not possible to have a guest
>   opt-in to the architecturally-broken timekeeping.

Is this necessarily true? As I understood the intention of the spec,
there would be no change to behavior of the timers as exposed by the
hypervisor unless a software agent specifically opts in to LPT and
pv-freq.

In a scenario with Linux and UEFI running, they must clearly agree on
using functionality that changes the underlying behavior.

For kdump/kexec scenarios, the OS would have to tear down the
functionality to work across migration after loading a secondary SW
agent, which probably needs adding to the spec.

>
>   Existing guests will not work correctly once pv-freq is in use, and if
>   configured without pv-freq (or if the guest fails to discover pv-freq
>   for any reason), the administrator may encounter anything between
>   subtle breakage and fatally incorrect timekeeping.
>
>   There are plenty of SW agents other than Linux which run in a guest,
>   which would need to be updated to handle pv-freq, e.g. GRUB, *BSD,
>   iPXE.
>
>   Given this, I think that this is going to lead to subtle breakage in
>   real-world scenarios.

I think we'd definitely need to limit the exposure of pv-freq to Linux
and (if necessary) UEFI runtime services. Do you see scenarios where
this would not be possible?

[...]

>
> I understand that LPT is supposed to account for time lost during the
> migration. Can we account for this without pv-freq? e.g. is it possible
> to account for this in the same way as stolen time?
>

I think we can indeed account for lost time during migration or host
system suspend by simply adjusting CNTVOFF_EL2 (as Steve points out,
KVM already supports this, but QEMU doesn't make use of that today;
there were some patches attempting to address that recently).

Thanks,

Christoffer
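
The CNTVOFF_EL2 adjustment mentioned above relies on the architectural relationship CNTVCT_EL0 = CNTPCT_EL0 - CNTVOFF_EL2. A minimal C sketch of the arithmetic follows, assuming both hosts run the counter at the same frequency. The helper names are hypothetical and this is not KVM or QEMU code. It contrasts the two policies raised in the thread: letting the guest see time pause across the migration, and accounting for the downtime.

```c
#include <stdint.h>

/* Architectural relationship: CNTVCT_EL0 = CNTPCT_EL0 - CNTVOFF_EL2. */
static uint64_t virt_count(uint64_t cntpct, uint64_t cntvoff)
{
	return cntpct - cntvoff;
}

/*
 * "Pause" policy: choose the offset so the guest resumes from the virtual
 * count it had when it was stopped. Unsigned wrap-around is intentional.
 */
static uint64_t cntvoff_pause(uint64_t dest_cntpct_now, uint64_t saved_cntvct)
{
	return dest_cntpct_now - saved_cntvct;
}

/*
 * "Account" policy: additionally advance the guest's view by the number of
 * counter ticks that elapsed while it was not running.
 */
static uint64_t cntvoff_account(uint64_t dest_cntpct_now,
				uint64_t saved_cntvct,
				uint64_t downtime_ticks)
{
	return dest_cntpct_now - (saved_cntvct + downtime_ticks);
}
```

Either way the guest keeps reading a counter at the architectural frequency, so no pv-freq scaling is involved; only the offset programmed by the host differs.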