
[v10,3/7] arm64: hyperv: Add Hyper-V clocksource/clockevent support

Message ID 1620841067-46606-4-git-send-email-mikelley@microsoft.com (mailing list archive)
State New, archived
Series Enable Linux guests on Hyper-V on ARM64

Commit Message

Michael Kelley (LINUX) May 12, 2021, 5:37 p.m. UTC
Add architecture specific definitions and functions needed
by the architecture independent Hyper-V clocksource driver.
Update the Hyper-V clocksource driver to be initialized
on ARM64.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
---
 arch/arm64/include/asm/mshyperv.h  | 12 ++++++++++++
 drivers/clocksource/hyperv_timer.c | 14 ++++++++++++++
 2 files changed, 26 insertions(+)

Comments

Mark Rutland May 14, 2021, 12:37 p.m. UTC | #1
Hi Michael,

On Wed, May 12, 2021 at 10:37:43AM -0700, Michael Kelley wrote:
> Add architecture specific definitions and functions needed
> by the architecture independent Hyper-V clocksource driver.
> Update the Hyper-V clocksource driver to be initialized
> on ARM64.

Previously we've said that for a clocksource we must use the architected
counter, since that's necessary for things like the VDSO to work
correctly and efficiently.

Given that, I'm a bit confused that we're registering a per-cpu
clocksource that is in part based on the architected counter. Likewise,
I don't entirely follow why it's necessary to PV the clock_event_device.

Are the architected counter and timer reliable without this PV
infrastructure? Why do we need to PV either of those?

Thanks,
Mark.

> 
> Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> Reviewed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> ---
>  arch/arm64/include/asm/mshyperv.h  | 12 ++++++++++++
>  drivers/clocksource/hyperv_timer.c | 14 ++++++++++++++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/mshyperv.h b/arch/arm64/include/asm/mshyperv.h
> index c448704..b17299c 100644
> --- a/arch/arm64/include/asm/mshyperv.h
> +++ b/arch/arm64/include/asm/mshyperv.h
> @@ -21,6 +21,7 @@
>  #include <linux/types.h>
>  #include <linux/arm-smccc.h>
>  #include <asm/hyperv-tlfs.h>
> +#include <clocksource/arm_arch_timer.h>
>  
>  /*
>   * Declare calls to get and set Hyper-V VP register values on ARM64, which
> @@ -41,6 +42,17 @@ static inline u64 hv_get_register(unsigned int reg)
>  	return hv_get_vpreg(reg);
>  }
>  
> +/* Define the interrupt ID used by STIMER0 Direct Mode interrupts. This
> + * value can't come from ACPI tables because it is needed before the
> + * Linux ACPI subsystem is initialized.
> + */
> +#define HYPERV_STIMER0_VECTOR	31
> +
> +static inline u64 hv_get_raw_timer(void)
> +{
> +	return arch_timer_read_counter();
> +}
> +
>  /* SMCCC hypercall parameters */
>  #define HV_SMCCC_FUNC_NUMBER	1
>  #define HV_FUNC_ID	ARM_SMCCC_CALL_VAL(			\
> diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
> index 977fd05..270ad9c 100644
> --- a/drivers/clocksource/hyperv_timer.c
> +++ b/drivers/clocksource/hyperv_timer.c
> @@ -569,3 +569,17 @@ void __init hv_init_clocksource(void)
>  	hv_setup_sched_clock(read_hv_sched_clock_msr);
>  }
>  EXPORT_SYMBOL_GPL(hv_init_clocksource);
> +
> +/* Initialize everything on ARM64 */
> +static int __init hyperv_timer_init(struct acpi_table_header *table)
> +{
> +	if (!hv_is_hyperv_initialized())
> +		return -EINVAL;
> +
> +	hv_init_clocksource();
> +	if (hv_stimer_alloc(true))
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +TIMER_ACPI_DECLARE(hyperv, ACPI_SIG_GTDT, hyperv_timer_init);
> -- 
> 1.8.3.1
>
Michael Kelley (LINUX) May 14, 2021, 3:35 p.m. UTC | #2
From: Mark Rutland <mark.rutland@arm.com> Sent: Friday, May 14, 2021 5:37 AM
> 
> Hi Michael,
> 
> On Wed, May 12, 2021 at 10:37:43AM -0700, Michael Kelley wrote:
> > Add architecture specific definitions and functions needed
> > by the architecture independent Hyper-V clocksource driver.
> > Update the Hyper-V clocksource driver to be initialized
> > on ARM64.
> 
> Previously we've said that for a clocksource we must use the architected
> counter, since that's necessary for things like the VDSO to work
> correctly and efficiently.
> 
> Given that, I'm a bit confused that we're registering a per-cpu
> clocksource that is in part based on the architected counter. Likewise,
> I don't entirely follow why it's necessary to PV the clock_event_device.
> 
> Are the architected counter and timer reliable without this PV
> infrastructure? Why do we need to PV either of those?
> 
> Thanks,
> Mark.
> 

For the clocksource, we have a requirement to live migrate VMs
between Hyper-V hosts running on hardware that may have different
arch counter frequencies (it's not conformant to the ARM v8.6 1 GHz
requirement).  The Hyper-V virtualization does scaling to handle the
frequency difference.  And yes, there's a tradeoff with vDSO not
working, though we have an out-of-tree vDSO implementation that
we can use when necessary.

For clockevents, the only timer interrupt that Hyper-V provides
in a guest VM is its virtualized "STIMER" interrupt.  There's no
virtualization of the ARM arch timer in the guest.

Michael

> >
> > Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> > Reviewed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> > ---
> >  arch/arm64/include/asm/mshyperv.h  | 12 ++++++++++++
> >  drivers/clocksource/hyperv_timer.c | 14 ++++++++++++++
> >  2 files changed, 26 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/mshyperv.h b/arch/arm64/include/asm/mshyperv.h
> > index c448704..b17299c 100644
> > --- a/arch/arm64/include/asm/mshyperv.h
> > +++ b/arch/arm64/include/asm/mshyperv.h
> > @@ -21,6 +21,7 @@
> >  #include <linux/types.h>
> >  #include <linux/arm-smccc.h>
> >  #include <asm/hyperv-tlfs.h>
> > +#include <clocksource/arm_arch_timer.h>
> >
> >  /*
> >   * Declare calls to get and set Hyper-V VP register values on ARM64, which
> > @@ -41,6 +42,17 @@ static inline u64 hv_get_register(unsigned int reg)
> >  	return hv_get_vpreg(reg);
> >  }
> >
> > +/* Define the interrupt ID used by STIMER0 Direct Mode interrupts. This
> > + * value can't come from ACPI tables because it is needed before the
> > + * Linux ACPI subsystem is initialized.
> > + */
> > +#define HYPERV_STIMER0_VECTOR	31
> > +
> > +static inline u64 hv_get_raw_timer(void)
> > +{
> > +	return arch_timer_read_counter();
> > +}
> > +
> >  /* SMCCC hypercall parameters */
> >  #define HV_SMCCC_FUNC_NUMBER	1
> >  #define HV_FUNC_ID	ARM_SMCCC_CALL_VAL(			\
> > diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
> > index 977fd05..270ad9c 100644
> > --- a/drivers/clocksource/hyperv_timer.c
> > +++ b/drivers/clocksource/hyperv_timer.c
> > @@ -569,3 +569,17 @@ void __init hv_init_clocksource(void)
> >  	hv_setup_sched_clock(read_hv_sched_clock_msr);
> >  }
> >  EXPORT_SYMBOL_GPL(hv_init_clocksource);
> > +
> > +/* Initialize everything on ARM64 */
> > +static int __init hyperv_timer_init(struct acpi_table_header *table)
> > +{
> > +	if (!hv_is_hyperv_initialized())
> > +		return -EINVAL;
> > +
> > +	hv_init_clocksource();
> > +	if (hv_stimer_alloc(true))
> > +		return -EINVAL;
> > +
> > +	return 0;
> > +}
> > +TIMER_ACPI_DECLARE(hyperv, ACPI_SIG_GTDT, hyperv_timer_init);
> > --
> > 1.8.3.1
> >
Mark Rutland May 17, 2021, 1:08 p.m. UTC | #3
On Fri, May 14, 2021 at 03:35:15PM +0000, Michael Kelley wrote:
> From: Mark Rutland <mark.rutland@arm.com> Sent: Friday, May 14, 2021 5:37 AM
> > On Wed, May 12, 2021 at 10:37:43AM -0700, Michael Kelley wrote:
> > > Add architecture specific definitions and functions needed
> > > by the architecture independent Hyper-V clocksource driver.
> > > Update the Hyper-V clocksource driver to be initialized
> > > on ARM64.
> > 
> > Previously we've said that for a clocksource we must use the architected
> > counter, since that's necessary for things like the VDSO to work
> > correctly and efficiently.
> > 
> > Given that, I'm a bit confused that we're registering a per-cpu
> > clocksource that is in part based on the architected counter. Likewise,
> > I don't entirely follow why it's necessary to PV the clock_event_device.
> > 
> > Are the architected counter and timer reliable without this PV
> > infrastructure? Why do we need to PV either of those?
> > 
> > Thanks,
> > Mark.
> 
> For the clocksource, we have a requirement to live migrate VMs
> between Hyper-V hosts running on hardware that may have different
> arch counter frequencies (it's not conformant to the ARM v8.6 1 GHz
> requirement).  The Hyper-V virtualization does scaling to handle the
> frequency difference.  And yes, there's a tradeoff with vDSO not
> working, though we have an out-of-tree vDSO implementation that
> we can use when necessary.

Just to be clear, the vDSO is *one example* of something that won't
function correctly. More generally, because this undermines core
architectural guarantees, it requires more invasive changes (e.g. we'd
have to weaken the sanity checks, and not use the counter in things like
kexec paths), impacts any architectural features tied to the generic
timer/counter (e.g. the event stream, SPE and tracing, future features),
and means that other SW (e.g. bootloaders and other EFI applications)
are unlikely to function correctly in this environment.

I am very much not keen on trying to PV this.

What does the guest see when it reads CNTFRQ_EL0? Does this match the
real HW value (and can this change over time)? Or is this entirely
synthetic?

What do the ACPI tables look like in the guest? Is there a GTDT table at
all?

How does the counter event stream behave?

Are there other architectural features which Hyper-V does not implement
for a guest?

Is there anything else that may change across a migration? e.g. MIDR?
MPIDR? Any of the ID registers?

> For clockevents, the only timer interrupt that Hyper-V provides
> in a guest VM is its virtualized "STIMER" interrupt.  There's no
> virtualization of the ARM arch timer in the guest.

I think that is rather unfortunate, given it's a core architectural
feature. Is it just the interrupt that's missing? i.e. does all the
PE-local functionality behave as the architecture requires?

Thanks,
Mark.

> 
> > >
> > > Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> > > Reviewed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> > > ---
> > >  arch/arm64/include/asm/mshyperv.h  | 12 ++++++++++++
> > >  drivers/clocksource/hyperv_timer.c | 14 ++++++++++++++
> > >  2 files changed, 26 insertions(+)
> > >
> > > diff --git a/arch/arm64/include/asm/mshyperv.h b/arch/arm64/include/asm/mshyperv.h
> > > index c448704..b17299c 100644
> > > --- a/arch/arm64/include/asm/mshyperv.h
> > > +++ b/arch/arm64/include/asm/mshyperv.h
> > > @@ -21,6 +21,7 @@
> > >  #include <linux/types.h>
> > >  #include <linux/arm-smccc.h>
> > >  #include <asm/hyperv-tlfs.h>
> > > +#include <clocksource/arm_arch_timer.h>
> > >
> > >  /*
> > >   * Declare calls to get and set Hyper-V VP register values on ARM64, which
> > > @@ -41,6 +42,17 @@ static inline u64 hv_get_register(unsigned int reg)
> > >  	return hv_get_vpreg(reg);
> > >  }
> > >
> > > +/* Define the interrupt ID used by STIMER0 Direct Mode interrupts. This
> > > + * value can't come from ACPI tables because it is needed before the
> > > + * Linux ACPI subsystem is initialized.
> > > + */
> > > +#define HYPERV_STIMER0_VECTOR	31
> > > +
> > > +static inline u64 hv_get_raw_timer(void)
> > > +{
> > > +	return arch_timer_read_counter();
> > > +}
> > > +
> > >  /* SMCCC hypercall parameters */
> > >  #define HV_SMCCC_FUNC_NUMBER	1
> > >  #define HV_FUNC_ID	ARM_SMCCC_CALL_VAL(			\
> > > diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
> > > index 977fd05..270ad9c 100644
> > > --- a/drivers/clocksource/hyperv_timer.c
> > > +++ b/drivers/clocksource/hyperv_timer.c
> > > @@ -569,3 +569,17 @@ void __init hv_init_clocksource(void)
> > >  	hv_setup_sched_clock(read_hv_sched_clock_msr);
> > >  }
> > >  EXPORT_SYMBOL_GPL(hv_init_clocksource);
> > > +
> > > +/* Initialize everything on ARM64 */
> > > +static int __init hyperv_timer_init(struct acpi_table_header *table)
> > > +{
> > > +	if (!hv_is_hyperv_initialized())
> > > +		return -EINVAL;
> > > +
> > > +	hv_init_clocksource();
> > > +	if (hv_stimer_alloc(true))
> > > +		return -EINVAL;
> > > +
> > > +	return 0;
> > > +}
> > > +TIMER_ACPI_DECLARE(hyperv, ACPI_SIG_GTDT, hyperv_timer_init);
> > > --
> > > 1.8.3.1
> > >
Michael Kelley (LINUX) May 17, 2021, 5:27 p.m. UTC | #4
From: Mark Rutland <mark.rutland@arm.com> Sent: Monday, May 17, 2021 6:08 AM
> 
> On Fri, May 14, 2021 at 03:35:15PM +0000, Michael Kelley wrote:
> > From: Mark Rutland <mark.rutland@arm.com> Sent: Friday, May 14, 2021 5:37 AM
> > > On Wed, May 12, 2021 at 10:37:43AM -0700, Michael Kelley wrote:
> > > > Add architecture specific definitions and functions needed
> > > > by the architecture independent Hyper-V clocksource driver.
> > > > Update the Hyper-V clocksource driver to be initialized
> > > > on ARM64.
> > >
> > > Previously we've said that for a clocksource we must use the architected
> > > counter, since that's necessary for things like the VDSO to work
> > > correctly and efficiently.
> > >
> > > Given that, I'm a bit confused that we're registering a per-cpu
> > > clocksource that is in part based on the architected counter. Likewise,
> > > I don't entirely follow why it's necessary to PV the clock_event_device.
> > >
> > > Are the architected counter and timer reliable without this PV
> > > infrastructure? Why do we need to PV either of those?
> > >
> > > Thanks,
> > > Mark.
> >
> > For the clocksource, we have a requirement to live migrate VMs
> > between Hyper-V hosts running on hardware that may have different
> > arch counter frequencies (it's not conformant to the ARM v8.6 1 GHz
> > requirement).  The Hyper-V virtualization does scaling to handle the
> > frequency difference.  And yes, there's a tradeoff with vDSO not
> > working, though we have an out-of-tree vDSO implementation that
> > we can use when necessary.
> 
> Just to be clear, the vDSO is *one example* of something that won't
> function correctly. More generally, because this undermines core
> architectural guarantees, it requires more invasive changes (e.g. we'd
> have to weaken the sanity checks, and not use the counter in things like
> kexec paths), impacts any architectural features tied to the generic
> timer/counter (e.g. the event stream, SPE and tracing, future features),
> and means that other SW (e.g. bootloaders and other EFI applications)
> are unlikely to function correctly in this environment.
> 
> I am very much not keen on trying to PV this.
> 
> What does the guest see when it reads CNTFRQ_EL0? Does this match the
> real HW value (and can this change over time)? Or is this entirely
> synthetic?
> 
> What do the ACPI tables look like in the guest? Is there a GTDT table at
> all?
> 
> How does the counter event stream behave?
> 
> Are there other architectural features which Hyper-V does not implement
> for a guest?
> 
> Is there anything else that may change across a migration? e.g. MIDR?
> MPIDR? Any of the ID registers?

The ARMv8 architectural system counter and associated registers are visible
and functional in a VM on Hyper-V.   The "arch_sys_counter" clocksource is
instantiated by the arm_arch_timer.c driver based on the GTDT in the guest,
and a Linux guest on Hyper-V runs fine with this clocksource.  Low level code
like bootloaders and EFI applications work normally.

The Hyper-V virtualization provides another Linux clocksource that is an
overlay on the arch counter and that provides time consistency across a live
migration. Live migration of ARM64 VMs on Hyper-V is not functional today,
but the Hyper-V team believes they can make it functional.  I have not
explored with them the live migration implications of things beyond time
consistency, like event streams, CNTFRQ_EL0, MIDR/MPIDR, etc.

Would a summary of your point be that live migration across hardware
with different arch counter frequencies is likely to not be possible with
100% fidelity because of these other dependencies on the arch counter
frequency?  (hence the fixed 1 GHz frequency in ARM v8.6)

> 
> > For clockevents, the only timer interrupt that Hyper-V provides
> > in a guest VM is its virtualized "STIMER" interrupt.  There's no
> > virtualization of the ARM arch timer in the guest.
> 
> I think that is rather unfortunate, given it's a core architectural
> feature. Is it just the interrupt that's missing? i.e. does all the
> PE-local functionality behave as the architecture requires?

Right off the bat, I don't know about timer-related PE-local
functionality as it's not exercised in a Linux VM on Hyper-V that is
using STIMER-based clockevents.  I'll explore with the Hyper-V
team.  My impression is that enabling the ARM arch timer in a
guest VM is more work for Hyper-V than just wiring up an
interrupt.

Michael

> 
> Thanks,
> Mark.
> 
> >
> > > >
> > > > Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> > > > Reviewed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
> > > > ---
> > > >  arch/arm64/include/asm/mshyperv.h  | 12 ++++++++++++
> > > >  drivers/clocksource/hyperv_timer.c | 14 ++++++++++++++
> > > >  2 files changed, 26 insertions(+)
> > > >
> > > > diff --git a/arch/arm64/include/asm/mshyperv.h
> b/arch/arm64/include/asm/mshyperv.h
> > > > index c448704..b17299c 100644
> > > > --- a/arch/arm64/include/asm/mshyperv.h
> > > > +++ b/arch/arm64/include/asm/mshyperv.h
> > > > @@ -21,6 +21,7 @@
> > > >  #include <linux/types.h>
> > > >  #include <linux/arm-smccc.h>
> > > >  #include <asm/hyperv-tlfs.h>
> > > > +#include <clocksource/arm_arch_timer.h>
> > > >
> > > >  /*
> > > >   * Declare calls to get and set Hyper-V VP register values on ARM64, which
> > > > @@ -41,6 +42,17 @@ static inline u64 hv_get_register(unsigned int reg)
> > > >  	return hv_get_vpreg(reg);
> > > >  }
> > > >
> > > > +/* Define the interrupt ID used by STIMER0 Direct Mode interrupts. This
> > > > + * value can't come from ACPI tables because it is needed before the
> > > > + * Linux ACPI subsystem is initialized.
> > > > + */
> > > > +#define HYPERV_STIMER0_VECTOR	31
> > > > +
> > > > +static inline u64 hv_get_raw_timer(void)
> > > > +{
> > > > +	return arch_timer_read_counter();
> > > > +}
> > > > +
> > > >  /* SMCCC hypercall parameters */
> > > >  #define HV_SMCCC_FUNC_NUMBER	1
> > > >  #define HV_FUNC_ID	ARM_SMCCC_CALL_VAL(			\
> > > > diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
> > > > index 977fd05..270ad9c 100644
> > > > --- a/drivers/clocksource/hyperv_timer.c
> > > > +++ b/drivers/clocksource/hyperv_timer.c
> > > > @@ -569,3 +569,17 @@ void __init hv_init_clocksource(void)
> > > >  	hv_setup_sched_clock(read_hv_sched_clock_msr);
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(hv_init_clocksource);
> > > > +
> > > > +/* Initialize everything on ARM64 */
> > > > +static int __init hyperv_timer_init(struct acpi_table_header *table)
> > > > +{
> > > > +	if (!hv_is_hyperv_initialized())
> > > > +		return -EINVAL;
> > > > +
> > > > +	hv_init_clocksource();
> > > > +	if (hv_stimer_alloc(true))
> > > > +		return -EINVAL;
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +TIMER_ACPI_DECLARE(hyperv, ACPI_SIG_GTDT, hyperv_timer_init);
> > > > --
> > > > 1.8.3.1
> > > >
Mark Rutland May 18, 2021, 5 p.m. UTC | #5
On Mon, May 17, 2021 at 05:27:49PM +0000, Michael Kelley wrote:
> From: Mark Rutland <mark.rutland@arm.com> Sent: Monday, May 17, 2021 6:08 AM
> > 
> > On Fri, May 14, 2021 at 03:35:15PM +0000, Michael Kelley wrote:
> > > From: Mark Rutland <mark.rutland@arm.com> Sent: Friday, May 14, 2021 5:37 AM
> > > > On Wed, May 12, 2021 at 10:37:43AM -0700, Michael Kelley wrote:
> > > > > Add architecture specific definitions and functions needed
> > > > > by the architecture independent Hyper-V clocksource driver.
> > > > > Update the Hyper-V clocksource driver to be initialized
> > > > > on ARM64.
> > > >
> > > > Previously we've said that for a clocksource we must use the architected
> > > > counter, since that's necessary for things like the VDSO to work
> > > > correctly and efficiently.
> > > >
> > > > Given that, I'm a bit confused that we're registering a per-cpu
> > > > clocksource that is in part based on the architected counter. Likewise,
> > > > I don't entirely follow why it's necessary to PV the clock_event_device.
> > > >
> > > > Are the architected counter and timer reliable without this PV
> > > > infrastructure? Why do we need to PV either of those?
> > > >
> > > > Thanks,
> > > > Mark.
> > >
> > > For the clocksource, we have a requirement to live migrate VMs
> > > between Hyper-V hosts running on hardware that may have different
> > > arch counter frequencies (it's not conformant to the ARM v8.6 1 GHz
> > > requirement).  The Hyper-V virtualization does scaling to handle the
> > > frequency difference.  And yes, there's a tradeoff with vDSO not
> > > working, though we have an out-of-tree vDSO implementation that
> > > we can use when necessary.
> > 
> > Just to be clear, the vDSO is *one example* of something that won't
> > function correctly. More generally, because this undermines core
> > architectural guarantees, it requires more invasive changes (e.g. we'd
> > have to weaken the sanity checks, and not use the counter in things like
> > kexec paths), impacts any architectural features tied to the generic
> > timer/counter (e.g. the event stream, SPE and tracing, future features),
> > and means that other SW (e.g. bootloaders and other EFI applications)
> > are unlikely to function correctly in this environment.
> > 
> > I am very much not keen on trying to PV this.
> > 
> > What does the guest see when it reads CNTFRQ_EL0? Does this match the
> > real HW value (and can this change over time)? Or is this entirely
> > synthetic?
> > 
> > What do the ACPI tables look like in the guest? Is there a GTDT table at
> > all?
> > 
> > How does the counter event stream behave?
> > 
> > Are there other architectural features which Hyper-V does not implement
> > for a guest?
> > 
> > Is there anything else that may change across a migration? e.g. MIDR?
> > MPIDR? Any of the ID registers?
> 
> The ARMv8 architectural system counter and associated registers are visible
> and functional in a VM on Hyper-V.   The "arch_sys_counter" clocksource is
> instantiated by the arm_arch_timer.c driver based on the GTDT in the guest,
> and a Linux guest on Hyper-V runs fine with this clocksource.  Low level code
> like bootloaders and EFI applications work normally.

That's good to hear!

One potential issue worth noting is that as those pieces of software are
unlikely to handle counter frequency changes reliably, and so may not
behave correctly if live-migrated.

> The Hyper-V virtualization provides another Linux clocksource that is an
> overlay on the arch counter and that provides time consistency across a live
> migration. Live migration of ARM64 VMs on Hyper-V is not functional today,
> but the Hyper-V team believes they can make it functional.  I have not
> explored with them the live migration implications of things beyond time
> consistency, like event streams, CNTFRQ_EL0, MIDR/MPIDR, etc.
> 
> Would a summary of your point be that live migration across hardware
> with different arch counter frequencies is likely to not be possible with
> 100% fidelity because of these other dependencies on the arch counter
> frequency?  (hence the fixed 1 GHz frequency in ARM v8.6)

Yes.

In addition, there are a larger set of things necessarily exposed to VMs
that mean that live migration isn't all that practical except between
identical machines (where the counter frequency should be identical),
and the timer frequency might just be the canary in the coalmine. For
example, the cache properties enumerated in CTR_EL0 cannot necessarily
be emulated on another machine.

> > > For clockevents, the only timer interrupt that Hyper-V provides
> > > in a guest VM is its virtualized "STIMER" interrupt.  There's no
> > > virtualization of the ARM arch timer in the guest.
> > 
> > I think that is rather unfortunate, given it's a core architectural
> > feature. Is it just the interrupt that's missing? i.e. does all the
> > PE-local functionality behave as the architecture requires?
> 
> Right off the bat, I don't know about timer-related PE-local
> functionality as it's not exercised in a Linux VM on Hyper-V that is
> using STIMER-based clockevents.  I'll explore with the Hyper-V
> team.  My impression is that enabling the ARM arch timer in a
> guest VM is more work for Hyper-V than just wiring up an
> interrupt.

Thanks for chasing this up!

Mark.
Michael Kelley (LINUX) June 8, 2021, 3:36 p.m. UTC | #6
From: Mark Rutland <mark.rutland@arm.com> Sent: Tuesday, May 18, 2021 10:00 AM
> 
> On Mon, May 17, 2021 at 05:27:49PM +0000, Michael Kelley wrote:
> > From: Mark Rutland <mark.rutland@arm.com> Sent: Monday, May 17, 2021 6:08 AM
> > >
> > > On Fri, May 14, 2021 at 03:35:15PM +0000, Michael Kelley wrote:
> > > > From: Mark Rutland <mark.rutland@arm.com> Sent: Friday, May 14, 2021 5:37 AM
> > > > > On Wed, May 12, 2021 at 10:37:43AM -0700, Michael Kelley wrote:
> > > > > > Add architecture specific definitions and functions needed
> > > > > > by the architecture independent Hyper-V clocksource driver.
> > > > > > Update the Hyper-V clocksource driver to be initialized
> > > > > > on ARM64.
> > > > >
> > > > > Previously we've said that for a clocksource we must use the architected
> > > > > counter, since that's necessary for things like the VDSO to work
> > > > > correctly and efficiently.
> > > > >
> > > > > Given that, I'm a bit confused that we're registering a per-cpu
> > > > > clocksource that is in part based on the architected counter. Likewise,
> > > > > I don't entirely follow why it's necessary to PV the clock_event_device.
> > > > >
> > > > > Are the architected counter and timer reliable without this PV
> > > > > infrastructure? Why do we need to PV either of those?
> > > > >
> > > > > Thanks,
> > > > > Mark.
> > > >
> > > > For the clocksource, we have a requirement to live migrate VMs
> > > > between Hyper-V hosts running on hardware that may have different
> > > > arch counter frequencies (it's not conformant to the ARM v8.6 1 GHz
> > > > requirement).  The Hyper-V virtualization does scaling to handle the
> > > > frequency difference.  And yes, there's a tradeoff with vDSO not
> > > > working, though we have an out-of-tree vDSO implementation that
> > > > we can use when necessary.
> > >
> > > Just to be clear, the vDSO is *one example* of something that won't
> > > function correctly. More generally, because this undermines core
> > > architectural guarantees, it requires more invasive changes (e.g. we'd
> > > have to weaken the sanity checks, and not use the counter in things like
> > > kexec paths), impacts any architectural features tied to the generic
> > > timer/counter (e.g. the event stream, SPE and tracing, future features),
> > > and means that other SW (e.g. bootloaders and other EFI applications)
> > > are unlikely to function correctly in this environment.
> > >
> > > I am very much not keen on trying to PV this.
> > >
> > > What does the guest see when it reads CNTFRQ_EL0? Does this match the
> > > real HW value (and can this change over time)? Or is this entirely
> > > synthetic?
> > >
> > > What do the ACPI tables look like in the guest? Is there a GTDT table at
> > > all?
> > >
> > > How does the counter event stream behave?
> > >
> > > Are there other architectural features which Hyper-V does not implement
> > > for a guest?
> > >
> > > Is there anything else that may change across a migration? e.g. MIDR?
> > > MPIDR? Any of the ID registers?
> >
> > The ARMv8 architectural system counter and associated registers are visible
> > and functional in a VM on Hyper-V.   The "arch_sys_counter" clocksource is
> > instantiated by the arm_arch_timer.c driver based on the GTDT in the guest,
> > and a Linux guest on Hyper-V runs fine with this clocksource.  Low level code
> > like bootloaders and EFI applications work normally.
> 
> That's good to hear!
> 
> One potential issue worth noting is that as those pieces of software are
> unlikely to handle counter frequency changes reliably, and so may not
> behave correctly if live-migrated.
> 
> > The Hyper-V virtualization provides another Linux clocksource that is an
> > overlay on the arch counter and that provides time consistency across a live
> > migration. Live migration of ARM64 VMs on Hyper-V is not functional today,
> > but the Hyper-V team believes they can make it functional.  I have not
> > explored with them the live migration implications of things beyond time
> > consistency, like event streams, CNTFRQ_EL0, MIDR/MPIDR, etc.
> >
> > Would a summary of your point be that live migration across hardware
> > with different arch counter frequencies is likely to not be possible with
> > 100% fidelity because of these other dependencies on the arch counter
> > frequency?  (hence the fixed 1 GHz frequency in ARM v8.6)
> 
> Yes.
> 
> In addition, there are a larger set of things necessarily exposed to VMs
> that mean that live migration isn't all that practical except between
> identical machines (where the counter frequency should be identical),
> and the timer frequency might just be the canary in the coalmine. For
> example, the cache properties enumerated in CTR_EL0 cannot necessarily
> be emulated on another machine.
> 
> > > > For clockevents, the only timer interrupt that Hyper-V provides
> > > > in a guest VM is its virtualized "STIMER" interrupt.  There's no
> > > > virtualization of the ARM arch timer in the guest.
> > >
> > > I think that is rather unfortunate, given it's a core architectural
> > > feature. Is it just the interrupt that's missing? i.e. does all the
> > > PE-local functionality behave as the architecture requires?
> >
> > Right off the bat, I don't know about timer-related PE-local
> > functionality as it's not exercised in a Linux VM on Hyper-V that is
> > using STIMER-based clockevents.  I'll explore with the Hyper-V
> > team.  My impression is that enabling the ARM arch timer in a
> > guest VM is more work for Hyper-V than just wiring up an
> > interrupt.
> 
> Thanks for chasing this up!
> 

I've had a couple rounds of discussions with the Hyper-V team.   For
the clocksource we've agreed to table the live migration discussion, and
I'll resubmit the code so that arm_arch_timer.c provides the
standard arch_sys_counter clocksource.  As noted previously, this just
works for a Hyper-V guest.  The live migration discussion may come
back later after a deeper investigation by Hyper-V.

For clockevents, there's not a near term fix.  It's more than just plumbing
an interrupt for Hyper-V to virtualize the ARM64 arch timer in a guest VM.
From their perspective there's also benefit in having a timer abstraction
that's independent of the architecture, and in the Linux guest, the STIMER
code is common across x86/x64 and ARM64.  It follows the standard Linux
clockevents model, as it should. The code is already in use in out-of-tree
builds in the Linux VMs included in Windows 10 on ARM64 as part of the
so-called "Windows Subsystem for Linux".

So I'm hoping we can get this core support for ARM64 guests on Hyper-V
into upstream using the existing STIMER support.  At some point, Hyper-V
will do the virtualization of the ARM64 arch timer, but we don't want to
have to stay out-of-tree until after that happens.

Thoughts?

Michael
Mark Rutland June 10, 2021, 4:45 p.m. UTC | #7
Hi Michael,

[trimming the bulk of the thread]

On Tue, Jun 08, 2021 at 03:36:06PM +0000, Michael Kelley wrote:
> I've had a couple rounds of discussions with the Hyper-V team.   For
> the clocksource we've agreed to table the live migration discussion, and
> I'll resubmit the code so that arm_arch_timer.c provides the
> standard arch_sys_counter clocksource.  As noted previously, this just
> works for a Hyper-V guest.  The live migration discussion may come
> back later after a deeper investigation by Hyper-V.

Great; thanks for this!

> For clockevents, there's not a near term fix.  It's more than just plumbing
> an interrupt for Hyper-V to virtualize the ARM64 arch timer in a guest VM.
> From their perspective there's also benefit in having a timer abstraction
> that's independent of the architecture, and in the Linux guest, the STIMER
> code is common across x86/x64 and ARM64.  It follows the standard Linux
> clockevents model, as it should. The code is already in use in out-of-tree
> builds in the Linux VMs included in Windows 10 on ARM64 as part of the
> so-called "Windows Subsystem for Linux".
> 
> So I'm hoping we can get this core support for ARM64 guests on Hyper-V
> into upstream using the existing STIMER support.  At some point, Hyper-V
> will do the virtualization of the ARM64 arch timer, but we don't want to
> have to stay out-of-tree until after that happens.

My main concern here is making sure that we can rely on architected
properties, and don't have to special-case architected bits for hyperv
(or any other hypervisor), since that inevitably causes longer-term
pain.

While in abstract I'm not as worried about using the timer
clock_event_device specifically, that same driver provides the
clocksource and the event stream, and I want those to work as usual,
without being tied into the hyperv code. IIUC that will require some
work, since the driver won't register if the GTDT is missing timer
interrupts (or if there is no GTDT).

I think it really depends on what that looks like.

Thanks,
Mark.
Michael Kelley (LINUX) June 14, 2021, 2:42 a.m. UTC | #8
From: Mark Rutland <mark.rutland@arm.com> Sent: Thursday, June 10, 2021 9:45 AM
> 
> Hi Michael,
> 
> [trimming the bulk of the thread]
> 
> On Tue, Jun 08, 2021 at 03:36:06PM +0000, Michael Kelley wrote:
> > I've had a couple rounds of discussions with the Hyper-V team.   For
> > the clocksource we've agreed to table the live migration discussion, and
> > I'll resubmit the code so that arm_arch_timer.c provides the
> > standard arch_sys_counter clocksource.  As noted previously, this just
> > works for a Hyper-V guest.  The live migration discussion may come
> > back later after a deeper investigation by Hyper-V.
> 
> Great; thanks for this!
> 
> > For clockevents, there's not a near term fix.  It's more than just plumbing
> > an interrupt for Hyper-V to virtualize the ARM64 arch timer in a guest VM.
> > From their perspective there's also benefit in having a timer abstraction
> > that's independent of the architecture, and in the Linux guest, the STIMER
> > code is common across x86/x64 and ARM64.  It follows the standard Linux
> > clockevents model, as it should. The code is already in use in out-of-tree
> > builds in the Linux VMs included in Windows 10 on ARM64 as part of the
> > so-called "Windows Subsystem for Linux".
> >
> > So I'm hoping we can get this core support for ARM64 guests on Hyper-V
> > into upstream using the existing STIMER support.  At some point, Hyper-V
> > will do the virtualization of the ARM64 arch timer, but we don't want to
> > have to stay out-of-tree until after that happens.
> 
> My main concern here is making sure that we can rely on architected
> properties, and don't have to special-case architected bits for hyperv
> (or any other hypervisor), since that inevitably causes longer-term
> pain.
> 
> While in abstract I'm not as worried about using the timer
> clock_event_device specifically, that same driver provides the
> clocksource and the event stream, and I want those to work as usual,
> without being tied into the hyperv code. IIUC that will require some
> work, since the driver won't register if the GTDT is missing timer
> interrupts (or if there is no GTDT).
> 
> I think it really depends on what that looks like.

Mark,

Here are the details:

The existing initialization and registration code in arm_arch_timer.c
works in a Hyper-V guest with no changes.  As previously mentioned,
the GTDT exists and is correctly populated.  Even though it isn't used,
there's a PPI INTID specified for the virtual timer, just so
the "arm_sys_timer" clockevent can be initialized and registered.
The IRQ shows up in the output of "cat /proc/interrupts" with zero counts
for all CPUs since no interrupts are ever generated.  The EL1 virtual
timer registers (CNTV_CVAL_EL0, CNTV_TVAL_EL0, and CNTV_CTL_EL0)
are accessible in the VM.  The "arm_sys_timer" clockevent is left in
a shutdown state with CNTV_CTL_EL0.ENABLE set to zero when the
Hyper-V STIMER clockevent is registered with a higher rating.
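
For anyone wanting to confirm this state in a running VM, a quick check
might look like the following (the sysfs paths are the standard ones on
current kernels; the exact device and interrupt names depend on what the
drivers registered, so treat the grep pattern as an assumption):

```shell
# Which per-CPU clockevent device is in use -- expected to be the
# higher-rated Hyper-V STIMER device, not arm_sys_timer
cat /sys/devices/system/clockevents/clockevent0/current_device

# The arch timer PPI should be present but show zero counts on all CPUs
# (the "arch_timer" label is the usual one; adjust if yours differs)
grep -i arch_timer /proc/interrupts
```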

Event streams are initialized and the __delay() implementation
for ARM64 inside the kernel works.  However, on the Ampere
eMAG hardware I'm using for testing, the WFE instruction returns
more quickly than it should even though the event stream fields in
CNTKCTL_EL1 are correct.  I have a query in to the Hyper-V team 
to see if they are trapping WFE and just returning, vs. perhaps the
eMAG processor takes the easy way out and has WFE just return
immediately.  I'm not knowledgeable about other uses of timer
event streams, so let me know if there are other usage scenarios
I should check.

Finally, the "arch_sys_counter" clocksource gets initialized and
setup correctly.  If the Hyper-V clocksource is also initialized,
you can flip between the two clocksources at runtime as expected.
If the Hyper-V clocksource is not setup, then Linux in the VM runs
fine with the "arch_sys_counter" clocksource.
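
As a sketch of the runtime flip described above (the Hyper-V clocksource
name below is an assumption; use whatever available_clocksource actually
reports in your VM):

```shell
# List the registered clocksources
cat /sys/devices/system/clocksource/clocksource0/available_clocksource

# Switch to the arch counter, then back to the Hyper-V clocksource
echo arch_sys_counter > /sys/devices/system/clocksource/clocksource0/current_clocksource
echo hyperv_clocksource_tsc_page > /sys/devices/system/clocksource/clocksource0/current_clocksource
```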

Michael
Michael Kelley (LINUX) June 16, 2021, 8:17 p.m. UTC | #9
From: Michael Kelley <mikelley@microsoft.com> Sent: Sunday, June 13, 2021 7:42 PM
> 
> From: Mark Rutland <mark.rutland@arm.com> Sent: Thursday, June 10, 2021 9:45 AM
> >
> > Hi Michael,
> >
> > [trimming the bulk of the thread]
> >
> > On Tue, Jun 08, 2021 at 03:36:06PM +0000, Michael Kelley wrote:
> > > I've had a couple rounds of discussions with the Hyper-V team.   For
> > > the clocksource we've agreed to table the live migration discussion, and
> > > I'll resubmit the code so that arm_arch_timer.c provides the
> > > standard arch_sys_counter clocksource.  As noted previously, this just
> > > works for a Hyper-V guest.  The live migration discussion may come
> > > back later after a deeper investigation by Hyper-V.
> >
> > Great; thanks for this!
> >
> > > For clockevents, there's not a near term fix.  It's more than just plumbing
> > > an interrupt for Hyper-V to virtualize the ARM64 arch timer in a guest VM.
> > > From their perspective there's also benefit in having a timer abstraction
> > > that's independent of the architecture, and in the Linux guest, the STIMER
> > > code is common across x86/x64 and ARM64.  It follows the standard Linux
> > > clockevents model, as it should. The code is already in use in out-of-tree
> > > builds in the Linux VMs included in Windows 10 on ARM64 as part of the
> > > so-called "Windows Subsystem for Linux".
> > >
> > > So I'm hoping we can get this core support for ARM64 guests on Hyper-V
> > > into upstream using the existing STIMER support.  At some point, Hyper-V
> > > will do the virtualization of the ARM64 arch timer, but we don't want to
> > > have to stay out-of-tree until after that happens.
> >
> > My main concern here is making sure that we can rely on architected
> > properties, and don't have to special-case architected bits for hyperv
> > (or any other hypervisor), since that inevitably causes longer-term
> > pain.
> >
> > While in abstract I'm not as worried about using the timer
> > clock_event_device specifically, that same driver provides the
> > clocksource and the event stream, and I want those to work as usual,
> > without being tied into the hyperv code. IIUC that will require some
> > work, since the driver won't register if the GTDT is missing timer
> > interrupts (or if there is no GTDT).
> >
> > I think it really depends on what that looks like.
> 
> Mark,
> 
> Here are the details:
> 
> The existing initialization and registration code in arm_arch_timer.c
> works in a Hyper-V guest with no changes.  As previously mentioned,
> the GTDT exists and is correctly populated.  Even though it isn't used,
> there's a PPI INTID specified for the virtual timer, just so
> the "arm_sys_timer" clockevent can be initialized and registered.
> The IRQ shows up in the output of "cat /proc/interrupts" with zero counts
> for all CPUs since no interrupts are ever generated.  The EL1 virtual
> timer registers (CNTV_CVAL_EL0, CNTV_TVAL_EL0, and CNTV_CTL_EL0)
> are accessible in the VM.  The "arm_sys_timer" clockevent is left in
> a shutdown state with CNTV_CTL_EL0.ENABLE set to zero when the
> Hyper-V STIMER clockevent is registered with a higher rating.
> 
> Event streams are initialized and the __delay() implementation
> for ARM64 inside the kernel works.  However, on the Ampere
> eMAG hardware I'm using for testing, the WFE instruction returns
> more quickly than it should even though the event stream fields in
> CNTKCTL_EL1 are correct.  I have a query in to the Hyper-V team
> to see if they are trapping WFE and just returning, vs. perhaps the
> eMAG processor takes the easy way out and has WFE just return
> immediately.  I'm not knowledgeable about other uses of timer
> event streams, so let me know if there are other usage scenarios
> I should check.

I confirmed that Hyper-V is not trapping the WFE instruction.  And
on a Marvell TX2 and on an Ampere Altra, the counter event stream
and WFE provide the expected delay.  Evidently WFE on the eMAG
doesn't actually delay.  Bottom line:  event streams work as expected
in a Hyper-V VM.  No changes needed to arm_arch_timer.[ch].

Michael

> 
> Finally, the "arch_sys_counter" clocksource gets initialized and
> setup correctly.  If the Hyper-V clocksource is also initialized,
> you can flip between the two clocksources at runtime as expected.
> If the Hyper-V clocksource is not setup, then Linux in the VM runs
> fine with the "arch_sys_counter" clocksource.
> 
> Michael
Mark Rutland June 22, 2021, 9:54 a.m. UTC | #10
Hi Michael,

Thanks for all this; comments inline below. I've added Marc Zyngier, who
co-maintains the architected timer code.

On Mon, Jun 14, 2021 at 02:42:23AM +0000, Michael Kelley wrote:
> From: Mark Rutland <mark.rutland@arm.com> Sent: Thursday, June 10, 2021 9:45 AM
> > On Tue, Jun 08, 2021 at 03:36:06PM +0000, Michael Kelley wrote:
> > > I've had a couple rounds of discussions with the Hyper-V team.   For
> > > the clocksource we've agreed to table the live migration discussion, and
> > > I'll resubmit the code so that arm_arch_timer.c provides the
> > > standard arch_sys_counter clocksource.  As noted previously, this just
> > > works for a Hyper-V guest.  The live migration discussion may come
> > > back later after a deeper investigation by Hyper-V.
> > 
> > Great; thanks for this!
> > 
> > > For clockevents, there's not a near term fix.  It's more than just plumbing
> > > an interrupt for Hyper-V to virtualize the ARM64 arch timer in a guest VM.
> > > From their perspective there's also benefit in having a timer abstraction
> > > that's independent of the architecture, and in the Linux guest, the STIMER
> > > code is common across x86/x64 and ARM64.  It follows the standard Linux
> > > clockevents model, as it should. The code is already in use in out-of-tree
> > > builds in the Linux VMs included in Windows 10 on ARM64 as part of the
> > > so-called "Windows Subsystem for Linux".
> > >
> > > So I'm hoping we can get this core support for ARM64 guests on Hyper-V
> > > into upstream using the existing STIMER support.  At some point, Hyper-V
> > > will do the virtualization of the ARM64 arch timer, but we don't want to
> > > have to stay out-of-tree until after that happens.
> > 
> > My main concern here is making sure that we can rely on architected
> > properties, and don't have to special-case architected bits for hyperv
> > (or any other hypervisor), since that inevitably causes longer-term
> > pain.
> > 
> > While in abstract I'm not as worried about using the timer
> > clock_event_device specifically, that same driver provides the
> > clocksource and the event stream, and I want those to work as usual,
> > without being tied into the hyperv code. IIUC that will require some
> > work, since the driver won't register if the GTDT is missing timer
> > interrupts (or if there is no GTDT).
> > 
> > I think it really depends on what that looks like.
> 
> Mark,
> 
> Here are the details:
> 
> The existing initialization and registration code in arm_arch_timer.c
> works in a Hyper-V guest with no changes.  As previously mentioned,
> the GTDT exists and is correctly populated.  Even though it isn't used,
> there's a PPI INTID specified for the virtual timer, just so
> the "arm_sys_timer" clockevent can be initialized and registered.
> The IRQ shows up in the output of "cat /proc/interrupts" with zero counts
> for all CPUs since no interrupts are ever generated. The EL1 virtual
> timer registers (CNTV_CVAL_EL0, CNTV_TVAL_EL0, and CNTV_CTL_EL0)
> are accessible in the VM.  The "arm_sys_timer" clockevent is left in
> a shutdown state with CNTV_CTL_EL0.ENABLE set to zero when the
> Hyper-V STIMER clockevent is registered with a higher rating.

This concerns me, since we're lying to the kernel, and assuming that it
will never try to use this timer. I appreciate that evidently we don't
happen to rely on that today if you register a higher priority timer,
but that does open us up to future fragility (e.g. if we added sanity
checks when registering timers), and IIRC there are ways for userspace
to change the clockevent device today.

> Event streams are initialized and the __delay() implementation
> for ARM64 inside the kernel works.  However, on the Ampere
> eMAG hardware I'm using for testing, the WFE instruction returns
> more quickly than it should even though the event stream fields in
> CNTKCTL_EL1 are correct.  I have a query in to the Hyper-V team 
> to see if they are trapping WFE and just returning, vs. perhaps the
> eMAG processor takes the easy way out and has WFE just return
> immediately.  I'm not knowledgeable about other uses of timer
> event streams, so let me know if there are other usage scenarios
> I should check.

I saw your reply confirming that this is generally working as expected
(and that Hyper-V is not trapping WFE) so this sounds fine to me.

> Finally, the "arch_sys_counter" clocksource gets initialized and
> setup correctly.  If the Hyper-V clocksource is also initialized,
> you can flip between the two clocksources at runtime as expected.
> If the Hyper-V clocksource is not setup, then Linux in the VM runs
> fine with the "arch_sys_counter" clocksource.

Great!

As above, my remaining concern here is fragility around the
clockevent_device; I'm not keen that we're lying (in the GTDT) that
interrupts are wired up when they are not functional, and while you can get
away with that today, that relies on kernel implementation details that
could change.

Ideally, Hyper-V would provide the architectural timer (as it's already
claiming to in the GTDT), things would "just work", and the Hyper-V
timer would be an optimization rather than a functional necessity.

You mentioned above that Hyper-V will virtualize the timer "at some
point" -- is that already planned, and when is that likely to be?

Marc, do you have any thoughts on this?

Thanks,
Mark.
Marc Zyngier June 23, 2021, 8:56 a.m. UTC | #11
On Tue, 22 Jun 2021 10:54:12 +0100,
Mark Rutland <mark.rutland@arm.com> wrote:
> 
> Hi Michael,
> 
> Thanks for all this; comments inline below. I've added Marc Zyngier, who
> co-maintains the architected timer code.
> 
> On Mon, Jun 14, 2021 at 02:42:23AM +0000, Michael Kelley wrote:
> > From: Mark Rutland <mark.rutland@arm.com> Sent: Thursday, June 10, 2021 9:45 AM
> > > On Tue, Jun 08, 2021 at 03:36:06PM +0000, Michael Kelley wrote:
> > > > I've had a couple rounds of discussions with the Hyper-V team.   For
> > > > the clocksource we've agreed to table the live migration discussion, and
> > > > I'll resubmit the code so that arm_arch_timer.c provides the
> > > > standard arch_sys_counter clocksource.  As noted previously, this just
> > > > works for a Hyper-V guest.  The live migration discussion may come
> > > > back later after a deeper investigation by Hyper-V.
> > > 
> > > Great; thanks for this!
> > > 
> > > > For clockevents, there's not a near term fix.  It's more than just plumbing
> > > > an interrupt for Hyper-V to virtualize the ARM64 arch timer in a guest VM.
> > > > From their perspective there's also benefit in having a timer abstraction
> > > > that's independent of the architecture, and in the Linux guest, the STIMER
> > > > code is common across x86/x64 and ARM64.  It follows the standard Linux
> > > > clockevents model, as it should. The code is already in use in out-of-tree
> > > > builds in the Linux VMs included in Windows 10 on ARM64 as part of the
> > > > so-called "Windows Subsystem for Linux".
> > > >
> > > > So I'm hoping we can get this core support for ARM64 guests on Hyper-V
> > > > into upstream using the existing STIMER support.  At some point, Hyper-V
> > > > will do the virtualization of the ARM64 arch timer, but we don't want to
> > > > have to stay out-of-tree until after that happens.
> > > 
> > > My main concern here is making sure that we can rely on architected
> > > properties, and don't have to special-case architected bits for hyperv
> > > (or any other hypervisor), since that inevitably causes longer-term
> > > pain.
> > > 
> > > While in abstract I'm not as worried about using the timer
> > > clock_event_device specifically, that same driver provides the
> > > clocksource and the event stream, and I want those to work as usual,
> > > without being tied into the hyperv code. IIUC that will require some
> > > work, since the driver won't register if the GTDT is missing timer
> > > interrupts (or if there is no GTDT).
> > > 
> > > I think it really depends on what that looks like.
> > 
> > Mark,
> > 
> > Here are the details:
> > 
> > The existing initialization and registration code in arm_arch_timer.c
> > works in a Hyper-V guest with no changes.  As previously mentioned,
> > the GTDT exists and is correctly populated.  Even though it isn't used,
> > there's a PPI INTID specified for the virtual timer, just so
> > the "arm_sys_timer" clockevent can be initialized and registered.
> > The IRQ shows up in the output of "cat /proc/interrupts" with zero counts
> > for all CPUs since no interrupts are ever generated. The EL1 virtual
> > timer registers (CNTV_CVAL_EL0, CNTV_TVAL_EL0, and CNTV_CTL_EL0)
> > are accessible in the VM.  The "arm_sys_timer" clockevent is left in
> > a shutdown state with CNTV_CTL_EL0.ENABLE set to zero when the
> > Hyper-V STIMER clockevent is registered with a higher rating.
> 
> This concerns me, since we're lying to the kernel, and assuming that it
> will never try to use this timer. I appreciate that evidently we don't
> happen to rely on that today if you register a higher priority timer,
> but that does open us up to future fragility (e.g. if we added sanity
> checks when registering timers), and IIRC there are ways for userspace
> to change the clockevent device today.

Indeed. Userspace can perfectly unbind the clockevent using
/sys/devices/system/clockevents/clockevent*/unbind_device, and the
kernel will be happy to switch to the next per-cpu timer, which
happens to be the arch timer. Oh wait...
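
(For reference, the unbind sequence in question is just the following;
the device name here is illustrative -- use whatever current_device
reports for the Hyper-V clockevent:)

```shell
# Unbind the current tick device; the kernel then falls back to the
# next-highest-rated per-cpu clockevent -- i.e. the arch timer
echo hyperv_clockevent > /sys/devices/system/clockevents/clockevent0/unbind_device
```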

>
> > Event streams are initialized and the __delay() implementation
> > for ARM64 inside the kernel works.  However, on the Ampere
> > eMAG hardware I'm using for testing, the WFE instruction returns
> > more quickly than it should even though the event stream fields in
> > CNTKCTL_EL1 are correct.  I have a query in to the Hyper-V team 
> > to see if they are trapping WFE and just returning, vs. perhaps the
> > eMAG processor takes the easy way out and has WFE just return
> > immediately.  I'm not knowledgeable about other uses of timer
> > event streams, so let me know if there are other usage scenarios
> > I should check.
> 
> I saw your reply confirming that this is generally working as expected
> (and that Hyper-V is not trapping WFE) so this sounds fine to me.
> 
> > Finally, the "arch_sys_counter" clocksource gets initialized and
> > setup correctly.  If the Hyper-V clocksource is also initialized,
> > you can flip between the two clocksources at runtime as expected.
> > If the Hyper-V clocksource is not setup, then Linux in the VM runs
> > fine with the "arch_sys_counter" clocksource.
> 
> Great!
> 
> As above, my remaining concern here is fragility around the
> clockevent_device; I'm not keen that we're lying (in the GTDT) that
> interrupts are wired up when they are not functional, and while you can get
> away with that today, that relies on kernel implementation details that
> could change.
> 
> Ideally, Hyper-V would provide the architectural timer (as it's already
> claiming to in the GTDT), things would "just work", and the Hyper-V
> timer would be an optimization rather than a functional necessity.
> 
> You mentioned above that Hyper-V will virtualize the timer "at some
> point" -- is that already planned, and when is that likely to be?
> 
> Marc, do you have any thoughts on this?

Overall, lying to the kernel is a bad idea. Only implementing half of
the architecture is another bad idea. I doubt the combination of two
bad ideas produces a good one.

If Hyper-V guests need to use another timer (for migration purposes?),
that's fine. But we rely on both the base architecture to be
completely implemented *and* on advertised features to be functional.
I think this has been our position since the first Hyper-V patches
were posted... 3 years ago?

What is the hold up for reliably virtualising the arch timer,
including interrupt delivery?

Thanks,

	M.
Michael Kelley (LINUX) June 28, 2021, 2:21 a.m. UTC | #12
From: Marc Zyngier <maz@kernel.org> Sent: Wednesday, June 23, 2021 1:56 AM
> 
> On Tue, 22 Jun 2021 10:54:12 +0100,
> Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > Hi Michael,
> >
> > Thanks for all this; comments inline below. I've added Marc Zyngier, who
> > co-maintains the architected timer code.
> >
> > On Mon, Jun 14, 2021 at 02:42:23AM +0000, Michael Kelley wrote:
> > > From: Mark Rutland <mark.rutland@arm.com> Sent: Thursday, June 10, 2021 9:45 AM
> > > > On Tue, Jun 08, 2021 at 03:36:06PM +0000, Michael Kelley wrote:
> > > > > I've had a couple rounds of discussions with the Hyper-V team.   For
> > > > > the clocksource we've agreed to table the live migration discussion, and
> > > > > I'll resubmit the code so that arm_arch_timer.c provides the
> > > > > standard arch_sys_counter clocksource.  As noted previously, this just
> > > > > works for a Hyper-V guest.  The live migration discussion may come
> > > > > back later after a deeper investigation by Hyper-V.
> > > >
> > > > Great; thanks for this!
> > > >
> > > > > For clockevents, there's not a near term fix.  It's more than just plumbing
> > > > > an interrupt for Hyper-V to virtualize the ARM64 arch timer in a guest VM.
> > > > > From their perspective there's also benefit in having a timer abstraction
> > > > > that's independent of the architecture, and in the Linux guest, the STIMER
> > > > > code is common across x86/x64 and ARM64.  It follows the standard Linux
> > > > > clockevents model, as it should. The code is already in use in out-of-tree
> > > > > builds in the Linux VMs included in Windows 10 on ARM64 as part of the
> > > > > so-called "Windows Subsystem for Linux".
> > > > >
> > > > > So I'm hoping we can get this core support for ARM64 guests on Hyper-V
> > > > > into upstream using the existing STIMER support.  At some point, Hyper-V
> > > > > will do the virtualization of the ARM64 arch timer, but we don't want to
> > > > > have to stay out-of-tree until after that happens.
> > > >
> > > > My main concern here is making sure that we can rely on architected
> > > > properties, and don't have to special-case architected bits for hyperv
> > > > (or any other hypervisor), since that inevitably causes longer-term
> > > > pain.
> > > >
> > > > While in abstract I'm not as worried about using the timer
> > > > clock_event_device specifically, that same driver provides the
> > > > clocksource and the event stream, and I want those to work as usual,
> > > > without being tied into the hyperv code. IIUC that will require some
> > > > work, since the driver won't register if the GTDT is missing timer
> > > > interrupts (or if there is no GTDT).
> > > >
> > > > I think it really depends on what that looks like.
> > >
> > > Mark,
> > >
> > > Here are the details:
> > >
> > > The existing initialization and registration code in arm_arch_timer.c
> > > works in a Hyper-V guest with no changes.  As previously mentioned,
> > > the GTDT exists and is correctly populated.  Even though it isn't used,
> > > there's a PPI INTID specified for the virtual timer, just so
> > > the "arm_sys_timer" clockevent can be initialized and registered.
> > > The IRQ shows up in the output of "cat /proc/interrupts" with zero counts
> > > for all CPUs since no interrupts are ever generated. The EL1 virtual
> > > timer registers (CNTV_CVAL_EL0, CNTV_TVAL_EL0, and CNTV_CTL_EL0)
> > > are accessible in the VM.  The "arm_sys_timer" clockevent is left in
> > > a shutdown state with CNTV_CTL_EL0.ENABLE set to zero when the
> > > Hyper-V STIMER clockevent is registered with a higher rating.
> >
> > This concerns me, since we're lying to the kernel, and assuming that it
> > will never try to use this timer. I appreciate that evidently we don't
> > happen to rely on that today if you register a higher priority timer,
> > but that does open us up to future fragility (e.g. if we added sanity
> > checks when registering timers), and IIRC there are ways for userspace
> > to change the clockevent device today.
> 
> Indeed. Userspace can perfectly unbind the clockevent using
> /sys/devices/system/clockevents/clockevent*/unbind_device, and the
> kernel will be happy to switch to the next per-cpu timer, which
> happens to be the arch timer. Oh wait...
> 
> >
> > > Event streams are initialized and the __delay() implementation
> > > for ARM64 inside the kernel works.  However, on the Ampere
> > > eMAG hardware I'm using for testing, the WFE instruction returns
> > > more quickly than it should even though the event stream fields in
> > > CNTKCTL_EL1 are correct.  I have a query in to the Hyper-V team
> > > to see if they are trapping WFE and just returning, vs. perhaps the
> > > eMAG processor takes the easy way out and has WFE just return
> > > immediately.  I'm not knowledgeable about other uses of timer
> > > event streams, so let me know if there are other usage scenarios
> > > I should check.
> >
> > I saw your reply confirming that this is generally working as expected
> > (and that Hyper-V is not trapping WFE) so this sounds fine to me.
> >
> > > Finally, the "arch_sys_counter" clocksource gets initialized and
> > > setup correctly.  If the Hyper-V clocksource is also initialized,
> > > you can flip between the two clocksources at runtime as expected.
> > > If the Hyper-V clocksource is not setup, then Linux in the VM runs
> > > fine with the "arch_sys_counter" clocksource.
> >
> > Great!
> >
> > As above, my remaining concern here is fragility around the
> > clockevent_device; I'm not keen that we're lying (in the GTDT) that
> > interrupts are wired up when they are not functional, and while you can get
> > away with that today, that relies on kernel implementation details that
> > could change.
> >
> > Ideally, Hyper-V would provide the architectural timer (as it's already
> > claiming to in the GTDT), things would "just work", and the Hyper-V
> > timer would be an optimization rather than a functional necessity.
> >
> > You mentioned above that Hyper-V will virtualize the timer "at some
> > point" -- is that already planned, and when is that likely to be?
> >
> > Marc, do you have any thoughts on this?
> 
> Overall, lying to the kernel is a bad idea. Only implementing half of
> the architecture is another bad idea. I doubt the combination of two
> bad ideas produces a good one.
> 
> If Hyper-V guests need to use another timer (for migration purposes?),
> that's fine. But we rely on both the base architecture to be
> completely implemented *and* on advertised features to be functional.
> I think this has been our position since the first Hyper-V patches
> were posted... 3 years ago?
> 
> What is the hold up for reliably virtualising the arch timer,
> including interrupt delivery?

Marc (and Mark) --

In our early interactions about the Hyper-V clocks and timers, the code
was a bit spread out, and you suggested moving all the clocksource
and clockevent stuff to a driver under drivers/clocksource.  See
https://lore.kernel.org/lkml/e0374a07-809c-cabd-2eb6-e6b5ad84742e@arm.com/.
That was a good change independent of any ARM64 considerations,
but I read (or perhaps overread) your comments to say that it was OK
to use these Hyper-V para-virtualized clocks/timers instead of the ARM64
architectural ones in a Hyper-V VM.  They work and it's what the Hyper-V
guys wanted to do anyway, so having Hyper-V offer the ARM64 arch
counter and timer in a VM hasn't been a priority.  They had other stuff that
didn't work at all on ARM64, so that's where their attention went.

I agree that it would be better to have the ARM64 arch counter/timer
fully implemented in a Hyper-V VM.  But we're wanting to find a practical
way to move forward, and in doing so confine any rough edges to Hyper-V
VMs and the Hyper-V specific code in the kernel tree. We're maintaining
and shipping the code out-of-tree based on Hyper-V ARM64 current behavior
and would like to get this core enablement code upstream. Sure, unbinding
the Hyper-V clockevent doesn't work, but that's not a problem in any use
cases we see from customers.

All that said, our discussions with the Hyper-V team are continuing.  We're
still in the process of seeing what's practical to get and when.

Michael

> 
> Thanks,
> 
> 	M.
> 
> --
> Without deviation from the norm, progress is not possible.
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/mshyperv.h b/arch/arm64/include/asm/mshyperv.h
index c448704..b17299c 100644
--- a/arch/arm64/include/asm/mshyperv.h
+++ b/arch/arm64/include/asm/mshyperv.h
@@ -21,6 +21,7 @@ 
 #include <linux/types.h>
 #include <linux/arm-smccc.h>
 #include <asm/hyperv-tlfs.h>
+#include <clocksource/arm_arch_timer.h>
 
 /*
  * Declare calls to get and set Hyper-V VP register values on ARM64, which
@@ -41,6 +42,17 @@  static inline u64 hv_get_register(unsigned int reg)
 	return hv_get_vpreg(reg);
 }
 
+/* Define the interrupt ID used by STIMER0 Direct Mode interrupts. This
+ * value can't come from ACPI tables because it is needed before the
+ * Linux ACPI subsystem is initialized.
+ */
+#define HYPERV_STIMER0_VECTOR	31
+
+static inline u64 hv_get_raw_timer(void)
+{
+	return arch_timer_read_counter();
+}
+
 /* SMCCC hypercall parameters */
 #define HV_SMCCC_FUNC_NUMBER	1
 #define HV_FUNC_ID	ARM_SMCCC_CALL_VAL(			\
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index 977fd05..270ad9c 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -569,3 +569,17 @@  void __init hv_init_clocksource(void)
 	hv_setup_sched_clock(read_hv_sched_clock_msr);
 }
 EXPORT_SYMBOL_GPL(hv_init_clocksource);
+
+/* Initialize everything on ARM64 */
+static int __init hyperv_timer_init(struct acpi_table_header *table)
+{
+	if (!hv_is_hyperv_initialized())
+		return -EINVAL;
+
+	hv_init_clocksource();
+	if (hv_stimer_alloc(true))
+		return -EINVAL;
+
+	return 0;
+}
+TIMER_ACPI_DECLARE(hyperv, ACPI_SIG_GTDT, hyperv_timer_init);