diff mbox series

[2/4] arm64: hyperv: Add support for Hyper-V as a hypervisor

Message ID 20181122031059.16338-2-kys@linuxonhyperv.com (mailing list archive)
State New, archived
Headers show
Series Hyper-V: Enable Linux guests on Hyper-V on ARM64 | expand

Commit Message

kys@linuxonhyperv.com Nov. 22, 2018, 3:10 a.m. UTC
From: Michael Kelley <mikelley@microsoft.com>

Add ARM64-specific code to enable Hyper-V. This code includes:
* Detecting Hyper-V and initializing the guest/Hyper-V interface
* Setting up Hyper-V's synthetic clocks
* Making hypercalls using the HVC instruction
* Setting up VMbus and stimer0 interrupts
* Setting up kexec and crash handlers
This code is architecture dependent code and is mostly driven by
architecture independent code in the VMbus driver in drivers/hv/hv.c
and drivers/hv/vmbus_drv.c.

This code is built only when CONFIG_HYPERV is enabled.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
---
 MAINTAINERS                  |   1 +
 arch/arm64/Makefile          |   1 +
 arch/arm64/hyperv/Makefile   |   2 +
 arch/arm64/hyperv/hv_hvc.S   |  54 +++++
 arch/arm64/hyperv/hv_init.c  | 441 +++++++++++++++++++++++++++++++++++
 arch/arm64/hyperv/mshyperv.c | 178 ++++++++++++++
 6 files changed, 677 insertions(+)
 create mode 100644 arch/arm64/hyperv/Makefile
 create mode 100644 arch/arm64/hyperv/hv_hvc.S
 create mode 100644 arch/arm64/hyperv/hv_init.c
 create mode 100644 arch/arm64/hyperv/mshyperv.c

Comments

Greg Kroah-Hartman Nov. 26, 2018, 7:19 p.m. UTC | #1
On Thu, Nov 22, 2018 at 03:10:57AM +0000, kys@linuxonhyperv.com wrote:
> --- /dev/null
> +++ b/arch/arm64/hyperv/hv_hvc.S
> @@ -0,0 +1,54 @@
> +/* SPDX-License-Identifier: GPL-2.0 */

Using SPDX is good, but:

> +
> +/*
> + * Microsoft Hyper-V hypervisor invocation routines
> + *
> + * Copyright (C) 2018, Microsoft, Inc.
> + *
> + * Author : Michael Kelley <mikelley@microsoft.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published
> + * by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT.  See the GNU General Public License for more
> + * details.

No need for these two paragraphs at all.  If you add them now, someone
will just come along later and rip them out.  Please don't include them
in the first place.

Not to mention the first paragraph doesn't make sense, think about
"which" version 2 the FSF published (hint, there was 5 or 7 or so...)
That's why we have one in the kernel tree for everyone to rely on.

thanks,

greg k-h
Will Deacon Dec. 7, 2018, 1:42 p.m. UTC | #2
On Thu, Nov 22, 2018 at 03:10:57AM +0000, kys@linuxonhyperv.com wrote:
> From: Michael Kelley <mikelley@microsoft.com>
> 
> Add ARM64-specific code to enable Hyper-V. This code includes:
> * Detecting Hyper-V and initializing the guest/Hyper-V interface
> * Setting up Hyper-V's synthetic clocks
> * Making hypercalls using the HVC instruction
> * Setting up VMbus and stimer0 interrupts
> * Setting up kexec and crash handlers
> This code is architecture dependent code and is mostly driven by
> architecture independent code in the VMbus driver in drivers/hv/hv.c
> and drivers/hv/vmbus_drv.c.
> 
> This code is built only when CONFIG_HYPERV is enabled.
> 
> Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
> ---
>  MAINTAINERS                  |   1 +
>  arch/arm64/Makefile          |   1 +
>  arch/arm64/hyperv/Makefile   |   2 +
>  arch/arm64/hyperv/hv_hvc.S   |  54 +++++
>  arch/arm64/hyperv/hv_init.c  | 441 +++++++++++++++++++++++++++++++++++
>  arch/arm64/hyperv/mshyperv.c | 178 ++++++++++++++
>  6 files changed, 677 insertions(+)
>  create mode 100644 arch/arm64/hyperv/Makefile
>  create mode 100644 arch/arm64/hyperv/hv_hvc.S
>  create mode 100644 arch/arm64/hyperv/hv_init.c
>  create mode 100644 arch/arm64/hyperv/mshyperv.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 72f19cef4c48..326eeb32a0cd 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6837,6 +6837,7 @@ F:	arch/x86/kernel/cpu/mshyperv.c
>  F:	arch/x86/hyperv
>  F:	arch/arm64/include/asm/hyperv-tlfs.h
>  F:	arch/arm64/include/asm/mshyperv.h
> +F:	arch/arm64/hyperv
>  F:	drivers/hid/hid-hyperv.c
>  F:	drivers/hv/
>  F:	drivers/input/serio/hyperv-keyboard.c
> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> index 6cb9fc7e9382..ad9ec0579553 100644
> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -106,6 +106,7 @@ core-y		+= arch/arm64/kernel/ arch/arm64/mm/
>  core-$(CONFIG_NET) += arch/arm64/net/
>  core-$(CONFIG_KVM) += arch/arm64/kvm/
>  core-$(CONFIG_XEN) += arch/arm64/xen/
> +core-$(CONFIG_HYPERV) += arch/arm64/hyperv/
>  core-$(CONFIG_CRYPTO) += arch/arm64/crypto/
>  libs-y		:= arch/arm64/lib/ $(libs-y)
>  core-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
> diff --git a/arch/arm64/hyperv/Makefile b/arch/arm64/hyperv/Makefile
> new file mode 100644
> index 000000000000..988eda55330c
> --- /dev/null
> +++ b/arch/arm64/hyperv/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0
> +obj-y		:= hv_init.o hv_hvc.o mshyperv.o
> diff --git a/arch/arm64/hyperv/hv_hvc.S b/arch/arm64/hyperv/hv_hvc.S
> new file mode 100644
> index 000000000000..82636969b4f2
> --- /dev/null
> +++ b/arch/arm64/hyperv/hv_hvc.S
> @@ -0,0 +1,54 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * Microsoft Hyper-V hypervisor invocation routines
> + *
> + * Copyright (C) 2018, Microsoft, Inc.
> + *
> + * Author : Michael Kelley <mikelley@microsoft.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published
> + * by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT.  See the GNU General Public License for more
> + * details.
> + */
> +
> +#include <linux/linkage.h>
> +
> +	.text
> +/*
> + * Do the HVC instruction.  For Hyper-V the argument is always 1.
> + * x0 contains the hypercall control value, while additional registers
> + * vary depending on the hypercall, and whether the hypercall arguments
> + * are in memory or in registers (a "fast" hypercall per the Hyper-V
> + * TLFS).  When the arguments are in memory x1 is the guest physical
> + * address of the input arguments, and x2 is the guest physical
> + * address of the output arguments.  When the arguments are in
> + * registers, the register values depends on the hypercall.  Note
> + * that this version cannot return any values in registers.
> + */
> +ENTRY(hv_do_hvc)
> +	hvc #1
> +	ret
> +ENDPROC(hv_do_hvc)
> +
> +/*
> + * This variant of HVC invocation is for hv_get_vpreg and
> + * hv_get_vpreg_128. The input parameters are passed in registers
> + * along with a pointer in x4 to where the output result should
> + * be stored. The output is returned in x15 and x16.  x18 is used as
> + * scratch space to avoid buildng a stack frame, as Hyper-V does
> + * not preserve registers x0-x17.
> + */
> +ENTRY(hv_do_hvc_fast_get)
> +	mov x18, x4
> +	hvc #1
> +	str x15,[x18]
> +	str x16,[x18,#8]
> +	ret
> +ENDPROC(hv_do_hvc_fast_get)

Why are you not following the ARM SMCCC? It would be /much/ better if you
could just follow the standard, which is already implemented by
include/linux/smccc.h.

Will
Marc Zyngier Dec. 7, 2018, 2:43 p.m. UTC | #3
On 22/11/2018 03:10, kys@linuxonhyperv.com wrote:
> From: Michael Kelley <mikelley@microsoft.com>
> 
> Add ARM64-specific code to enable Hyper-V. This code includes:
> * Detecting Hyper-V and initializing the guest/Hyper-V interface
> * Setting up Hyper-V's synthetic clocks
> * Making hypercalls using the HVC instruction
> * Setting up VMbus and stimer0 interrupts
> * Setting up kexec and crash handlers

This commit message is a clear indication that this should be split in
at least 5 different patches.

> This code is architecture dependent code and is mostly driven by
> architecture independent code in the VMbus driver in drivers/hv/hv.c
> and drivers/hv/vmbus_drv.c.
> 
> This code is built only when CONFIG_HYPERV is enabled.
> 
> Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
> ---
>  MAINTAINERS                  |   1 +
>  arch/arm64/Makefile          |   1 +
>  arch/arm64/hyperv/Makefile   |   2 +
>  arch/arm64/hyperv/hv_hvc.S   |  54 +++++
>  arch/arm64/hyperv/hv_init.c  | 441 +++++++++++++++++++++++++++++++++++
>  arch/arm64/hyperv/mshyperv.c | 178 ++++++++++++++
>  6 files changed, 677 insertions(+)
>  create mode 100644 arch/arm64/hyperv/Makefile
>  create mode 100644 arch/arm64/hyperv/hv_hvc.S
>  create mode 100644 arch/arm64/hyperv/hv_init.c
>  create mode 100644 arch/arm64/hyperv/mshyperv.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 72f19cef4c48..326eeb32a0cd 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6837,6 +6837,7 @@ F:	arch/x86/kernel/cpu/mshyperv.c
>  F:	arch/x86/hyperv
>  F:	arch/arm64/include/asm/hyperv-tlfs.h
>  F:	arch/arm64/include/asm/mshyperv.h
> +F:	arch/arm64/hyperv
>  F:	drivers/hid/hid-hyperv.c
>  F:	drivers/hv/
>  F:	drivers/input/serio/hyperv-keyboard.c
> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> index 6cb9fc7e9382..ad9ec0579553 100644
> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -106,6 +106,7 @@ core-y		+= arch/arm64/kernel/ arch/arm64/mm/
>  core-$(CONFIG_NET) += arch/arm64/net/
>  core-$(CONFIG_KVM) += arch/arm64/kvm/
>  core-$(CONFIG_XEN) += arch/arm64/xen/
> +core-$(CONFIG_HYPERV) += arch/arm64/hyperv/
>  core-$(CONFIG_CRYPTO) += arch/arm64/crypto/
>  libs-y		:= arch/arm64/lib/ $(libs-y)
>  core-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
> diff --git a/arch/arm64/hyperv/Makefile b/arch/arm64/hyperv/Makefile
> new file mode 100644
> index 000000000000..988eda55330c
> --- /dev/null
> +++ b/arch/arm64/hyperv/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0
> +obj-y		:= hv_init.o hv_hvc.o mshyperv.o
> diff --git a/arch/arm64/hyperv/hv_hvc.S b/arch/arm64/hyperv/hv_hvc.S
> new file mode 100644
> index 000000000000..82636969b4f2
> --- /dev/null
> +++ b/arch/arm64/hyperv/hv_hvc.S
> @@ -0,0 +1,54 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * Microsoft Hyper-V hypervisor invocation routines
> + *
> + * Copyright (C) 2018, Microsoft, Inc.
> + *
> + * Author : Michael Kelley <mikelley@microsoft.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published
> + * by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT.  See the GNU General Public License for more
> + * details.
> + */
> +
> +#include <linux/linkage.h>
> +
> +	.text
> +/*
> + * Do the HVC instruction.  For Hyper-V the argument is always 1.
> + * x0 contains the hypercall control value, while additional registers
> + * vary depending on the hypercall, and whether the hypercall arguments
> + * are in memory or in registers (a "fast" hypercall per the Hyper-V
> + * TLFS).  When the arguments are in memory x1 is the guest physical
> + * address of the input arguments, and x2 is the guest physical
> + * address of the output arguments.  When the arguments are in
> + * registers, the register values depends on the hypercall.  Note
> + * that this version cannot return any values in registers.
> + */
> +ENTRY(hv_do_hvc)
> +	hvc #1
> +	ret
> +ENDPROC(hv_do_hvc)
> +
> +/*
> + * This variant of HVC invocation is for hv_get_vpreg and
> + * hv_get_vpreg_128. The input parameters are passed in registers
> + * along with a pointer in x4 to where the output result should
> + * be stored. The output is returned in x15 and x16.  x18 is used as
> + * scratch space to avoid buildng a stack frame, as Hyper-V does
> + * not preserve registers x0-x17.
> + */
> +ENTRY(hv_do_hvc_fast_get)
> +	mov x18, x4
> +	hvc #1
> +	str x15,[x18]
> +	str x16,[x18,#8]
> +	ret
> +ENDPROC(hv_do_hvc_fast_get)

As Will said, this isn't a viable option. Please follow SMCCC 1.1.

> diff --git a/arch/arm64/hyperv/hv_init.c b/arch/arm64/hyperv/hv_init.c
> new file mode 100644
> index 000000000000..aa1a8c09d989
> --- /dev/null
> +++ b/arch/arm64/hyperv/hv_init.c
> @@ -0,0 +1,441 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Initialization of the interface with Microsoft's Hyper-V hypervisor,
> + * and various low level utility routines for interacting with Hyper-V.
> + *
> + * Copyright (C) 2018, Microsoft, Inc.
> + *
> + * Author : Michael Kelley <mikelley@microsoft.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published
> + * by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT.  See the GNU General Public License for more
> + * details.
> + */
> +
> +
> +#include <linux/types.h>
> +#include <linux/version.h>
> +#include <linux/export.h>
> +#include <linux/vmalloc.h>
> +#include <linux/mm.h>
> +#include <linux/clocksource.h>
> +#include <linux/sched_clock.h>
> +#include <linux/acpi.h>
> +#include <linux/module.h>
> +#include <linux/hyperv.h>
> +#include <linux/slab.h>
> +#include <linux/cpuhotplug.h>
> +#include <linux/psci.h>
> +#include <asm-generic/bug.h>
> +#include <asm/hypervisor.h>
> +#include <asm/hyperv-tlfs.h>
> +#include <asm/mshyperv.h>
> +#include <asm/sysreg.h>
> +#include <clocksource/arm_arch_timer.h>
> +
> +static bool	hyperv_initialized;
> +struct		ms_hyperv_info ms_hyperv;
> +EXPORT_SYMBOL_GPL(ms_hyperv);

Who are the users of this structure? Should they go via accessors instead?

> +
> +static struct ms_hyperv_tsc_page *tsc_pg;
> +
> +struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
> +{
> +	return tsc_pg;
> +}
> +EXPORT_SYMBOL_GPL(hv_get_tsc_page);
> +
> +static u64 read_hv_sched_clock_tsc(void)
> +{
> +	u64 current_tick = hv_read_tsc_page(tsc_pg);
> +
> +	if (current_tick == U64_MAX)
> +		current_tick = hv_get_vpreg(HV_REGISTER_TIME_REFCOUNT);
> +
> +	return current_tick;
> +}
> +
> +static u64 read_hv_clock_tsc(struct clocksource *arg)
> +{
> +	u64 current_tick = hv_read_tsc_page(tsc_pg);
> +
> +	if (current_tick == U64_MAX)
> +		current_tick = hv_get_vpreg(HV_REGISTER_TIME_REFCOUNT);
> +
> +	return current_tick;
> +}

Can't you define one function in terms of the other?

> +
> +static struct clocksource hyperv_cs_tsc = {
> +		.name		= "hyperv_clocksource_tsc_page",
> +		.rating		= 400,
> +		.read		= read_hv_clock_tsc,
> +		.mask		= CLOCKSOURCE_MASK(64),
> +		.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
> +};
> +
> +static u64 read_hv_sched_clock_msr(void)
> +{
> +	return hv_get_vpreg(HV_REGISTER_TIME_REFCOUNT);
> +}
> +
> +static u64 read_hv_clock_msr(struct clocksource *arg)
> +{
> +	return hv_get_vpreg(HV_REGISTER_TIME_REFCOUNT);
> +}
> +
> +static struct clocksource hyperv_cs_msr = {
> +	.name		= "hyperv_clocksource_msr",
> +	.rating		= 400,
> +	.read		= read_hv_clock_msr,
> +	.mask		= CLOCKSOURCE_MASK(64),
> +	.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
> +};
> +
> +struct clocksource *hyperv_cs;
> +EXPORT_SYMBOL_GPL(hyperv_cs);

Why? Who needs to poke this?

> +
> +u32 *hv_vp_index;
> +EXPORT_SYMBOL_GPL(hv_vp_index);
> +
> +u32 hv_max_vp_index;

Same thing.

> +
> +static int hv_cpu_init(unsigned int cpu)
> +{
> +	u64 msr_vp_index;
> +
> +	hv_get_vp_index(msr_vp_index);
> +
> +	hv_vp_index[smp_processor_id()] = msr_vp_index;
> +
> +	if (msr_vp_index > hv_max_vp_index)
> +		hv_max_vp_index = msr_vp_index;
> +
> +	return 0;
> +}

Is that some new way to describe a CPU topology? If so, why isn't that
exposed via the ACPI tables that the kernel already parses?

> +
> +/*
> + * This function is invoked via the ACPI clocksource probe mechanism. We
> + * don't actually use any values from the ACPI GTDT table, but we set up

This doesn't feel like a good idea at all. Piggy-backing on an existing
mechanism and use it for something completely different is not exactly
future-proof.

Also, if this is supposed to be a clocksource, why isn't that a
clocksource driver on its own right?

> + * the Hyper-V synthetic clocksource and do other initialization for
> + * interacting with Hyper-V the first time.  Using early_initcall to invoke
> + * this function is too late because interrupts are already enabled at that
> + * point, and sched_clock_register must run before interrupts are enabled.
> + *
> + * 1. Setup the guest ID.
> + * 2. Get features and hints info from Hyper-V
> + * 3. Setup per-cpu VP indices.
> + * 4. Register Hyper-V specific clocksource.
> + * 5. Register the scheduler clock.
> + */
> +
> +static int __init hyperv_init(struct acpi_table_header *table)
> +{
> +	struct hv_get_vp_register_output result;
> +	u32	a, b, c, d;
> +	u64	guest_id;
> +	int	i;
> +
> +	/*
> +	 * If we're in a VM on Hyper-V, the ACPI hypervisor_id field will
> +	 * have the string "MsHyperV".
> +	 */
> +	if (strncmp((char *)&acpi_gbl_FADT.hypervisor_id, "MsHyperV", 8))
> +		return 1;
> +
> +	/* Setup the guest ID */
> +	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
> +	hv_set_vpreg(HV_REGISTER_GUEST_OSID, guest_id);
> +
> +	/* Get the features and hints from Hyper-V */
> +	hv_get_vpreg_128(HV_REGISTER_PRIVILEGES_AND_FEATURES, &result);
> +	ms_hyperv.features = lower_32_bits(result.registervaluelow);
> +	ms_hyperv.misc_features = upper_32_bits(result.registervaluehigh);
> +
> +	hv_get_vpreg_128(HV_REGISTER_FEATURES, &result);
> +	ms_hyperv.hints = lower_32_bits(result.registervaluelow);
> +
> +	pr_info("Hyper-V: Features 0x%x, hints 0x%x\n",
> +		ms_hyperv.features, ms_hyperv.hints);
> +
> +	/*
> +	 * Direct mode is the only option for STIMERs provided Hyper-V
> +	 * on ARM64, so Hyper-V doesn't actually set the flag.  But add the
> +	 * flag so the architecture independent code in drivers/hv/hv.c
> +	 * will correctly use that mode.
> +	 */
> +	ms_hyperv.misc_features |= HV_STIMER_DIRECT_MODE_AVAILABLE;
> +
> +	/*
> +	 * Hyper-V on ARM64 doesn't support AutoEOI.  Add the hint
> +	 * that tells architecture independent code not to use this
> +	 * feature.
> +	 */
> +	ms_hyperv.hints |= HV_DEPRECATING_AEOI_RECOMMENDED;
> +
> +	/* Get information about the Hyper-V host version */
> +	hv_get_vpreg_128(HV_REGISTER_HYPERVISOR_VERSION, &result);
> +	a = lower_32_bits(result.registervaluelow);
> +	b = upper_32_bits(result.registervaluelow);
> +	c = lower_32_bits(result.registervaluehigh);
> +	d = upper_32_bits(result.registervaluehigh);
> +	pr_info("Hyper-V: Host Build %d.%d.%d.%d-%d-%d\n",
> +		b >> 16, b & 0xFFFF, a, d & 0xFFFFFF, c, d >> 24);
> +
> +	/* Allocate percpu VP index */
> +	hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
> +				    GFP_KERNEL);

Why isn't this a percpu variable?

> +	if (!hv_vp_index)
> +		return 1;
> +
> +	for (i = 0; i < num_possible_cpus(); i++)
> +		hv_vp_index[i] = VP_INVAL;
> +
> +	if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "arm64/hyperv_init:online",
> +			      hv_cpu_init, NULL) < 0)
> +		goto free_vp_index;
> +
> +	/*
> +	 * Try to set up what Hyper-V calls the "TSC reference page", which
> +	 * uses the ARM Generic Timer virtual counter with some scaling
> +	 * information to provide a fast and stable guest VM clocksource.
> +	 * If the TSC reference page can't be set up, fall back to reading
> +	 * the guest clock provided by Hyper-V's synthetic reference time
> +	 * register.
> +	 */
> +	if (ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE) {
> +
> +		u64		tsc_msr;
> +		phys_addr_t	phys_addr;
> +
> +		tsc_pg = __vmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);

And why not vmalloc?

> +		if (tsc_pg) {
> +			phys_addr = page_to_phys(vmalloc_to_page(tsc_pg));
> +			tsc_msr = hv_get_vpreg(HV_REGISTER_REFERENCE_TSC);
> +			tsc_msr &= GENMASK_ULL(11, 0);

Magic.

> +			tsc_msr = tsc_msr | 0x1 | (u64)phys_addr;
> +			hv_set_vpreg(HV_REGISTER_REFERENCE_TSC, tsc_msr);
> +			hyperv_cs = &hyperv_cs_tsc;
> +			sched_clock_register(read_hv_sched_clock_tsc,
> +						64, HV_CLOCK_HZ);
> +		}
> +	}
> +
> +	if (!hyperv_cs &&
> +	    (ms_hyperv.features & HV_MSR_TIME_REF_COUNT_AVAILABLE)) {
> +		hyperv_cs = &hyperv_cs_msr;
> +		sched_clock_register(read_hv_sched_clock_msr,
> +						64, HV_CLOCK_HZ);
> +	}
> +
> +	if (hyperv_cs) {
> +		hyperv_cs->archdata.vdso_direct = false;
> +		clocksource_register_hz(hyperv_cs, HV_CLOCK_HZ);
> +	}
> +
> +	hyperv_initialized = true;
> +	return 0;
> +
> +free_vp_index:
> +	kfree(hv_vp_index);
> +	hv_vp_index = NULL;
> +	return 1;

????

> +}
> +TIMER_ACPI_DECLARE(hyperv, ACPI_SIG_GTDT, hyperv_init);
> +
> +/*
> + * This routine is called before kexec/kdump, it does the required cleanup.
> + */
> +void hyperv_cleanup(void)
> +{
> +	/* Reset our OS id */
> +	hv_set_vpreg(HV_REGISTER_GUEST_OSID, 0);
> +
> +}
> +EXPORT_SYMBOL_GPL(hyperv_cleanup);
> +
> +/*
> + * hv_do_hypercall- Invoke the specified hypercall
> + */
> +u64 hv_do_hypercall(u64 control, void *input, void *output)
> +{
> +	u64 input_address;
> +	u64 output_address;
> +
> +	input_address = input ? virt_to_phys(input) : 0;
> +	output_address = output ? virt_to_phys(output) : 0;
> +	return hv_do_hvc(control, input_address, output_address);
> +}
> +EXPORT_SYMBOL_GPL(hv_do_hypercall);
> +
> +/*
> + * hv_do_fast_hypercall8 -- Invoke the specified hypercall
> + * with arguments in registers instead of physical memory.
> + * Avoids the overhead of virt_to_phys for simple hypercalls.
> + */
> +
> +u64 hv_do_fast_hypercall8(u16 code, u64 input)
> +{
> +	u64 control;
> +
> +	control = (u64)code | HV_HYPERCALL_FAST_BIT;
> +	return hv_do_hvc(control, input);
> +}
> +EXPORT_SYMBOL_GPL(hv_do_fast_hypercall8);
> +
> +
> +/*
> + * Set a single VP register to a 64-bit value.
> + */
> +void hv_set_vpreg(u32 msr, u64 value)
> +{
> +	union hv_hypercall_status status;
> +
> +	status.as_uint64 = hv_do_hvc(
> +		HVCALL_SET_VP_REGISTERS | HV_HYPERCALL_FAST_BIT |
> +			HV_HYPERCALL_REP_COUNT_1,
> +		HV_PARTITION_ID_SELF,
> +		HV_VP_INDEX_SELF,
> +		msr,
> +		0,
> +		value,
> +		0);
> +
> +	/*
> +	 * Something is fundamentally broken in the hypervisor if
> +	 * setting a VP register fails. There's really no way to
> +	 * continue as a guest VM, so panic.
> +	 */
> +	BUG_ON(status.status != HV_STATUS_SUCCESS);
> +}
> +EXPORT_SYMBOL_GPL(hv_set_vpreg);
> +
> +
> +/*
> + * Get the value of a single VP register, and only the low order 64 bits.
> + */
> +u64 hv_get_vpreg(u32 msr)
> +{
> +	union hv_hypercall_status status;
> +	struct hv_get_vp_register_output output;
> +
> +	status.as_uint64 = hv_do_hvc_fast_get(
> +		HVCALL_GET_VP_REGISTERS | HV_HYPERCALL_FAST_BIT |
> +			HV_HYPERCALL_REP_COUNT_1,
> +		HV_PARTITION_ID_SELF,
> +		HV_VP_INDEX_SELF,
> +		msr,
> +		&output);
> +
> +	/*
> +	 * Something is fundamentally broken in the hypervisor if
> +	 * getting a VP register fails. There's really no way to
> +	 * continue as a guest VM, so panic.
> +	 */
> +	BUG_ON(status.status != HV_STATUS_SUCCESS);
> +
> +	return output.registervaluelow;
> +}
> +EXPORT_SYMBOL_GPL(hv_get_vpreg);
> +
> +/*
> + * Get the value of a single VP register that is 128 bits in size.  This is a
> + * separate call in order to avoid complicating the calling sequence for
> + * the much more frequently used 64-bit version.
> + */
> +void hv_get_vpreg_128(u32 msr, struct hv_get_vp_register_output *result)
> +{
> +	union hv_hypercall_status status;
> +
> +	status.as_uint64 = hv_do_hvc_fast_get(
> +		HVCALL_GET_VP_REGISTERS | HV_HYPERCALL_FAST_BIT |
> +			HV_HYPERCALL_REP_COUNT_1,
> +		HV_PARTITION_ID_SELF,
> +		HV_VP_INDEX_SELF,
> +		msr,
> +		result);
> +
> +	/*
> +	 * Something is fundamentally broken in the hypervisor if
> +	 * getting a VP register fails. There's really no way to
> +	 * continue as a guest VM, so panic.
> +	 */
> +	BUG_ON(status.status != HV_STATUS_SUCCESS);
> +
> +	return;
> +
> +}
> +EXPORT_SYMBOL_GPL(hv_get_vpreg_128);
> +
> +void hyperv_report_panic(struct pt_regs *regs, long err)
> +{
> +	static bool panic_reported;
> +	u64 guest_id;
> +
> +	/*
> +	 * We prefer to report panic on 'die' chain as we have proper
> +	 * registers to report, but if we miss it (e.g. on BUG()) we need
> +	 * to report it on 'panic'.
> +	 */
> +	if (panic_reported)
> +		return;
> +	panic_reported = true;
> +
> +	guest_id = hv_get_vpreg(HV_REGISTER_GUEST_OSID);
> +
> +	/*
> +	 * Hyper-V provides the ability to store only 5 values.
> +	 * Pick the passed in error value, the guest_id, and the PC.
> +	 * The first two general registers are added arbitrarily.
> +	 */
> +	hv_set_vpreg(HV_REGISTER_CRASH_P0, err);
> +	hv_set_vpreg(HV_REGISTER_CRASH_P1, guest_id);
> +	hv_set_vpreg(HV_REGISTER_CRASH_P2, regs->pc);
> +	hv_set_vpreg(HV_REGISTER_CRASH_P3, regs->regs[0]);
> +	hv_set_vpreg(HV_REGISTER_CRASH_P4, regs->regs[1]);
> +
> +	/*
> +	 * Let Hyper-V know there is crash data available
> +	 */
> +	hv_set_vpreg(HV_REGISTER_CRASH_CTL, HV_CRASH_CTL_CRASH_NOTIFY);
> +}
> +EXPORT_SYMBOL_GPL(hyperv_report_panic);
> +
> +/*
> + * hyperv_report_panic_msg - report panic message to Hyper-V
> + * @pa: physical address of the panic page containing the message
> + * @size: size of the message in the page
> + */
> +void hyperv_report_panic_msg(phys_addr_t pa, size_t size)
> +{
> +	/*
> +	 * P3 to contain the physical address of the panic page & P4 to
> +	 * contain the size of the panic data in that page. Rest of the
> +	 * registers are no-op when the NOTIFY_MSG flag is set.
> +	 */
> +	hv_set_vpreg(HV_REGISTER_CRASH_P0, 0);
> +	hv_set_vpreg(HV_REGISTER_CRASH_P1, 0);
> +	hv_set_vpreg(HV_REGISTER_CRASH_P2, 0);
> +	hv_set_vpreg(HV_REGISTER_CRASH_P3, pa);
> +	hv_set_vpreg(HV_REGISTER_CRASH_P4, size);
> +
> +	/*
> +	 * Let Hyper-V know there is crash data available along with
> +	 * the panic message.
> +	 */
> +	hv_set_vpreg(HV_REGISTER_CRASH_CTL,
> +	       (HV_CRASH_CTL_CRASH_NOTIFY | HV_CRASH_CTL_CRASH_NOTIFY_MSG));
> +}
> +EXPORT_SYMBOL_GPL(hyperv_report_panic_msg);
> +
> +bool hv_is_hyperv_initialized(void)
> +{
> +	return hyperv_initialized;
> +}
> +EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
> diff --git a/arch/arm64/hyperv/mshyperv.c b/arch/arm64/hyperv/mshyperv.c
> new file mode 100644
> index 000000000000..3ef055599412
> --- /dev/null
> +++ b/arch/arm64/hyperv/mshyperv.c
> @@ -0,0 +1,178 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Core routines for interacting with Microsoft's Hyper-V hypervisor,
> + * including setting up VMbus and STIMER interrupts, and handling
> + * crashes and kexecs. These interactions are through a set of
> + * static "handler" variables set by the architecture independent
> + * VMbus and STIMER drivers.  This design is used to meet x86/x64
> + * requirements for avoiding direct linkages and allowing the VMbus
> + * and STIMER drivers to be unloaded and reloaded.
> + *
> + * Copyright (C) 2018, Microsoft, Inc.
> + *
> + * Author : Michael Kelley <mikelley@microsoft.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published
> + * by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT.  See the GNU General Public License for more
> + * details.
> + */
> +
> +#include <linux/types.h>
> +#include <linux/export.h>
> +#include <linux/interrupt.h>
> +#include <linux/kexec.h>
> +#include <linux/acpi.h>
> +#include <linux/ptrace.h>
> +#include <asm/hyperv-tlfs.h>
> +#include <asm/mshyperv.h>
> +
> +static void (*vmbus_handler)(void);
> +static void (*hv_stimer0_handler)(void);
> +static void (*hv_kexec_handler)(void);
> +static void (*hv_crash_handler)(struct pt_regs *regs);
> +
> +static int vmbus_irq;
> +static long __percpu *vmbus_evt;
> +static long __percpu *stimer0_evt;
> +
> +irqreturn_t hyperv_vector_handler(int irq, void *dev_id)
> +{
> +	if (vmbus_handler)
> +		vmbus_handler();

In which circumstances can this be NULL?

> +	return IRQ_HANDLED;
> +}
> +
> +/* Must be done just once */
> +void hv_setup_vmbus_irq(void (*handler)(void))
> +{
> +	int result;
> +
> +	vmbus_handler = handler;

If thismust only be done once, maybe you could check that it hasn't been
done already?

> +	vmbus_irq = acpi_register_gsi(NULL, HYPERVISOR_CALLBACK_VECTOR,
> +				 ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
> +	if (vmbus_irq <= 0) {
> +		pr_err("Can't register Hyper-V VMBus GSI. Error %d",
> +			vmbus_irq);
> +		vmbus_irq = 0;
> +		return;
> +	}
> +	vmbus_evt = alloc_percpu(long);
> +	result = request_percpu_irq(vmbus_irq, hyperv_vector_handler,
> +			"Hyper-V VMbus", vmbus_evt);

If this is a per-cpu interrupt, why isn't it signalled as a PPI, in an
architecture compliant way?

> +	if (result) {
> +		pr_err("Can't request Hyper-V VMBus IRQ %d. Error %d",
> +			vmbus_irq, result);
> +		free_percpu(vmbus_evt);
> +		acpi_unregister_gsi(vmbus_irq);
> +		vmbus_irq = 0;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(hv_setup_vmbus_irq);
> +
> +/* Must be done just once */
> +void hv_remove_vmbus_irq(void)
> +{
> +	if (vmbus_irq) {
> +		free_percpu_irq(vmbus_irq, vmbus_evt);
> +		free_percpu(vmbus_evt);
> +		acpi_unregister_gsi(vmbus_irq);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(hv_remove_vmbus_irq);
> +
> +/* Must be done by each CPU */
> +void hv_enable_vmbus_irq(void)
> +{
> +	enable_percpu_irq(vmbus_irq, 0);
> +}
> +EXPORT_SYMBOL_GPL(hv_enable_vmbus_irq);
> +
> +/* Must be done by each CPU */
> +void hv_disable_vmbus_irq(void)
> +{
> +	disable_percpu_irq(vmbus_irq);
> +}
> +EXPORT_SYMBOL_GPL(hv_disable_vmbus_irq);
> +
> +/* Routines to do per-architecture handling of STIMER0 when in Direct Mode */
> +
> +static irqreturn_t hv_stimer0_vector_handler(int irq, void *dev_id)
> +{
> +	if (hv_stimer0_handler)
> +		hv_stimer0_handler();
> +	return IRQ_HANDLED;
> +}
> +
> +int hv_setup_stimer0_irq(int *irq, int *vector, void (*handler)(void))
> +{
> +	int localirq;
> +	int result;
> +
> +	localirq = acpi_register_gsi(NULL, HV_STIMER0_IRQNR,
> +			ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
> +	if (localirq <= 0) {
> +		pr_err("Can't register Hyper-V stimer0 GSI. Error %d",
> +			localirq);
> +		*irq = 0;
> +		return -1;
> +	}
> +	stimer0_evt = alloc_percpu(long);
> +	result = request_percpu_irq(localirq, hv_stimer0_vector_handler,
> +					 "Hyper-V stimer0", stimer0_evt);
> +	if (result) {
> +		pr_err("Can't request Hyper-V stimer0 IRQ %d. Error %d",
> +			localirq, result);
> +		free_percpu(stimer0_evt);
> +		acpi_unregister_gsi(localirq);
> +		*irq = 0;
> +		return -1;
> +	}
> +
> +	hv_stimer0_handler = handler;
> +	*vector = HV_STIMER0_IRQNR;
> +	*irq = localirq;
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(hv_setup_stimer0_irq);
> +
> +void hv_remove_stimer0_irq(int irq)
> +{
> +	hv_stimer0_handler = NULL;
> +	if (irq) {
> +		free_percpu_irq(irq, stimer0_evt);
> +		free_percpu(stimer0_evt);
> +		acpi_unregister_gsi(irq);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(hv_remove_stimer0_irq);
> +
> +void hv_setup_kexec_handler(void (*handler)(void))
> +{
> +	hv_kexec_handler = handler;
> +}
> +EXPORT_SYMBOL_GPL(hv_setup_kexec_handler);
> +
> +void hv_remove_kexec_handler(void)
> +{
> +	hv_kexec_handler = NULL;
> +}
> +EXPORT_SYMBOL_GPL(hv_remove_kexec_handler);
> +
> +void hv_setup_crash_handler(void (*handler)(struct pt_regs *regs))
> +{
> +	hv_crash_handler = handler;
> +}
> +EXPORT_SYMBOL_GPL(hv_setup_crash_handler);
> +
> +void hv_remove_crash_handler(void)
> +{
> +	hv_crash_handler = NULL;
> +}
> +EXPORT_SYMBOL_GPL(hv_remove_crash_handler);
> 

Overall, this patch needs a massive split up, clean up, and a good dose
of documentation so that we can understand what you are trying to achieve.

Thanks,

	M.
Michael Kelley (LINUX) Dec. 12, 2018, 5 a.m. UTC | #4
From: Marc Zyngier <marc.zyngier@arm.com>  Sent: Friday, December 7, 2018 6:43 AM

> > Add ARM64-specific code to enable Hyper-V. This code includes:
> > * Detecting Hyper-V and initializing the guest/Hyper-V interface
> > * Setting up Hyper-V's synthetic clocks
> > * Making hypercalls using the HVC instruction
> > * Setting up VMbus and stimer0 interrupts
> > * Setting up kexec and crash handlers
> 
> This commit message is a clear indication that this should be split in
> at least 5 different patches.

OK, I'll work on separating into multiple layered patches in the next
version.

> 
> > This code is architecture dependent code and is mostly driven by
> > architecture independent code in the VMbus driver in drivers/hv/hv.c
> > and drivers/hv/vmbus_drv.c.
> >
> > This code is built only when CONFIG_HYPERV is enabled.
> >
> > Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> > Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
> > ---
> >  MAINTAINERS                  |   1 +
> >  arch/arm64/Makefile          |   1 +
> >  arch/arm64/hyperv/Makefile   |   2 +
> >  arch/arm64/hyperv/hv_hvc.S   |  54 +++++
> >  arch/arm64/hyperv/hv_init.c  | 441 +++++++++++++++++++++++++++++++++++
> >  arch/arm64/hyperv/mshyperv.c | 178 ++++++++++++++
> >  6 files changed, 677 insertions(+)
> >  create mode 100644 arch/arm64/hyperv/Makefile
> >  create mode 100644 arch/arm64/hyperv/hv_hvc.S
> >  create mode 100644 arch/arm64/hyperv/hv_init.c
> >  create mode 100644 arch/arm64/hyperv/mshyperv.c
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 72f19cef4c48..326eeb32a0cd 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -6837,6 +6837,7 @@ F:	arch/x86/kernel/cpu/mshyperv.c
> >  F:	arch/x86/hyperv
> >  F:	arch/arm64/include/asm/hyperv-tlfs.h
> >  F:	arch/arm64/include/asm/mshyperv.h
> > +F:	arch/arm64/hyperv
> >  F:	drivers/hid/hid-hyperv.c
> >  F:	drivers/hv/
> >  F:	drivers/input/serio/hyperv-keyboard.c
> > diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> > index 6cb9fc7e9382..ad9ec0579553 100644
> > --- a/arch/arm64/Makefile
> > +++ b/arch/arm64/Makefile
> > @@ -106,6 +106,7 @@ core-y		+= arch/arm64/kernel/ arch/arm64/mm/
> >  core-$(CONFIG_NET) += arch/arm64/net/
> >  core-$(CONFIG_KVM) += arch/arm64/kvm/
> >  core-$(CONFIG_XEN) += arch/arm64/xen/
> > +core-$(CONFIG_HYPERV) += arch/arm64/hyperv/
> >  core-$(CONFIG_CRYPTO) += arch/arm64/crypto/
> >  libs-y		:= arch/arm64/lib/ $(libs-y)
> >  core-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
> > diff --git a/arch/arm64/hyperv/Makefile b/arch/arm64/hyperv/Makefile
> > new file mode 100644
> > index 000000000000..988eda55330c
> > --- /dev/null
> > +++ b/arch/arm64/hyperv/Makefile
> > @@ -0,0 +1,2 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +obj-y		:= hv_init.o hv_hvc.o mshyperv.o
> > diff --git a/arch/arm64/hyperv/hv_hvc.S b/arch/arm64/hyperv/hv_hvc.S
> > new file mode 100644
> > index 000000000000..82636969b4f2
> > --- /dev/null
> > +++ b/arch/arm64/hyperv/hv_hvc.S
> > @@ -0,0 +1,54 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +/*
> > + * Microsoft Hyper-V hypervisor invocation routines
> > + *
> > + * Copyright (C) 2018, Microsoft, Inc.
> > + *
> > + * Author : Michael Kelley <mikelley@microsoft.com>
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms of the GNU General Public License version 2 as published
> > + * by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful, but
> > + * WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> > + * NON INFRINGEMENT.  See the GNU General Public License for more
> > + * details.
> > + */
> > +
> > +#include <linux/linkage.h>
> > +
> > +	.text
> > +/*
> > + * Do the HVC instruction.  For Hyper-V the argument is always 1.
> > + * x0 contains the hypercall control value, while additional registers
> > + * vary depending on the hypercall, and whether the hypercall arguments
> > + * are in memory or in registers (a "fast" hypercall per the Hyper-V
> > + * TLFS).  When the arguments are in memory x1 is the guest physical
> > + * address of the input arguments, and x2 is the guest physical
> > + * address of the output arguments.  When the arguments are in
> > + * registers, the register values depends on the hypercall.  Note
> > + * that this version cannot return any values in registers.
> > + */
> > +ENTRY(hv_do_hvc)
> > +	hvc #1
> > +	ret
> > +ENDPROC(hv_do_hvc)
> > +
> > +/*
> > + * This variant of HVC invocation is for hv_get_vpreg and
> > + * hv_get_vpreg_128. The input parameters are passed in registers
> > + * along with a pointer in x4 to where the output result should
> > + * be stored. The output is returned in x15 and x16.  x18 is used as
> > + * scratch space to avoid buildng a stack frame, as Hyper-V does
> > + * not preserve registers x0-x17.
> > + */
> > +ENTRY(hv_do_hvc_fast_get)
> > +	mov x18, x4
> > +	hvc #1
> > +	str x15,[x18]
> > +	str x16,[x18,#8]
> > +	ret
> > +ENDPROC(hv_do_hvc_fast_get)
> 
> As Will said, this isn't a viable option. Please follow SMCCC 1.1.

I'll have to start a conversation with the Hyper-V team about this.
I don't know why they chose to use HVC #1 or this register scheme
for output values.  It may be tough to change at this point because
there are Windows guests on Hyper-V for ARM64 that are already
using this approach.

> 
> > diff --git a/arch/arm64/hyperv/hv_init.c b/arch/arm64/hyperv/hv_init.c
> > new file mode 100644
> > index 000000000000..aa1a8c09d989
> > --- /dev/null
> > +++ b/arch/arm64/hyperv/hv_init.c
> > @@ -0,0 +1,441 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * Initialization of the interface with Microsoft's Hyper-V hypervisor,
> > + * and various low level utility routines for interacting with Hyper-V.
> > + *
> > + * Copyright (C) 2018, Microsoft, Inc.
> > + *
> > + * Author : Michael Kelley <mikelley@microsoft.com>
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms of the GNU General Public License version 2 as published
> > + * by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful, but
> > + * WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> > + * NON INFRINGEMENT.  See the GNU General Public License for more
> > + * details.
> > + */
> > +
> > +
> > +#include <linux/types.h>
> > +#include <linux/version.h>
> > +#include <linux/export.h>
> > +#include <linux/vmalloc.h>
> > +#include <linux/mm.h>
> > +#include <linux/clocksource.h>
> > +#include <linux/sched_clock.h>
> > +#include <linux/acpi.h>
> > +#include <linux/module.h>
> > +#include <linux/hyperv.h>
> > +#include <linux/slab.h>
> > +#include <linux/cpuhotplug.h>
> > +#include <linux/psci.h>
> > +#include <asm-generic/bug.h>
> > +#include <asm/hypervisor.h>
> > +#include <asm/hyperv-tlfs.h>
> > +#include <asm/mshyperv.h>
> > +#include <asm/sysreg.h>
> > +#include <clocksource/arm_arch_timer.h>
> > +
> > +static bool	hyperv_initialized;
> > +struct		ms_hyperv_info ms_hyperv;
> > +EXPORT_SYMBOL_GPL(ms_hyperv);
> 
> Who are the users of this structure? Should they go via accessors instead?

The structure is an aggregation of several flags fields that describe a myriad
of features and hints that may or may not be present on any particular version
of Hyper-V, plus the max virtual processor ID values.  Everything is read-only
after initialization.   Most of the references are to test one of the flags.  It's
a judgment call, but there are a lot of different flags with long names, and
writing accessors for each one doesn't seem to me to add any clarity.

> 
> > +
> > +static struct ms_hyperv_tsc_page *tsc_pg;
> > +
> > +struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
> > +{
> > +	return tsc_pg;
> > +}
> > +EXPORT_SYMBOL_GPL(hv_get_tsc_page);
> > +
> > +static u64 read_hv_sched_clock_tsc(void)
> > +{
> > +	u64 current_tick = hv_read_tsc_page(tsc_pg);
> > +
> > +	if (current_tick == U64_MAX)
> > +		current_tick = hv_get_vpreg(HV_REGISTER_TIME_REFCOUNT);
> > +
> > +	return current_tick;
> > +}
> > +
> > +static u64 read_hv_clock_tsc(struct clocksource *arg)
> > +{
> > +	u64 current_tick = hv_read_tsc_page(tsc_pg);
> > +
> > +	if (current_tick == U64_MAX)
> > +		current_tick = hv_get_vpreg(HV_REGISTER_TIME_REFCOUNT);
> > +
> > +	return current_tick;
> > +}
> 
> Can't you define one function in terms of the other?

Yes, it looks like that works.  I'll change it in the next version of the
patch.

> 
> > +
> > +static struct clocksource hyperv_cs_tsc = {
> > +		.name		= "hyperv_clocksource_tsc_page",
> > +		.rating		= 400,
> > +		.read		= read_hv_clock_tsc,
> > +		.mask		= CLOCKSOURCE_MASK(64),
> > +		.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
> > +};
> > +
> > +static u64 read_hv_sched_clock_msr(void)
> > +{
> > +	return hv_get_vpreg(HV_REGISTER_TIME_REFCOUNT);
> > +}
> > +
> > +static u64 read_hv_clock_msr(struct clocksource *arg)
> > +{
> > +	return hv_get_vpreg(HV_REGISTER_TIME_REFCOUNT);
> > +}
> > +
> > +static struct clocksource hyperv_cs_msr = {
> > +	.name		= "hyperv_clocksource_msr",
> > +	.rating		= 400,
> > +	.read		= read_hv_clock_msr,
> > +	.mask		= CLOCKSOURCE_MASK(64),
> > +	.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
> > +};
> > +
> > +struct clocksource *hyperv_cs;
> > +EXPORT_SYMBOL_GPL(hyperv_cs);
> 
> Why? Who needs to poke this?

It's referenced in the architecture independent driver code
for the Hyper-V clocksource in drivers/hv/hv.c, and in the
code to sync the time with the Hyper-V host in
drivers/hv/hv_util.c.

> 
> > +
> > +u32 *hv_vp_index;
> > +EXPORT_SYMBOL_GPL(hv_vp_index);
> > +
> > +u32 hv_max_vp_index;
> 
> Same thing.

Agreed -- this one doesn't seem to be needed.  The variable
is not used outside of the architecture dependent Hyper-V code.

> 
> > +
> > +static int hv_cpu_init(unsigned int cpu)
> > +{
> > +	u64 msr_vp_index;
> > +
> > +	hv_get_vp_index(msr_vp_index);
> > +
> > +	hv_vp_index[smp_processor_id()] = msr_vp_index;
> > +
> > +	if (msr_vp_index > hv_max_vp_index)
> > +		hv_max_vp_index = msr_vp_index;
> > +
> > +	return 0;
> > +}
> 
> Is that some new way to describe a CPU topology? If so, why isn't that
> exposed via the ACPI tables that the kernel already parses?

Hyper-V's hypercall interface uses vCPU identifiers that are not
guaranteed to be consecutive integers or to match what ACPI shows.
No topology information is implied -- it's just unique identifiers.  The 
hv_vp_index array provides easy mapping from Linux's consecutive
integer IDs for CPUs when needed to construct hypercall arguments.

> 
> > +
> > +/*
> > + * This function is invoked via the ACPI clocksource probe mechanism. We
> > + * don't actually use any values from the ACPI GTDT table, but we set up
> 
> This doesn't feel like a good idea at all. Piggy-backing on an existing
> mechanism and use it for something completely different is not exactly
> future-proof.
> 
> Also, if this is supposed to be a clocksource, why isn't that a
> clocksource driver on its own right?

I agree this is not the right long term solution.  Is there a better place to
hang the initialization code?  Or should I just make an explicit call to
initialize Hyper-V at the right place?  On the x86 side, there's an
explicit framework for hypervisor-specific initialization routines to plug
into.  Maybe it's time for a basic version of such a framework on the
ARM64 side.  Thoughts on the best approach, both in the short-term and
the longer-term? If we put a framework in place, does that need to
happen before adding Hyper-V code, or afterwards as a cleanup?

And yes, Hyper-V does effectively have its own clocksource.  The
main code is in drivers/hv/hv.c, but it's not broken out as a separate
driver in drivers/clocksource, probably due to some history on the
x86 side that pre-dates me.  I'll have to research.

> 
> > + * the Hyper-V synthetic clocksource and do other initialization for
> > + * interacting with Hyper-V the first time.  Using early_initcall to invoke
> > + * this function is too late because interrupts are already enabled at that
> > + * point, and sched_clock_register must run before interrupts are enabled.
> > + *
> > + * 1. Setup the guest ID.
> > + * 2. Get features and hints info from Hyper-V
> > + * 3. Setup per-cpu VP indices.
> > + * 4. Register Hyper-V specific clocksource.
> > + * 5. Register the scheduler clock.
> > + */
> > +
> > +static int __init hyperv_init(struct acpi_table_header *table)
> > +{
> > +	struct hv_get_vp_register_output result;
> > +	u32	a, b, c, d;
> > +	u64	guest_id;
> > +	int	i;
> > +
> > +	/*
> > +	 * If we're in a VM on Hyper-V, the ACPI hypervisor_id field will
> > +	 * have the string "MsHyperV".
> > +	 */
> > +	if (strncmp((char *)&acpi_gbl_FADT.hypervisor_id, "MsHyperV", 8))
> > +		return 1;
> > +
> > +	/* Setup the guest ID */
> > +	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
> > +	hv_set_vpreg(HV_REGISTER_GUEST_OSID, guest_id);
> > +
> > +	/* Get the features and hints from Hyper-V */
> > +	hv_get_vpreg_128(HV_REGISTER_PRIVILEGES_AND_FEATURES, &result);
> > +	ms_hyperv.features = lower_32_bits(result.registervaluelow);
> > +	ms_hyperv.misc_features = upper_32_bits(result.registervaluehigh);
> > +
> > +	hv_get_vpreg_128(HV_REGISTER_FEATURES, &result);
> > +	ms_hyperv.hints = lower_32_bits(result.registervaluelow);
> > +
> > +	pr_info("Hyper-V: Features 0x%x, hints 0x%x\n",
> > +		ms_hyperv.features, ms_hyperv.hints);
> > +
> > +	/*
> > +	 * Direct mode is the only option for STIMERs provided Hyper-V
> > +	 * on ARM64, so Hyper-V doesn't actually set the flag.  But add the
> > +	 * flag so the architecture independent code in drivers/hv/hv.c
> > +	 * will correctly use that mode.
> > +	 */
> > +	ms_hyperv.misc_features |= HV_STIMER_DIRECT_MODE_AVAILABLE;
> > +
> > +	/*
> > +	 * Hyper-V on ARM64 doesn't support AutoEOI.  Add the hint
> > +	 * that tells architecture independent code not to use this
> > +	 * feature.
> > +	 */
> > +	ms_hyperv.hints |= HV_DEPRECATING_AEOI_RECOMMENDED;
> > +
> > +	/* Get information about the Hyper-V host version */
> > +	hv_get_vpreg_128(HV_REGISTER_HYPERVISOR_VERSION, &result);
> > +	a = lower_32_bits(result.registervaluelow);
> > +	b = upper_32_bits(result.registervaluelow);
> > +	c = lower_32_bits(result.registervaluehigh);
> > +	d = upper_32_bits(result.registervaluehigh);
> > +	pr_info("Hyper-V: Host Build %d.%d.%d.%d-%d-%d\n",
> > +		b >> 16, b & 0xFFFF, a, d & 0xFFFFFF, c, d >> 24);
> > +
> > +	/* Allocate percpu VP index */
> > +	hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
> > +				    GFP_KERNEL);
> 
> Why isn't this a percpu variable?

In current code in the architecture independent Hyper-V drivers (as well
as some future Hyper-V enlightenments that aren't yet implemented
for ARM64), the running CPU needs to get the VP index values for any CPUs
in the hv_vp_index array.  Some of the code is performance sensitive, and
accessing a global array is faster than accessing other CPUs' per-cpu data.

> 
> > +	if (!hv_vp_index)
> > +		return 1;
> > +
> > +	for (i = 0; i < num_possible_cpus(); i++)
> > +		hv_vp_index[i] = VP_INVAL;
> > +
> > +	if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "arm64/hyperv_init:online",
> > +			      hv_cpu_init, NULL) < 0)
> > +		goto free_vp_index;
> > +
> > +	/*
> > +	 * Try to set up what Hyper-V calls the "TSC reference page", which
> > +	 * uses the ARM Generic Timer virtual counter with some scaling
> > +	 * information to provide a fast and stable guest VM clocksource.
> > +	 * If the TSC reference page can't be set up, fall back to reading
> > +	 * the guest clock provided by Hyper-V's synthetic reference time
> > +	 * register.
> > +	 */
> > +	if (ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE) {
> > +
> > +		u64		tsc_msr;
> > +		phys_addr_t	phys_addr;
> > +
> > +		tsc_pg = __vmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
> 
> And why not vmalloc?

Looks like it could be just vmalloc().  I'll change it in the next version.

> 
> > +		if (tsc_pg) {
> > +			phys_addr = page_to_phys(vmalloc_to_page(tsc_pg));
> > +			tsc_msr = hv_get_vpreg(HV_REGISTER_REFERENCE_TSC);
> > +			tsc_msr &= GENMASK_ULL(11, 0);
> 
> Magic.

Per Section 12.6.1 of the Hyper-V TLFS, bits 11 thru 1 must be preserved, and
the line below sets bit 0.  I'll add some comments. :-)

> 
> > +			tsc_msr = tsc_msr | 0x1 | (u64)phys_addr;
> > +			hv_set_vpreg(HV_REGISTER_REFERENCE_TSC, tsc_msr);
> > +			hyperv_cs = &hyperv_cs_tsc;
> > +			sched_clock_register(read_hv_sched_clock_tsc,
> > +						64, HV_CLOCK_HZ);
> > +		}
> > +	}
> > +
> > +	if (!hyperv_cs &&
> > +	    (ms_hyperv.features & HV_MSR_TIME_REF_COUNT_AVAILABLE)) {
> > +		hyperv_cs = &hyperv_cs_msr;
> > +		sched_clock_register(read_hv_sched_clock_msr,
> > +						64, HV_CLOCK_HZ);
> > +	}
> > +
> > +	if (hyperv_cs) {
> > +		hyperv_cs->archdata.vdso_direct = false;
> > +		clocksource_register_hz(hyperv_cs, HV_CLOCK_HZ);
> > +	}
> > +
> > +	hyperv_initialized = true;
> > +	return 0;
> > +
> > +free_vp_index:
> > +	kfree(hv_vp_index);
> > +	hv_vp_index = NULL;
> > +	return 1;
> 
> ????

The return value is there because this function is implemented as a timer
initialization function, and that's what the function signature requires.
But maybe I'm not understanding your question.

> 
> > +}
> > +TIMER_ACPI_DECLARE(hyperv, ACPI_SIG_GTDT, hyperv_init);
> > +
> > +/*
> > + * This routine is called before kexec/kdump, it does the required cleanup.
> > + */
> > +void hyperv_cleanup(void)
> > +{
> > +	/* Reset our OS id */
> > +	hv_set_vpreg(HV_REGISTER_GUEST_OSID, 0);
> > +
> > +}
> > +EXPORT_SYMBOL_GPL(hyperv_cleanup);
> > +
> > +/*
> > + * hv_do_hypercall- Invoke the specified hypercall
> > + */
> > +u64 hv_do_hypercall(u64 control, void *input, void *output)
> > +{
> > +	u64 input_address;
> > +	u64 output_address;
> > +
> > +	input_address = input ? virt_to_phys(input) : 0;
> > +	output_address = output ? virt_to_phys(output) : 0;
> > +	return hv_do_hvc(control, input_address, output_address);
> > +}
> > +EXPORT_SYMBOL_GPL(hv_do_hypercall);
> > +
> > +/*
> > + * hv_do_fast_hypercall8 -- Invoke the specified hypercall
> > + * with arguments in registers instead of physical memory.
> > + * Avoids the overhead of virt_to_phys for simple hypercalls.
> > + */
> > +
> > +u64 hv_do_fast_hypercall8(u16 code, u64 input)
> > +{
> > +	u64 control;
> > +
> > +	control = (u64)code | HV_HYPERCALL_FAST_BIT;
> > +	return hv_do_hvc(control, input);
> > +}
> > +EXPORT_SYMBOL_GPL(hv_do_fast_hypercall8);
> > +
> > +
> > +/*
> > + * Set a single VP register to a 64-bit value.
> > + */
> > +void hv_set_vpreg(u32 msr, u64 value)
> > +{
> > +	union hv_hypercall_status status;
> > +
> > +	status.as_uint64 = hv_do_hvc(
> > +		HVCALL_SET_VP_REGISTERS | HV_HYPERCALL_FAST_BIT |
> > +			HV_HYPERCALL_REP_COUNT_1,
> > +		HV_PARTITION_ID_SELF,
> > +		HV_VP_INDEX_SELF,
> > +		msr,
> > +		0,
> > +		value,
> > +		0);
> > +
> > +	/*
> > +	 * Something is fundamentally broken in the hypervisor if
> > +	 * setting a VP register fails. There's really no way to
> > +	 * continue as a guest VM, so panic.
> > +	 */
> > +	BUG_ON(status.status != HV_STATUS_SUCCESS);
> > +}
> > +EXPORT_SYMBOL_GPL(hv_set_vpreg);
> > +
> > +
> > +/*
> > + * Get the value of a single VP register, and only the low order 64 bits.
> > + */
> > +u64 hv_get_vpreg(u32 msr)
> > +{
> > +	union hv_hypercall_status status;
> > +	struct hv_get_vp_register_output output;
> > +
> > +	status.as_uint64 = hv_do_hvc_fast_get(
> > +		HVCALL_GET_VP_REGISTERS | HV_HYPERCALL_FAST_BIT |
> > +			HV_HYPERCALL_REP_COUNT_1,
> > +		HV_PARTITION_ID_SELF,
> > +		HV_VP_INDEX_SELF,
> > +		msr,
> > +		&output);
> > +
> > +	/*
> > +	 * Something is fundamentally broken in the hypervisor if
> > +	 * getting a VP register fails. There's really no way to
> > +	 * continue as a guest VM, so panic.
> > +	 */
> > +	BUG_ON(status.status != HV_STATUS_SUCCESS);
> > +
> > +	return output.registervaluelow;
> > +}
> > +EXPORT_SYMBOL_GPL(hv_get_vpreg);
> > +
> > +/*
> > + * Get the value of a single VP register that is 128 bits in size.  This is a
> > + * separate call in order to avoid complicating the calling sequence for
> > + * the much more frequently used 64-bit version.
> > + */
> > +void hv_get_vpreg_128(u32 msr, struct hv_get_vp_register_output *result)
> > +{
> > +	union hv_hypercall_status status;
> > +
> > +	status.as_uint64 = hv_do_hvc_fast_get(
> > +		HVCALL_GET_VP_REGISTERS | HV_HYPERCALL_FAST_BIT |
> > +			HV_HYPERCALL_REP_COUNT_1,
> > +		HV_PARTITION_ID_SELF,
> > +		HV_VP_INDEX_SELF,
> > +		msr,
> > +		result);
> > +
> > +	/*
> > +	 * Something is fundamentally broken in the hypervisor if
> > +	 * getting a VP register fails. There's really no way to
> > +	 * continue as a guest VM, so panic.
> > +	 */
> > +	BUG_ON(status.status != HV_STATUS_SUCCESS);
> > +
> > +	return;
> > +
> > +}
> > +EXPORT_SYMBOL_GPL(hv_get_vpreg_128);
> > +
> > +void hyperv_report_panic(struct pt_regs *regs, long err)
> > +{
> > +	static bool panic_reported;
> > +	u64 guest_id;
> > +
> > +	/*
> > +	 * We prefer to report panic on 'die' chain as we have proper
> > +	 * registers to report, but if we miss it (e.g. on BUG()) we need
> > +	 * to report it on 'panic'.
> > +	 */
> > +	if (panic_reported)
> > +		return;
> > +	panic_reported = true;
> > +
> > +	guest_id = hv_get_vpreg(HV_REGISTER_GUEST_OSID);
> > +
> > +	/*
> > +	 * Hyper-V provides the ability to store only 5 values.
> > +	 * Pick the passed in error value, the guest_id, and the PC.
> > +	 * The first two general registers are added arbitrarily.
> > +	 */
> > +	hv_set_vpreg(HV_REGISTER_CRASH_P0, err);
> > +	hv_set_vpreg(HV_REGISTER_CRASH_P1, guest_id);
> > +	hv_set_vpreg(HV_REGISTER_CRASH_P2, regs->pc);
> > +	hv_set_vpreg(HV_REGISTER_CRASH_P3, regs->regs[0]);
> > +	hv_set_vpreg(HV_REGISTER_CRASH_P4, regs->regs[1]);
> > +
> > +	/*
> > +	 * Let Hyper-V know there is crash data available
> > +	 */
> > +	hv_set_vpreg(HV_REGISTER_CRASH_CTL, HV_CRASH_CTL_CRASH_NOTIFY);
> > +}
> > +EXPORT_SYMBOL_GPL(hyperv_report_panic);
> > +
> > +/*
> > + * hyperv_report_panic_msg - report panic message to Hyper-V
> > + * @pa: physical address of the panic page containing the message
> > + * @size: size of the message in the page
> > + */
> > +void hyperv_report_panic_msg(phys_addr_t pa, size_t size)
> > +{
> > +	/*
> > +	 * P3 to contain the physical address of the panic page & P4 to
> > +	 * contain the size of the panic data in that page. Rest of the
> > +	 * registers are no-op when the NOTIFY_MSG flag is set.
> > +	 */
> > +	hv_set_vpreg(HV_REGISTER_CRASH_P0, 0);
> > +	hv_set_vpreg(HV_REGISTER_CRASH_P1, 0);
> > +	hv_set_vpreg(HV_REGISTER_CRASH_P2, 0);
> > +	hv_set_vpreg(HV_REGISTER_CRASH_P3, pa);
> > +	hv_set_vpreg(HV_REGISTER_CRASH_P4, size);
> > +
> > +	/*
> > +	 * Let Hyper-V know there is crash data available along with
> > +	 * the panic message.
> > +	 */
> > +	hv_set_vpreg(HV_REGISTER_CRASH_CTL,
> > +	       (HV_CRASH_CTL_CRASH_NOTIFY | HV_CRASH_CTL_CRASH_NOTIFY_MSG));
> > +}
> > +EXPORT_SYMBOL_GPL(hyperv_report_panic_msg);
> > +
> > +bool hv_is_hyperv_initialized(void)
> > +{
> > +	return hyperv_initialized;
> > +}
> > +EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
> > diff --git a/arch/arm64/hyperv/mshyperv.c b/arch/arm64/hyperv/mshyperv.c
> > new file mode 100644
> > index 000000000000..3ef055599412
> > --- /dev/null
> > +++ b/arch/arm64/hyperv/mshyperv.c
> > @@ -0,0 +1,178 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * Core routines for interacting with Microsoft's Hyper-V hypervisor,
> > + * including setting up VMbus and STIMER interrupts, and handling
> > + * crashes and kexecs. These interactions are through a set of
> > + * static "handler" variables set by the architecture independent
> > + * VMbus and STIMER drivers.  This design is used to meet x86/x64
> > + * requirements for avoiding direct linkages and allowing the VMbus
> > + * and STIMER drivers to be unloaded and reloaded.
> > + *
> > + * Copyright (C) 2018, Microsoft, Inc.
> > + *
> > + * Author : Michael Kelley <mikelley@microsoft.com>
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms of the GNU General Public License version 2 as published
> > + * by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful, but
> > + * WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> > + * NON INFRINGEMENT.  See the GNU General Public License for more
> > + * details.
> > + */
> > +
> > +#include <linux/types.h>
> > +#include <linux/export.h>
> > +#include <linux/interrupt.h>
> > +#include <linux/kexec.h>
> > +#include <linux/acpi.h>
> > +#include <linux/ptrace.h>
> > +#include <asm/hyperv-tlfs.h>
> > +#include <asm/mshyperv.h>
> > +
> > +static void (*vmbus_handler)(void);
> > +static void (*hv_stimer0_handler)(void);
> > +static void (*hv_kexec_handler)(void);
> > +static void (*hv_crash_handler)(struct pt_regs *regs);
> > +
> > +static int vmbus_irq;
> > +static long __percpu *vmbus_evt;
> > +static long __percpu *stimer0_evt;
> > +
> > +irqreturn_t hyperv_vector_handler(int irq, void *dev_id)
> > +{
> > +	if (vmbus_handler)
> > +		vmbus_handler();
> 
> In which circumstances can this be NULL?

Perhaps none.  This may be a leftover from the equivalent
x86 code.  I'll check and probably remove it.

> 
> > +	return IRQ_HANDLED;
> > +}
> > +
> > +/* Must be done just once */
> > +void hv_setup_vmbus_irq(void (*handler)(void))
> > +{
> > +	int result;
> > +
> > +	vmbus_handler = handler;
> 
> If thismust only be done once, maybe you could check that it hasn't been
> done already?

OK.

> 
> > +	vmbus_irq = acpi_register_gsi(NULL, HYPERVISOR_CALLBACK_VECTOR,
> > +				 ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
> > +	if (vmbus_irq <= 0) {
> > +		pr_err("Can't register Hyper-V VMBus GSI. Error %d",
> > +			vmbus_irq);
> > +		vmbus_irq = 0;
> > +		return;
> > +	}
> > +	vmbus_evt = alloc_percpu(long);
> > +	result = request_percpu_irq(vmbus_irq, hyperv_vector_handler,
> > +			"Hyper-V VMbus", vmbus_evt);
> 
> If this is a per-cpu interrupt, why isn't it signalled as a PPI, in an
> architecture compliant way?

Except for the code in this module, the interrupt handler for VMbus
interrupts is architecture independent.  But there's no support for
per-process interrupts on the x86, so the hypervisor interrupt vectors
are hard coded in the same IDT entry across all processors, and the
normal IRQ allocation mechanism is bypassed.  The above approach
assigns an ARM64 PPI (HYPERVISOR_CALLBACK_VECTOR is 16) in a
way that works with the arch independent interrupt handler.

Or maybe I'm missing your point.  If so, please set me straight.

> 
> > +	if (result) {
> > +		pr_err("Can't request Hyper-V VMBus IRQ %d. Error %d",
> > +			vmbus_irq, result);
> > +		free_percpu(vmbus_evt);
> > +		acpi_unregister_gsi(vmbus_irq);
> > +		vmbus_irq = 0;
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(hv_setup_vmbus_irq);
> > +
> > +/* Must be done just once */
> > +void hv_remove_vmbus_irq(void)
> > +{
> > +	if (vmbus_irq) {
> > +		free_percpu_irq(vmbus_irq, vmbus_evt);
> > +		free_percpu(vmbus_evt);
> > +		acpi_unregister_gsi(vmbus_irq);
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(hv_remove_vmbus_irq);
> > +
> > +/* Must be done by each CPU */
> > +void hv_enable_vmbus_irq(void)
> > +{
> > +	enable_percpu_irq(vmbus_irq, 0);
> > +}
> > +EXPORT_SYMBOL_GPL(hv_enable_vmbus_irq);
> > +
> > +/* Must be done by each CPU */
> > +void hv_disable_vmbus_irq(void)
> > +{
> > +	disable_percpu_irq(vmbus_irq);
> > +}
> > +EXPORT_SYMBOL_GPL(hv_disable_vmbus_irq);
> > +
> > +/* Routines to do per-architecture handling of STIMER0 when in Direct Mode */
> > +
> > +static irqreturn_t hv_stimer0_vector_handler(int irq, void *dev_id)
> > +{
> > +	if (hv_stimer0_handler)
> > +		hv_stimer0_handler();
> > +	return IRQ_HANDLED;
> > +}
> > +
> > +int hv_setup_stimer0_irq(int *irq, int *vector, void (*handler)(void))
> > +{
> > +	int localirq;
> > +	int result;
> > +
> > +	localirq = acpi_register_gsi(NULL, HV_STIMER0_IRQNR,
> > +			ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
> > +	if (localirq <= 0) {
> > +		pr_err("Can't register Hyper-V stimer0 GSI. Error %d",
> > +			localirq);
> > +		*irq = 0;
> > +		return -1;
> > +	}
> > +	stimer0_evt = alloc_percpu(long);
> > +	result = request_percpu_irq(localirq, hv_stimer0_vector_handler,
> > +					 "Hyper-V stimer0", stimer0_evt);
> > +	if (result) {
> > +		pr_err("Can't request Hyper-V stimer0 IRQ %d. Error %d",
> > +			localirq, result);
> > +		free_percpu(stimer0_evt);
> > +		acpi_unregister_gsi(localirq);
> > +		*irq = 0;
> > +		return -1;
> > +	}
> > +
> > +	hv_stimer0_handler = handler;
> > +	*vector = HV_STIMER0_IRQNR;
> > +	*irq = localirq;
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(hv_setup_stimer0_irq);
> > +
> > +void hv_remove_stimer0_irq(int irq)
> > +{
> > +	hv_stimer0_handler = NULL;
> > +	if (irq) {
> > +		free_percpu_irq(irq, stimer0_evt);
> > +		free_percpu(stimer0_evt);
> > +		acpi_unregister_gsi(irq);
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(hv_remove_stimer0_irq);
> > +
> > +void hv_setup_kexec_handler(void (*handler)(void))
> > +{
> > +	hv_kexec_handler = handler;
> > +}
> > +EXPORT_SYMBOL_GPL(hv_setup_kexec_handler);
> > +
> > +void hv_remove_kexec_handler(void)
> > +{
> > +	hv_kexec_handler = NULL;
> > +}
> > +EXPORT_SYMBOL_GPL(hv_remove_kexec_handler);
> > +
> > +void hv_setup_crash_handler(void (*handler)(struct pt_regs *regs))
> > +{
> > +	hv_crash_handler = handler;
> > +}
> > +EXPORT_SYMBOL_GPL(hv_setup_crash_handler);
> > +
> > +void hv_remove_crash_handler(void)
> > +{
> > +	hv_crash_handler = NULL;
> > +}
> > +EXPORT_SYMBOL_GPL(hv_remove_crash_handler);
> >
> 
> Overall, this patch needs a massive split up, clean up, and a good dose
> of documentation so that we can understand what you are trying to achieve.

Thanks for the feedback.  I'll start work on a new version for the
changes that are clear, and await additional discussion on
a few issues where I need some further input.

Michael

> 
> Thanks,
> 
> 	M.
> --
> Jazz is not dead. It just smells funny...
Marc Zyngier Dec. 13, 2018, 11:23 a.m. UTC | #5
Hi Michael,

On 12/12/2018 05:00, Michael Kelley wrote:
> From: Marc Zyngier <marc.zyngier@arm.com>  Sent: Friday, December 7, 2018 6:43 AM
> 
>>> Add ARM64-specific code to enable Hyper-V. This code includes:
>>> * Detecting Hyper-V and initializing the guest/Hyper-V interface
>>> * Setting up Hyper-V's synthetic clocks
>>> * Making hypercalls using the HVC instruction
>>> * Setting up VMbus and stimer0 interrupts
>>> * Setting up kexec and crash handlers
>>
>> This commit message is a clear indication that this should be split in
>> at least 5 different patches.
> 
> OK, I'll work on separating into multiple layered patches in the next
> version.

Thanks. This will definitely help the review process.

>>> +/*
>>> + * This variant of HVC invocation is for hv_get_vpreg and
>>> + * hv_get_vpreg_128. The input parameters are passed in registers
>>> + * along with a pointer in x4 to where the output result should
>>> + * be stored. The output is returned in x15 and x16.  x18 is used as
>>> + * scratch space to avoid buildng a stack frame, as Hyper-V does
>>> + * not preserve registers x0-x17.
>>> + */
>>> +ENTRY(hv_do_hvc_fast_get)
>>> +	mov x18, x4
>>> +	hvc #1
>>> +	str x15,[x18]
>>> +	str x16,[x18,#8]
>>> +	ret
>>> +ENDPROC(hv_do_hvc_fast_get)
>>
>> As Will said, this isn't a viable option. Please follow SMCCC 1.1.
> 
> I'll have to start a conversation with the Hyper-V team about this.
> I don't know why they chose to use HVC #1 or this register scheme
> for output values.  It may be tough to change at this point because
> there are Windows guests on Hyper-V for ARM64 that are already
> using this approach.

I appreciate you already have stuff in the wild, but there is definitely
a case to be made for supporting architecturally specified mechanisms in
a hypervisor, and SMCCC is definitely part of it (I'm certainly curious
of how you support the Spectre mitigation otherwise).

> 
>>
>>> diff --git a/arch/arm64/hyperv/hv_init.c b/arch/arm64/hyperv/hv_init.c
>>> new file mode 100644
>>> index 000000000000..aa1a8c09d989
>>> --- /dev/null
>>> +++ b/arch/arm64/hyperv/hv_init.c
>>> @@ -0,0 +1,441 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +/*
>>> + * Initialization of the interface with Microsoft's Hyper-V hypervisor,
>>> + * and various low level utility routines for interacting with Hyper-V.
>>> + *
>>> + * Copyright (C) 2018, Microsoft, Inc.
>>> + *
>>> + * Author : Michael Kelley <mikelley@microsoft.com>
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify it
>>> + * under the terms of the GNU General Public License version 2 as published
>>> + * by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope that it will be useful, but
>>> + * WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
>>> + * NON INFRINGEMENT.  See the GNU General Public License for more
>>> + * details.
>>> + */
>>> +
>>> +
>>> +#include <linux/types.h>
>>> +#include <linux/version.h>
>>> +#include <linux/export.h>
>>> +#include <linux/vmalloc.h>
>>> +#include <linux/mm.h>
>>> +#include <linux/clocksource.h>
>>> +#include <linux/sched_clock.h>
>>> +#include <linux/acpi.h>
>>> +#include <linux/module.h>
>>> +#include <linux/hyperv.h>
>>> +#include <linux/slab.h>
>>> +#include <linux/cpuhotplug.h>
>>> +#include <linux/psci.h>
>>> +#include <asm-generic/bug.h>
>>> +#include <asm/hypervisor.h>
>>> +#include <asm/hyperv-tlfs.h>
>>> +#include <asm/mshyperv.h>
>>> +#include <asm/sysreg.h>
>>> +#include <clocksource/arm_arch_timer.h>
>>> +
>>> +static bool	hyperv_initialized;
>>> +struct		ms_hyperv_info ms_hyperv;
>>> +EXPORT_SYMBOL_GPL(ms_hyperv);
>>
>> Who are the users of this structure? Should they go via accessors instead?
> 
> The structure is an aggregation of several flags fields that describe a myriad
> of features and hints that may or may not be present on any particular version
> of Hyper-V, plus the max virtual processor ID values.  Everything is read-only
> after initialization.   

nit: please consider using a __ro_after_init annotation in that case.

> Most of the references are to test one of the flags.  It's
> a judgment call, but there are a lot of different flags with long names, and
> writing accessors for each one doesn't seem to me to add any clarity.

Looking at that structure, it doesn't seem so bad, and you could easily
have generic accessors that take a flag as a parameter. Your call.

[...]

>>> +static struct clocksource hyperv_cs_msr = {
>>> +	.name		= "hyperv_clocksource_msr",
>>> +	.rating		= 400,
>>> +	.read		= read_hv_clock_msr,
>>> +	.mask		= CLOCKSOURCE_MASK(64),
>>> +	.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
>>> +};
>>> +
>>> +struct clocksource *hyperv_cs;
>>> +EXPORT_SYMBOL_GPL(hyperv_cs);
>>
>> Why? Who needs to poke this?
> 
> It's referenced in the architecture independent driver code
> for the Hyper-V clocksource in drivers/hv/hv.c, and in the
> code to sync the time with the Hyper-V host in
> drivers/hv/hv_util.c.

Fair enough.

>>> +static int hv_cpu_init(unsigned int cpu)
>>> +{
>>> +	u64 msr_vp_index;
>>> +
>>> +	hv_get_vp_index(msr_vp_index);
>>> +
>>> +	hv_vp_index[smp_processor_id()] = msr_vp_index;
>>> +
>>> +	if (msr_vp_index > hv_max_vp_index)
>>> +		hv_max_vp_index = msr_vp_index;
>>> +
>>> +	return 0;
>>> +}
>>
>> Is that some new way to describe a CPU topology? If so, why isn't that
>> exposed via the ACPI tables that the kernel already parses?
> 
> Hyper-V's hypercall interface uses vCPU identifiers that are not
> guaranteed to be consecutive integers or to match what ACPI shows.
> No topology information is implied -- it's just unique identifiers.  The 
> hv_vp_index array provides easy mapping from Linux's consecutive
> integer IDs for CPUs when needed to construct hypercall arguments.

That's extremely odd. The hypervisor obviously knows which vCPU is doing
a hypercall, and if referencing another vCPU, the virtualized MPIDR_EL1
value should be used. I don't think deviating from the architecture is a
good idea (but I appreciate this is none of your doing). Following the
architecture would allow this code to directly use the cpu_logical_map
infrastructure we alreadu have.

> 
>>
>>> +
>>> +/*
>>> + * This function is invoked via the ACPI clocksource probe mechanism. We
>>> + * don't actually use any values from the ACPI GTDT table, but we set up
>>
>> This doesn't feel like a good idea at all. Piggy-backing on an existing
>> mechanism and use it for something completely different is not exactly
>> future-proof.
>>
>> Also, if this is supposed to be a clocksource, why isn't that a
>> clocksource driver on its own right?
> 
> I agree this is not the right long term solution.  Is there a better place to
> hang the initialization code?  Or should I just make an explicit call to
> initialize Hyper-V at the right place?  On the x86 side, there's an
> explicit framework for hypervisor-specific initialization routines to plug
> into.  Maybe it's time for a basic version of such a framework on the
> ARM64 side.  Thoughts on the best approach, both in the short-term and
> the longer-term? If we put a framework in place, does that need to
> happen before adding Hyper-V code, or afterwards as a cleanup?

If we're introducing an infrastructure, doing it before introducing more
stuff seems to be the logical option. But I'm not sure we really need
such an infrastructure in this case, unless we need to enforce some
specific ordering.

You could start by moving all the clocksource stuff to
drivers/clocksource and keep the same init mechanism for the time being.

> 
> And yes, Hyper-V does effectively have its own clocksource.  The
> main code is in drivers/hv/hv.c, but it's not broken out as a separate
> driver in drivers/clocksource, probably due to some history on the
> x86 side that pre-dates me.  I'll have to research.

OK.

> 
>>
>>> + * the Hyper-V synthetic clocksource and do other initialization for
>>> + * interacting with Hyper-V the first time.  Using early_initcall to invoke
>>> + * this function is too late because interrupts are already enabled at that
>>> + * point, and sched_clock_register must run before interrupts are enabled.
>>> + *
>>> + * 1. Setup the guest ID.
>>> + * 2. Get features and hints info from Hyper-V
>>> + * 3. Setup per-cpu VP indices.
>>> + * 4. Register Hyper-V specific clocksource.
>>> + * 5. Register the scheduler clock.
>>> + */
>>> +
>>> +static int __init hyperv_init(struct acpi_table_header *table)
>>> +{
>>> +	struct hv_get_vp_register_output result;
>>> +	u32	a, b, c, d;
>>> +	u64	guest_id;
>>> +	int	i;
>>> +
>>> +	/*
>>> +	 * If we're in a VM on Hyper-V, the ACPI hypervisor_id field will
>>> +	 * have the string "MsHyperV".
>>> +	 */
>>> +	if (strncmp((char *)&acpi_gbl_FADT.hypervisor_id, "MsHyperV", 8))
>>> +		return 1;
>>> +
>>> +	/* Setup the guest ID */
>>> +	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
>>> +	hv_set_vpreg(HV_REGISTER_GUEST_OSID, guest_id);
>>> +
>>> +	/* Get the features and hints from Hyper-V */
>>> +	hv_get_vpreg_128(HV_REGISTER_PRIVILEGES_AND_FEATURES, &result);
>>> +	ms_hyperv.features = lower_32_bits(result.registervaluelow);
>>> +	ms_hyperv.misc_features = upper_32_bits(result.registervaluehigh);
>>> +
>>> +	hv_get_vpreg_128(HV_REGISTER_FEATURES, &result);
>>> +	ms_hyperv.hints = lower_32_bits(result.registervaluelow);
>>> +
>>> +	pr_info("Hyper-V: Features 0x%x, hints 0x%x\n",
>>> +		ms_hyperv.features, ms_hyperv.hints);
>>> +
>>> +	/*
>>> +	 * Direct mode is the only option for STIMERs provided Hyper-V
>>> +	 * on ARM64, so Hyper-V doesn't actually set the flag.  But add the
>>> +	 * flag so the architecture independent code in drivers/hv/hv.c
>>> +	 * will correctly use that mode.
>>> +	 */
>>> +	ms_hyperv.misc_features |= HV_STIMER_DIRECT_MODE_AVAILABLE;
>>> +
>>> +	/*
>>> +	 * Hyper-V on ARM64 doesn't support AutoEOI.  Add the hint
>>> +	 * that tells architecture independent code not to use this
>>> +	 * feature.
>>> +	 */
>>> +	ms_hyperv.hints |= HV_DEPRECATING_AEOI_RECOMMENDED;
>>> +
>>> +	/* Get information about the Hyper-V host version */
>>> +	hv_get_vpreg_128(HV_REGISTER_HYPERVISOR_VERSION, &result);
>>> +	a = lower_32_bits(result.registervaluelow);
>>> +	b = upper_32_bits(result.registervaluelow);
>>> +	c = lower_32_bits(result.registervaluehigh);
>>> +	d = upper_32_bits(result.registervaluehigh);
>>> +	pr_info("Hyper-V: Host Build %d.%d.%d.%d-%d-%d\n",
>>> +		b >> 16, b & 0xFFFF, a, d & 0xFFFFFF, c, d >> 24);
>>> +
>>> +	/* Allocate percpu VP index */
>>> +	hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
>>> +				    GFP_KERNEL);
>>
>> Why isn't this a percpu variable?
> 
> In current code in the architecture independent Hyper-V drivers (as well
> as some future Hyper-V enlightenments that aren't yet implemented
> for ARM64), the running CPU needs to get the VP index values for any CPUs
> in the hv_vp_index array.  Some of the code is performance sensitive, and
> accessing a global array is faster than accessing other CPUs' per-cpu data.

Fair enough. But his seems to tie into the above discussion about the
use of MPIDR_EL1 and the logical CPU mapping.

[...]

>>> +free_vp_index:
>>> +	kfree(hv_vp_index);
>>> +	hv_vp_index = NULL;
>>> +	return 1;
>>
>> ????
> 
> The return value is there because this function is implemented as a timer
> initialization function, and that's what the function signature requires.
> But maybe I'm not understanding your question.

I'm slightly miffed by the "return 1". I'd expect explicit negative
return codes that describe the error.

[...]

>>> +	vmbus_irq = acpi_register_gsi(NULL, HYPERVISOR_CALLBACK_VECTOR,
>>> +				 ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
>>> +	if (vmbus_irq <= 0) {
>>> +		pr_err("Can't register Hyper-V VMBus GSI. Error %d",
>>> +			vmbus_irq);
>>> +		vmbus_irq = 0;
>>> +		return;
>>> +	}
>>> +	vmbus_evt = alloc_percpu(long);
>>> +	result = request_percpu_irq(vmbus_irq, hyperv_vector_handler,
>>> +			"Hyper-V VMbus", vmbus_evt);
>>
>> If this is a per-cpu interrupt, why isn't it signalled as a PPI, in an
>> architecture compliant way?
> 
> Except for the code in this module, the interrupt handler for VMbus
> interrupts is architecture independent.  But there's no support for
> per-process interrupts on the x86, so the hypervisor interrupt vectors
> are hard coded in the same IDT entry across all processors, and the
> normal IRQ allocation mechanism is bypassed.  The above approach
> assigns an ARM64 PPI (HYPERVISOR_CALLBACK_VECTOR is 16) in a
> way that works with the arch independent interrupt handler.
> 
> Or maybe I'm missing your point.  If so, please set me straight.

Sorry, I missed that HYPERVISOR_CALLBACK_VECTOR was a PPI. It looks OK now.

Thanks,

	M.
Michael Kelley (LINUX) Jan. 4, 2019, 8:05 p.m. UTC | #6
From: Marc Zyngier <marc.zyngier@arm.com> Sent: Thursday, December 13, 2018 3:23 AM

> >> As Will said, this isn't a viable option. Please follow SMCCC 1.1.
> >
> > I'll have to start a conversation with the Hyper-V team about this.
> > I don't know why they chose to use HVC #1 or this register scheme
> > for output values.  It may be tough to change at this point because
> > there are Windows guests on Hyper-V for ARM64 that are already
> > using this approach.
> 
> I appreciate you already have stuff in the wild, but there is definitely
> a case to be made for supporting architecturally specified mechanisms in
> a hypervisor, and SMCCC is definitely part of it (I'm certainly curious
> of how you support the Spectre mitigation otherwise).
>

The Hyper-V guys I need to discuss this with are not back from the
holidays until January 7th.  I'll follow up on this thread once I've
had that conversation.

> >>> +static int hv_cpu_init(unsigned int cpu)
> >>> +{
> >>> +	u64 msr_vp_index;
> >>> +
> >>> +	hv_get_vp_index(msr_vp_index);
> >>> +
> >>> +	hv_vp_index[smp_processor_id()] = msr_vp_index;
> >>> +
> >>> +	if (msr_vp_index > hv_max_vp_index)
> >>> +		hv_max_vp_index = msr_vp_index;
> >>> +
> >>> +	return 0;
> >>> +}
> >>
> >> Is that some new way to describe a CPU topology? If so, why isn't that
> >> exposed via the ACPI tables that the kernel already parses?
> >
> > Hyper-V's hypercall interface uses vCPU identifiers that are not
> > guaranteed to be consecutive integers or to match what ACPI shows.
> > No topology information is implied -- it's just unique identifiers.  The
> > hv_vp_index array provides easy mapping from Linux's consecutive
> > integer IDs for CPUs when needed to construct hypercall arguments.
> 
> That's extremely odd. The hypervisor obviously knows which vCPU is doing
> a hypercall, and if referencing another vCPU, the virtualized MPIDR_EL1
> value should be used. I don't think deviating from the architecture is a
> good idea (but I appreciate this is none of your doing). Following the
> architecture would allow this code to directly use the cpu_logical_map
> infrastructure we already have.

I see what you are getting at.  However, some Hyper-V hypercalls allow
specifying arbitrary sets of vCPUs.  These hypercalls are used to define
target processors in the virtual PCI code (which I have not yet brought over
to ARM64) and in enlightenments for IPIs and TLB flushes (used by
Windows guests and Linux guests on x86, but not yet brought over to Linux
ARM64, if they ever will be).  These hypercalls take bitmaps as arguments,
similar to a Linux cpumask, as defined in Sections 7.8.7.3 thru 7.8.7.5 in the
Hyper-V TLFS.  So Hyper-V defines its own VP index that is akin to the index 
into the cpu_logical_map, though it may not be the same mapping.  My earlier
comments may have been misleading -- the Hyper-V VP index is an integer
ranging from 0 thru (# vCPUs - 1).

With these requirements, Hyper-V defining its own VP index seems like
a reasonable thing to do.  And since Hyper-V provides the same hypercall
interfaces for both x86 and ARM64 implementations, and for Windows
guests, there's not much choice but to use the Hyper-V VP index as specified.

Michael
Michael Kelley (LINUX) Jan. 21, 2019, 4:38 a.m. UTC | #7
From: Michael Kelley  Sent: Friday, January 4, 2019 12:05 PM
> 
> > >> As Will said, this isn't a viable option. Please follow SMCCC 1.1.
> > >
> > > I'll have to start a conversation with the Hyper-V team about this.
> > > I don't know why they chose to use HVC #1 or this register scheme
> > > for output values.  It may be tough to change at this point because
> > > there are Windows guests on Hyper-V for ARM64 that are already
> > > using this approach.
> >
> > I appreciate you already have stuff in the wild, but there is definitely
> > a case to be made for supporting architecturally specified mechanisms in
> > a hypervisor, and SMCCC is definitely part of it (I'm certainly curious
> > of how you support the Spectre mitigation otherwise).
> >
> 
> The Hyper-V guys I need to discuss this with are not back from the
> holidays until January 7th.  I'll follow up on this thread once I've
> had that conversation.
> 

Feedback from the Hyper-V guys is that they believe the Hyper-V
specific hypercall sequence *is* compliant with SMCCC 1.1, in the
sense of being outside the requirements per Section 2.9 since the
Hyper-V hypercall sequence uses HVC #1.  Hyper-V wanted to use a
simpler calling sequence that doesn't have the full register
save/restore requirements, for better performance.  The details
of the Hyper-V hypercall sequence are documented internally,
but we do still need to get it published externally as part of a
Hyper-V TLFS version that includes ARM64.

Hyper-V uses the full SMC Calling Conventions in other places,
such as for PSCI calls, for SMCs to EL3, and for the Spectre
mitigation related calls.

Michael
diff mbox series

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 72f19cef4c48..326eeb32a0cd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6837,6 +6837,7 @@  F:	arch/x86/kernel/cpu/mshyperv.c
 F:	arch/x86/hyperv
 F:	arch/arm64/include/asm/hyperv-tlfs.h
 F:	arch/arm64/include/asm/mshyperv.h
+F:	arch/arm64/hyperv
 F:	drivers/hid/hid-hyperv.c
 F:	drivers/hv/
 F:	drivers/input/serio/hyperv-keyboard.c
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 6cb9fc7e9382..ad9ec0579553 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -106,6 +106,7 @@  core-y		+= arch/arm64/kernel/ arch/arm64/mm/
 core-$(CONFIG_NET) += arch/arm64/net/
 core-$(CONFIG_KVM) += arch/arm64/kvm/
 core-$(CONFIG_XEN) += arch/arm64/xen/
+core-$(CONFIG_HYPERV) += arch/arm64/hyperv/
 core-$(CONFIG_CRYPTO) += arch/arm64/crypto/
 libs-y		:= arch/arm64/lib/ $(libs-y)
 core-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
diff --git a/arch/arm64/hyperv/Makefile b/arch/arm64/hyperv/Makefile
new file mode 100644
index 000000000000..988eda55330c
--- /dev/null
+++ b/arch/arm64/hyperv/Makefile
@@ -0,0 +1,2 @@ 
+# SPDX-License-Identifier: GPL-2.0
+obj-y		:= hv_init.o hv_hvc.o mshyperv.o
diff --git a/arch/arm64/hyperv/hv_hvc.S b/arch/arm64/hyperv/hv_hvc.S
new file mode 100644
index 000000000000..82636969b4f2
--- /dev/null
+++ b/arch/arm64/hyperv/hv_hvc.S
@@ -0,0 +1,54 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Microsoft Hyper-V hypervisor invocation routines
+ *
+ * Copyright (C) 2018, Microsoft, Inc.
+ *
+ * Author : Michael Kelley <mikelley@microsoft.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT.  See the GNU General Public License for more
+ * details.
+ */
+
+#include <linux/linkage.h>
+
+	.text
+/*
+ * Do the HVC instruction.  For Hyper-V the argument is always 1.
+ * x0 contains the hypercall control value, while additional registers
+ * vary depending on the hypercall, and whether the hypercall arguments
+ * are in memory or in registers (a "fast" hypercall per the Hyper-V
+ * TLFS).  When the arguments are in memory x1 is the guest physical
+ * address of the input arguments, and x2 is the guest physical
+ * address of the output arguments.  When the arguments are in
+ * registers, the register values depends on the hypercall.  Note
+ * that this version cannot return any values in registers.
+ */
+ENTRY(hv_do_hvc)
+	hvc #1
+	ret
+ENDPROC(hv_do_hvc)
+
+/*
+ * This variant of HVC invocation is for hv_get_vpreg and
+ * hv_get_vpreg_128. The input parameters are passed in registers
+ * along with a pointer in x4 to where the output result should
+ * be stored. The output is returned in x15 and x16.  x18 is used as
+ * scratch space to avoid buildng a stack frame, as Hyper-V does
+ * not preserve registers x0-x17.
+ */
+ENTRY(hv_do_hvc_fast_get)
+	mov x18, x4
+	hvc #1
+	str x15,[x18]
+	str x16,[x18,#8]
+	ret
+ENDPROC(hv_do_hvc_fast_get)
diff --git a/arch/arm64/hyperv/hv_init.c b/arch/arm64/hyperv/hv_init.c
new file mode 100644
index 000000000000..aa1a8c09d989
--- /dev/null
+++ b/arch/arm64/hyperv/hv_init.c
@@ -0,0 +1,441 @@ 
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Initialization of the interface with Microsoft's Hyper-V hypervisor,
+ * and various low level utility routines for interacting with Hyper-V.
+ *
+ * Copyright (C) 2018, Microsoft, Inc.
+ *
+ * Author : Michael Kelley <mikelley@microsoft.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT.  See the GNU General Public License for more
+ * details.
+ */
+
+
+#include <linux/types.h>
+#include <linux/version.h>
+#include <linux/export.h>
+#include <linux/vmalloc.h>
+#include <linux/mm.h>
+#include <linux/clocksource.h>
+#include <linux/sched_clock.h>
+#include <linux/acpi.h>
+#include <linux/module.h>
+#include <linux/hyperv.h>
+#include <linux/slab.h>
+#include <linux/cpuhotplug.h>
+#include <linux/psci.h>
+#include <asm-generic/bug.h>
+#include <asm/hypervisor.h>
+#include <asm/hyperv-tlfs.h>
+#include <asm/mshyperv.h>
+#include <asm/sysreg.h>
+#include <clocksource/arm_arch_timer.h>
+
+static bool	hyperv_initialized;
+struct		ms_hyperv_info ms_hyperv;
+EXPORT_SYMBOL_GPL(ms_hyperv);
+
+static struct ms_hyperv_tsc_page *tsc_pg;
+
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
+{
+	return tsc_pg;
+}
+EXPORT_SYMBOL_GPL(hv_get_tsc_page);
+
+static u64 read_hv_sched_clock_tsc(void)
+{
+	u64 current_tick = hv_read_tsc_page(tsc_pg);
+
+	if (current_tick == U64_MAX)
+		current_tick = hv_get_vpreg(HV_REGISTER_TIME_REFCOUNT);
+
+	return current_tick;
+}
+
+static u64 read_hv_clock_tsc(struct clocksource *arg)
+{
+	u64 current_tick = hv_read_tsc_page(tsc_pg);
+
+	if (current_tick == U64_MAX)
+		current_tick = hv_get_vpreg(HV_REGISTER_TIME_REFCOUNT);
+
+	return current_tick;
+}
+
+static struct clocksource hyperv_cs_tsc = {
+		.name		= "hyperv_clocksource_tsc_page",
+		.rating		= 400,
+		.read		= read_hv_clock_tsc,
+		.mask		= CLOCKSOURCE_MASK(64),
+		.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+static u64 read_hv_sched_clock_msr(void)
+{
+	return hv_get_vpreg(HV_REGISTER_TIME_REFCOUNT);
+}
+
+static u64 read_hv_clock_msr(struct clocksource *arg)
+{
+	return hv_get_vpreg(HV_REGISTER_TIME_REFCOUNT);
+}
+
+static struct clocksource hyperv_cs_msr = {
+	.name		= "hyperv_clocksource_msr",
+	.rating		= 400,
+	.read		= read_hv_clock_msr,
+	.mask		= CLOCKSOURCE_MASK(64),
+	.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+struct clocksource *hyperv_cs;
+EXPORT_SYMBOL_GPL(hyperv_cs);
+
+u32 *hv_vp_index;
+EXPORT_SYMBOL_GPL(hv_vp_index);
+
+u32 hv_max_vp_index;
+
+static int hv_cpu_init(unsigned int cpu)
+{
+	u64 msr_vp_index;
+
+	hv_get_vp_index(msr_vp_index);
+
+	hv_vp_index[smp_processor_id()] = msr_vp_index;
+
+	if (msr_vp_index > hv_max_vp_index)
+		hv_max_vp_index = msr_vp_index;
+
+	return 0;
+}
+
+/*
+ * This function is invoked via the ACPI clocksource probe mechanism. We
+ * don't actually use any values from the ACPI GTDT table, but we set up
+ * the Hyper-V synthetic clocksource and do other initialization for
+ * interacting with Hyper-V the first time.  Using early_initcall to invoke
+ * this function is too late because interrupts are already enabled at that
+ * point, and sched_clock_register must run before interrupts are enabled.
+ *
+ * 1. Setup the guest ID.
+ * 2. Get features and hints info from Hyper-V
+ * 3. Setup per-cpu VP indices.
+ * 4. Register Hyper-V specific clocksource.
+ * 5. Register the scheduler clock.
+ */
+
+static int __init hyperv_init(struct acpi_table_header *table)
+{
+	struct hv_get_vp_register_output result;
+	u32	a, b, c, d;
+	u64	guest_id;
+	int	i;
+
+	/*
+	 * If we're in a VM on Hyper-V, the ACPI hypervisor_id field will
+	 * have the string "MsHyperV".
+	 */
+	if (strncmp((char *)&acpi_gbl_FADT.hypervisor_id, "MsHyperV", 8))
+		return 1;
+
+	/* Setup the guest ID */
+	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
+	hv_set_vpreg(HV_REGISTER_GUEST_OSID, guest_id);
+
+	/* Get the features and hints from Hyper-V */
+	hv_get_vpreg_128(HV_REGISTER_PRIVILEGES_AND_FEATURES, &result);
+	ms_hyperv.features = lower_32_bits(result.registervaluelow);
+	ms_hyperv.misc_features = upper_32_bits(result.registervaluehigh);
+
+	hv_get_vpreg_128(HV_REGISTER_FEATURES, &result);
+	ms_hyperv.hints = lower_32_bits(result.registervaluelow);
+
+	pr_info("Hyper-V: Features 0x%x, hints 0x%x\n",
+		ms_hyperv.features, ms_hyperv.hints);
+
+	/*
+	 * Direct mode is the only option for STIMERs provided Hyper-V
+	 * on ARM64, so Hyper-V doesn't actually set the flag.  But add the
+	 * flag so the architecture independent code in drivers/hv/hv.c
+	 * will correctly use that mode.
+	 */
+	ms_hyperv.misc_features |= HV_STIMER_DIRECT_MODE_AVAILABLE;
+
+	/*
+	 * Hyper-V on ARM64 doesn't support AutoEOI.  Add the hint
+	 * that tells architecture independent code not to use this
+	 * feature.
+	 */
+	ms_hyperv.hints |= HV_DEPRECATING_AEOI_RECOMMENDED;
+
+	/* Get information about the Hyper-V host version */
+	hv_get_vpreg_128(HV_REGISTER_HYPERVISOR_VERSION, &result);
+	a = lower_32_bits(result.registervaluelow);
+	b = upper_32_bits(result.registervaluelow);
+	c = lower_32_bits(result.registervaluehigh);
+	d = upper_32_bits(result.registervaluehigh);
+	pr_info("Hyper-V: Host Build %d.%d.%d.%d-%d-%d\n",
+		b >> 16, b & 0xFFFF, a, d & 0xFFFFFF, c, d >> 24);
+
+	/* Allocate percpu VP index */
+	hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
+				    GFP_KERNEL);
+	if (!hv_vp_index)
+		return 1;
+
+	for (i = 0; i < num_possible_cpus(); i++)
+		hv_vp_index[i] = VP_INVAL;
+
+	if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "arm64/hyperv_init:online",
+			      hv_cpu_init, NULL) < 0)
+		goto free_vp_index;
+
+	/*
+	 * Try to set up what Hyper-V calls the "TSC reference page", which
+	 * uses the ARM Generic Timer virtual counter with some scaling
+	 * information to provide a fast and stable guest VM clocksource.
+	 * If the TSC reference page can't be set up, fall back to reading
+	 * the guest clock provided by Hyper-V's synthetic reference time
+	 * register.
+	 */
+	if (ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE) {
+
+		u64		tsc_msr;
+		phys_addr_t	phys_addr;
+
+		tsc_pg = __vmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
+		if (tsc_pg) {
+			phys_addr = page_to_phys(vmalloc_to_page(tsc_pg));
+			tsc_msr = hv_get_vpreg(HV_REGISTER_REFERENCE_TSC);
+			tsc_msr &= GENMASK_ULL(11, 0);
+			tsc_msr = tsc_msr | 0x1 | (u64)phys_addr;
+			hv_set_vpreg(HV_REGISTER_REFERENCE_TSC, tsc_msr);
+			hyperv_cs = &hyperv_cs_tsc;
+			sched_clock_register(read_hv_sched_clock_tsc,
+						64, HV_CLOCK_HZ);
+		}
+	}
+
+	if (!hyperv_cs &&
+	    (ms_hyperv.features & HV_MSR_TIME_REF_COUNT_AVAILABLE)) {
+		hyperv_cs = &hyperv_cs_msr;
+		sched_clock_register(read_hv_sched_clock_msr,
+						64, HV_CLOCK_HZ);
+	}
+
+	if (hyperv_cs) {
+		hyperv_cs->archdata.vdso_direct = false;
+		clocksource_register_hz(hyperv_cs, HV_CLOCK_HZ);
+	}
+
+	hyperv_initialized = true;
+	return 0;
+
+free_vp_index:
+	kfree(hv_vp_index);
+	hv_vp_index = NULL;
+	return 1;
+}
+TIMER_ACPI_DECLARE(hyperv, ACPI_SIG_GTDT, hyperv_init);
+
+/*
+ * This routine is called before kexec/kdump, it does the required cleanup.
+ */
+void hyperv_cleanup(void)
+{
+	/* Reset our OS id */
+	hv_set_vpreg(HV_REGISTER_GUEST_OSID, 0);
+
+}
+EXPORT_SYMBOL_GPL(hyperv_cleanup);
+
+/*
+ * hv_do_hypercall- Invoke the specified hypercall
+ */
+u64 hv_do_hypercall(u64 control, void *input, void *output)
+{
+	u64 input_address;
+	u64 output_address;
+
+	input_address = input ? virt_to_phys(input) : 0;
+	output_address = output ? virt_to_phys(output) : 0;
+	return hv_do_hvc(control, input_address, output_address);
+}
+EXPORT_SYMBOL_GPL(hv_do_hypercall);
+
+/*
+ * hv_do_fast_hypercall8 -- Invoke the specified hypercall
+ * with arguments in registers instead of physical memory.
+ * Avoids the overhead of virt_to_phys for simple hypercalls.
+ */
+
+u64 hv_do_fast_hypercall8(u16 code, u64 input)
+{
+	u64 control;
+
+	control = (u64)code | HV_HYPERCALL_FAST_BIT;
+	return hv_do_hvc(control, input);
+}
+EXPORT_SYMBOL_GPL(hv_do_fast_hypercall8);
+
+
+/*
+ * Set a single VP register to a 64-bit value.
+ */
+void hv_set_vpreg(u32 msr, u64 value)
+{
+	union hv_hypercall_status status;
+
+	status.as_uint64 = hv_do_hvc(
+		HVCALL_SET_VP_REGISTERS | HV_HYPERCALL_FAST_BIT |
+			HV_HYPERCALL_REP_COUNT_1,
+		HV_PARTITION_ID_SELF,
+		HV_VP_INDEX_SELF,
+		msr,
+		0,
+		value,
+		0);
+
+	/*
+	 * Something is fundamentally broken in the hypervisor if
+	 * setting a VP register fails. There's really no way to
+	 * continue as a guest VM, so panic.
+	 */
+	BUG_ON(status.status != HV_STATUS_SUCCESS);
+}
+EXPORT_SYMBOL_GPL(hv_set_vpreg);
+
+
+/*
+ * Get the value of a single VP register, and only the low order 64 bits.
+ */
+u64 hv_get_vpreg(u32 msr)
+{
+	union hv_hypercall_status status;
+	struct hv_get_vp_register_output output;
+
+	status.as_uint64 = hv_do_hvc_fast_get(
+		HVCALL_GET_VP_REGISTERS | HV_HYPERCALL_FAST_BIT |
+			HV_HYPERCALL_REP_COUNT_1,
+		HV_PARTITION_ID_SELF,
+		HV_VP_INDEX_SELF,
+		msr,
+		&output);
+
+	/*
+	 * Something is fundamentally broken in the hypervisor if
+	 * getting a VP register fails. There's really no way to
+	 * continue as a guest VM, so panic.
+	 */
+	BUG_ON(status.status != HV_STATUS_SUCCESS);
+
+	return output.registervaluelow;
+}
+EXPORT_SYMBOL_GPL(hv_get_vpreg);
+
+/*
+ * Get the value of a single VP register that is 128 bits in size.  This is a
+ * separate call in order to avoid complicating the calling sequence for
+ * the much more frequently used 64-bit version.
+ */
+void hv_get_vpreg_128(u32 msr, struct hv_get_vp_register_output *result)
+{
+	union hv_hypercall_status status;
+
+	status.as_uint64 = hv_do_hvc_fast_get(
+		HVCALL_GET_VP_REGISTERS | HV_HYPERCALL_FAST_BIT |
+			HV_HYPERCALL_REP_COUNT_1,
+		HV_PARTITION_ID_SELF,
+		HV_VP_INDEX_SELF,
+		msr,
+		result);
+
+	/*
+	 * Something is fundamentally broken in the hypervisor if
+	 * getting a VP register fails. There's really no way to
+	 * continue as a guest VM, so panic.
+	 */
+	BUG_ON(status.status != HV_STATUS_SUCCESS);
+
+	return;
+
+}
+EXPORT_SYMBOL_GPL(hv_get_vpreg_128);
+
+void hyperv_report_panic(struct pt_regs *regs, long err)
+{
+	static bool panic_reported;
+	u64 guest_id;
+
+	/*
+	 * We prefer to report panic on 'die' chain as we have proper
+	 * registers to report, but if we miss it (e.g. on BUG()) we need
+	 * to report it on 'panic'.
+	 */
+	if (panic_reported)
+		return;
+	panic_reported = true;
+
+	guest_id = hv_get_vpreg(HV_REGISTER_GUEST_OSID);
+
+	/*
+	 * Hyper-V provides the ability to store only 5 values.
+	 * Pick the passed in error value, the guest_id, and the PC.
+	 * The first two general registers are added arbitrarily.
+	 */
+	hv_set_vpreg(HV_REGISTER_CRASH_P0, err);
+	hv_set_vpreg(HV_REGISTER_CRASH_P1, guest_id);
+	hv_set_vpreg(HV_REGISTER_CRASH_P2, regs->pc);
+	hv_set_vpreg(HV_REGISTER_CRASH_P3, regs->regs[0]);
+	hv_set_vpreg(HV_REGISTER_CRASH_P4, regs->regs[1]);
+
+	/*
+	 * Let Hyper-V know there is crash data available
+	 */
+	hv_set_vpreg(HV_REGISTER_CRASH_CTL, HV_CRASH_CTL_CRASH_NOTIFY);
+}
+EXPORT_SYMBOL_GPL(hyperv_report_panic);
+
+/*
+ * hyperv_report_panic_msg - report panic message to Hyper-V
+ * @pa: physical address of the panic page containing the message
+ * @size: size of the message in the page
+ */
+void hyperv_report_panic_msg(phys_addr_t pa, size_t size)
+{
+	/*
+	 * P3 to contain the physical address of the panic page & P4 to
+	 * contain the size of the panic data in that page. Rest of the
+	 * registers are no-op when the NOTIFY_MSG flag is set.
+	 */
+	hv_set_vpreg(HV_REGISTER_CRASH_P0, 0);
+	hv_set_vpreg(HV_REGISTER_CRASH_P1, 0);
+	hv_set_vpreg(HV_REGISTER_CRASH_P2, 0);
+	hv_set_vpreg(HV_REGISTER_CRASH_P3, pa);
+	hv_set_vpreg(HV_REGISTER_CRASH_P4, size);
+
+	/*
+	 * Let Hyper-V know there is crash data available along with
+	 * the panic message.
+	 */
+	hv_set_vpreg(HV_REGISTER_CRASH_CTL,
+	       (HV_CRASH_CTL_CRASH_NOTIFY | HV_CRASH_CTL_CRASH_NOTIFY_MSG));
+}
+EXPORT_SYMBOL_GPL(hyperv_report_panic_msg);
+
+bool hv_is_hyperv_initialized(void)
+{
+	return hyperv_initialized;
+}
+EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
diff --git a/arch/arm64/hyperv/mshyperv.c b/arch/arm64/hyperv/mshyperv.c
new file mode 100644
index 000000000000..3ef055599412
--- /dev/null
+++ b/arch/arm64/hyperv/mshyperv.c
@@ -0,0 +1,178 @@ 
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Core routines for interacting with Microsoft's Hyper-V hypervisor,
+ * including setting up VMbus and STIMER interrupts, and handling
+ * crashes and kexecs. These interactions are through a set of
+ * static "handler" variables set by the architecture independent
+ * VMbus and STIMER drivers.  This design is used to meet x86/x64
+ * requirements for avoiding direct linkages and allowing the VMbus
+ * and STIMER drivers to be unloaded and reloaded.
+ *
+ * Copyright (C) 2018, Microsoft, Inc.
+ *
+ * Author : Michael Kelley <mikelley@microsoft.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT.  See the GNU General Public License for more
+ * details.
+ */
+
+#include <linux/types.h>
+#include <linux/export.h>
+#include <linux/interrupt.h>
+#include <linux/kexec.h>
+#include <linux/acpi.h>
+#include <linux/ptrace.h>
+#include <asm/hyperv-tlfs.h>
+#include <asm/mshyperv.h>
+
+static void (*vmbus_handler)(void);
+static void (*hv_stimer0_handler)(void);
+static void (*hv_kexec_handler)(void);
+static void (*hv_crash_handler)(struct pt_regs *regs);
+
+static int vmbus_irq;
+static long __percpu *vmbus_evt;
+static long __percpu *stimer0_evt;
+
+irqreturn_t hyperv_vector_handler(int irq, void *dev_id)
+{
+	if (vmbus_handler)
+		vmbus_handler();
+	return IRQ_HANDLED;
+}
+
+/* Must be done just once */
+void hv_setup_vmbus_irq(void (*handler)(void))
+{
+	int result;
+
+	vmbus_handler = handler;
+	vmbus_irq = acpi_register_gsi(NULL, HYPERVISOR_CALLBACK_VECTOR,
+				 ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
+	if (vmbus_irq <= 0) {
+		pr_err("Can't register Hyper-V VMBus GSI. Error %d",
+			vmbus_irq);
+		vmbus_irq = 0;
+		return;
+	}
+	vmbus_evt = alloc_percpu(long);
+	result = request_percpu_irq(vmbus_irq, hyperv_vector_handler,
+			"Hyper-V VMbus", vmbus_evt);
+	if (result) {
+		pr_err("Can't request Hyper-V VMBus IRQ %d. Error %d",
+			vmbus_irq, result);
+		free_percpu(vmbus_evt);
+		acpi_unregister_gsi(vmbus_irq);
+		vmbus_irq = 0;
+	}
+}
+EXPORT_SYMBOL_GPL(hv_setup_vmbus_irq);
+
+/* Must be done just once */
+void hv_remove_vmbus_irq(void)
+{
+	if (vmbus_irq) {
+		free_percpu_irq(vmbus_irq, vmbus_evt);
+		free_percpu(vmbus_evt);
+		acpi_unregister_gsi(vmbus_irq);
+	}
+}
+EXPORT_SYMBOL_GPL(hv_remove_vmbus_irq);
+
+/* Must be done by each CPU */
+void hv_enable_vmbus_irq(void)
+{
+	enable_percpu_irq(vmbus_irq, 0);
+}
+EXPORT_SYMBOL_GPL(hv_enable_vmbus_irq);
+
+/* Must be done by each CPU */
+void hv_disable_vmbus_irq(void)
+{
+	disable_percpu_irq(vmbus_irq);
+}
+EXPORT_SYMBOL_GPL(hv_disable_vmbus_irq);
+
+/* Routines to do per-architecture handling of STIMER0 when in Direct Mode */
+
+static irqreturn_t hv_stimer0_vector_handler(int irq, void *dev_id)
+{
+	if (hv_stimer0_handler)
+		hv_stimer0_handler();
+	return IRQ_HANDLED;
+}
+
+int hv_setup_stimer0_irq(int *irq, int *vector, void (*handler)(void))
+{
+	int localirq;
+	int result;
+
+	localirq = acpi_register_gsi(NULL, HV_STIMER0_IRQNR,
+			ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
+	if (localirq <= 0) {
+		pr_err("Can't register Hyper-V stimer0 GSI. Error %d",
+			localirq);
+		*irq = 0;
+		return -1;
+	}
+	stimer0_evt = alloc_percpu(long);
+	result = request_percpu_irq(localirq, hv_stimer0_vector_handler,
+					 "Hyper-V stimer0", stimer0_evt);
+	if (result) {
+		pr_err("Can't request Hyper-V stimer0 IRQ %d. Error %d",
+			localirq, result);
+		free_percpu(stimer0_evt);
+		acpi_unregister_gsi(localirq);
+		*irq = 0;
+		return -1;
+	}
+
+	hv_stimer0_handler = handler;
+	*vector = HV_STIMER0_IRQNR;
+	*irq = localirq;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(hv_setup_stimer0_irq);
+
+void hv_remove_stimer0_irq(int irq)
+{
+	hv_stimer0_handler = NULL;
+	if (irq) {
+		free_percpu_irq(irq, stimer0_evt);
+		free_percpu(stimer0_evt);
+		acpi_unregister_gsi(irq);
+	}
+}
+EXPORT_SYMBOL_GPL(hv_remove_stimer0_irq);
+
+void hv_setup_kexec_handler(void (*handler)(void))
+{
+	hv_kexec_handler = handler;
+}
+EXPORT_SYMBOL_GPL(hv_setup_kexec_handler);
+
+void hv_remove_kexec_handler(void)
+{
+	hv_kexec_handler = NULL;
+}
+EXPORT_SYMBOL_GPL(hv_remove_kexec_handler);
+
+void hv_setup_crash_handler(void (*handler)(struct pt_regs *regs))
+{
+	hv_crash_handler = handler;
+}
+EXPORT_SYMBOL_GPL(hv_setup_crash_handler);
+
+void hv_remove_crash_handler(void)
+{
+	hv_crash_handler = NULL;
+}
+EXPORT_SYMBOL_GPL(hv_remove_crash_handler);