[v7,22/22] Documentation: arm64: describe asymmetric 32-bit support

Message ID 20210525151432.16875-23-will@kernel.org (mailing list archive)
State New, archived
Series Add support for 32-bit tasks on asymmetric AArch32 systems

Commit Message

Will Deacon May 25, 2021, 3:14 p.m. UTC
Document support for running 32-bit tasks on asymmetric 32-bit systems
and its impact on the user ABI when enabled.

Signed-off-by: Will Deacon <will@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         |   3 +
 Documentation/arm64/asymmetric-32bit.rst      | 154 ++++++++++++++++++
 Documentation/arm64/index.rst                 |   1 +
 3 files changed, 158 insertions(+)
 create mode 100644 Documentation/arm64/asymmetric-32bit.rst

Comments

Marc Zyngier May 25, 2021, 5:13 p.m. UTC | #1
On Tue, 25 May 2021 16:14:32 +0100,
Will Deacon <will@kernel.org> wrote:
> 
> Document support for running 32-bit tasks on asymmetric 32-bit systems
> and its impact on the user ABI when enabled.
> 
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  .../admin-guide/kernel-parameters.txt         |   3 +
>  Documentation/arm64/asymmetric-32bit.rst      | 154 ++++++++++++++++++
>  Documentation/arm64/index.rst                 |   1 +
>  3 files changed, 158 insertions(+)
>  create mode 100644 Documentation/arm64/asymmetric-32bit.rst
>

[...]

> +KVM
> +---
> +
> +Although KVM will not advertise 32-bit EL0 support to any vCPUs on an
> +asymmetric system, a broken guest at EL1 could still attempt to execute
> +32-bit code at EL0. In this case, an exit from a vCPU thread in 32-bit
> +mode will return to host userspace with an ``exit_reason`` of
> +``KVM_EXIT_FAIL_ENTRY``.

Nit: there is a bit more to it. The vcpu will be left in a permanent
non-runnable state until KVM_ARM_VCPU_INIT is issued to reset the vcpu
into a saner state.

Thanks,

	M.

Will Deacon May 25, 2021, 5:27 p.m. UTC | #2
On Tue, May 25, 2021 at 06:13:58PM +0100, Marc Zyngier wrote:
> On Tue, 25 May 2021 16:14:32 +0100,
> Will Deacon <will@kernel.org> wrote:
> > 
> > Document support for running 32-bit tasks on asymmetric 32-bit systems
> > and its impact on the user ABI when enabled.
> > 
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  .../admin-guide/kernel-parameters.txt         |   3 +
> >  Documentation/arm64/asymmetric-32bit.rst      | 154 ++++++++++++++++++
> >  Documentation/arm64/index.rst                 |   1 +
> >  3 files changed, 158 insertions(+)
> >  create mode 100644 Documentation/arm64/asymmetric-32bit.rst
> >
> 
> [...]
> 
> > +KVM
> > +---
> > +
> > +Although KVM will not advertise 32-bit EL0 support to any vCPUs on an
> > +asymmetric system, a broken guest at EL1 could still attempt to execute
> > +32-bit code at EL0. In this case, an exit from a vCPU thread in 32-bit
> > +mode will return to host userspace with an ``exit_reason`` of
> > +``KVM_EXIT_FAIL_ENTRY``.
> 
> Nit: there is a bit more to it. The vcpu will be left in a permanent
> non-runnable state until KVM_ARM_VCPU_INIT is issued to reset the vcpu
> into a saner state.

Thanks, I'll add "and will remain non-runnable until re-initialised by a
subsequent KVM_ARM_VCPU_INIT operation".

Can the VMM tell that it needs to do that? I wonder if we should be
setting 'hardware_entry_failure_reason' to distinguish this case.

Will

Marc Zyngier May 25, 2021, 6:11 p.m. UTC | #3
On Tue, 25 May 2021 18:27:03 +0100,
Will Deacon <will@kernel.org> wrote:
> 
> On Tue, May 25, 2021 at 06:13:58PM +0100, Marc Zyngier wrote:
> > On Tue, 25 May 2021 16:14:32 +0100,
> > Will Deacon <will@kernel.org> wrote:
> > > 
> > > Document support for running 32-bit tasks on asymmetric 32-bit systems
> > > and its impact on the user ABI when enabled.
> > > 
> > > Signed-off-by: Will Deacon <will@kernel.org>
> > > ---
> > >  .../admin-guide/kernel-parameters.txt         |   3 +
> > >  Documentation/arm64/asymmetric-32bit.rst      | 154 ++++++++++++++++++
> > >  Documentation/arm64/index.rst                 |   1 +
> > >  3 files changed, 158 insertions(+)
> > >  create mode 100644 Documentation/arm64/asymmetric-32bit.rst
> > >
> > 
> > [...]
> > 
> > > +KVM
> > > +---
> > > +
> > > +Although KVM will not advertise 32-bit EL0 support to any vCPUs on an
> > > +asymmetric system, a broken guest at EL1 could still attempt to execute
> > > +32-bit code at EL0. In this case, an exit from a vCPU thread in 32-bit
> > > +mode will return to host userspace with an ``exit_reason`` of
> > > +``KVM_EXIT_FAIL_ENTRY``.
> > 
> > Nit: there is a bit more to it. The vcpu will be left in a permanent
> > non-runnable state until KVM_ARM_VCPU_INIT is issued to reset the vcpu
> > into a saner state.
> 
> Thanks, I'll add "and will remain non-runnable until re-initialised by a
> subsequent KVM_ARM_VCPU_INIT operation".

Looks good.

> Can the VMM tell that it needs to do that? I wonder if we should be
> setting 'hardware_entry_failure_reason' to distinguish this case.

The VMM should be able to notice that something is amiss, as any
subsequent KVM_RUN calls will result in -ENOEXEC being returned, and
we document this as "the vcpu hasn't been initialized or the guest
tried to execute instructions from device memory (arm64)".

However, there is another reason to get a "FAILED_ENTRY", and that is if
we get an Illegal Exception Return exception when entering the
guest. That one should always be a KVM bug.

So yeah, maybe there is some ground to populate that structure with
the appropriate nastygram (completely untested).

	M.

diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 24223adae150..cf50051a9412 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -402,6 +402,10 @@ struct kvm_vcpu_events {
 #define KVM_PSCI_RET_INVAL		PSCI_RET_INVALID_PARAMS
 #define KVM_PSCI_RET_DENIED		PSCI_RET_DENIED
 
+/* KVM_EXIT_FAIL_ENTRY reasons */
+#define KVM_ARM64_FAILED_ENTRY_NO_AARCH32_ALLOWED	0xBADBAD32
+#define KVM_ARM64_FAILED_ENTRY_INTERNAL_ERROR		0xE1215BAD
+
 #endif
 
 #endif /* __ARM_KVM_H__ */
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 6f48336b1d86..e97cd4de1fa7 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -262,6 +262,10 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
 		 * have been corrupted somehow.  Give up.
 		 */
 		run->exit_reason = KVM_EXIT_FAIL_ENTRY;
+		run->fail_entry.hardware_entry_failure_reason = (vcpu->arch.target == -1) ?
+			KVM_ARM64_FAILED_ENTRY_NO_AARCH32_ALLOWED :
+			KVM_ARM64_FAILED_ENTRY_INTERNAL_ERROR;
+		run->fail_entry.cpu = vcpu->cpu;
 		return -EINVAL;
 	default:
 		kvm_pr_unimpl("Unsupported exception type: %d",

Will Deacon May 26, 2021, 4 p.m. UTC | #4
On Tue, May 25, 2021 at 07:11:44PM +0100, Marc Zyngier wrote:
> On Tue, 25 May 2021 18:27:03 +0100,
> Will Deacon <will@kernel.org> wrote:
> > 
> > On Tue, May 25, 2021 at 06:13:58PM +0100, Marc Zyngier wrote:
> > > On Tue, 25 May 2021 16:14:32 +0100,
> > > Will Deacon <will@kernel.org> wrote:
> > > > 
> > > > Document support for running 32-bit tasks on asymmetric 32-bit systems
> > > > and its impact on the user ABI when enabled.
> > > > 
> > > > Signed-off-by: Will Deacon <will@kernel.org>
> > > > ---
> > > >  .../admin-guide/kernel-parameters.txt         |   3 +
> > > >  Documentation/arm64/asymmetric-32bit.rst      | 154 ++++++++++++++++++
> > > >  Documentation/arm64/index.rst                 |   1 +
> > > >  3 files changed, 158 insertions(+)
> > > >  create mode 100644 Documentation/arm64/asymmetric-32bit.rst
> > > >
> > > 
> > > [...]
> > > 
> > > > +KVM
> > > > +---
> > > > +
> > > > +Although KVM will not advertise 32-bit EL0 support to any vCPUs on an
> > > > +asymmetric system, a broken guest at EL1 could still attempt to execute
> > > > +32-bit code at EL0. In this case, an exit from a vCPU thread in 32-bit
> > > > +mode will return to host userspace with an ``exit_reason`` of
> > > > +``KVM_EXIT_FAIL_ENTRY``.
> > > 
> > > Nit: there is a bit more to it. The vcpu will be left in a permanent
> > > non-runnable state until KVM_ARM_VCPU_INIT is issued to reset the vcpu
> > > into a saner state.
> > 
> > Thanks, I'll add "and will remain non-runnable until re-initialised by a
> > subsequent KVM_ARM_VCPU_INIT operation".
> 
> Looks good.

Cheers.

> > Can the VMM tell that it needs to do that? I wonder if we should be
> > setting 'hardware_entry_failure_reason' to distinguish this case.
> 
> The VMM should be able to notice that something is amiss, as any
> subsequent KVM_RUN calls will result in -ENOEXEC being returned, and
> we document this as "the vcpu hasn't been initialized or the guest
> tried to execute instructions from device memory (arm64)".
> 
> However, there is another reason to get a "FAILED_ENTRY", and that is if
> we get an Illegal Exception Return exception when entering the
> guest. That one should always be a KVM bug.
> 
> So yeah, maybe there is some ground to populate that structure with
> the appropriate nastygram (completely untested).
> 
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index 24223adae150..cf50051a9412 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -402,6 +402,10 @@ struct kvm_vcpu_events {
>  #define KVM_PSCI_RET_INVAL		PSCI_RET_INVALID_PARAMS
>  #define KVM_PSCI_RET_DENIED		PSCI_RET_DENIED
>  
> +/* KVM_EXIT_FAIL_ENTRY reasons */
> +#define KVM_ARM64_FAILED_ENTRY_NO_AARCH32_ALLOWED	0xBADBAD32
> +#define KVM_ARM64_FAILED_ENTRY_INTERNAL_ERROR		0xE1215BAD

Heh, you and your magic numbers ;)

I'll leave it up to you as to whether you want to populate this -- I just
spotted it and thought it might help to indicate what went wrong. This is a
pretty daft situation to end up in so whether anybody would realistically
try to recover from it is another question entirely.

Will
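
For readers following the thread, here is a rough sketch of how a VMM
might act on the behaviour discussed above: a KVM_EXIT_FAIL_ENTRY exit,
followed by -ENOEXEC from later KVM_RUN calls until KVM_ARM_VCPU_INIT is
issued. The two reason values come from the untested diff in comment #3
and may never reach a released header. The enum, the function name and
the policy for unknown reasons are purely illustrative assumptions. The
function deliberately takes plain values rather than a struct kvm_run,
so it can be read and tested without KVM headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Reason codes proposed (untested) in this thread; not a stable ABI. */
#define FAILED_ENTRY_NO_AARCH32_ALLOWED	UINT64_C(0xBADBAD32)
#define FAILED_ENTRY_INTERNAL_ERROR	UINT64_C(0xE1215BAD)

/* Hypothetical VMM actions; the names are illustrative only. */
enum vmm_action {
	VMM_CONTINUE,		/* nothing special, keep running the vCPU */
	VMM_REINIT_VCPU,	/* issue KVM_ARM_VCPU_INIT before the next KVM_RUN */
	VMM_ABORT,		/* treat as a KVM bug and stop the VM */
};

/*
 * run_errno:  positive errno from a failed KVM_RUN, or 0 on success.
 * fail_entry: non-zero if exit_reason was KVM_EXIT_FAIL_ENTRY.
 * reason:     fail_entry.hardware_entry_failure_reason, if populated.
 */
enum vmm_action classify_run_result(int run_errno, int fail_entry,
				    uint64_t reason)
{
	assert(run_errno >= 0);

	if (fail_entry) {
		/* The guest tried 32-bit EL0: the vCPU can be reset. */
		if (reason == FAILED_ENTRY_NO_AARCH32_ALLOWED)
			return VMM_REINIT_VCPU;
		/*
		 * FAILED_ENTRY_INTERNAL_ERROR, or a reason this sketch
		 * does not know about: assume the worst and give up.
		 */
		return VMM_ABORT;
	}

	/* Documented as "the vcpu hasn't been initialized ...". */
	if (run_errno == ENOEXEC)
		return VMM_REINIT_VCPU;

	return VMM_CONTINUE;
}
```

Without the proposed reason codes, only the -ENOEXEC branch is usable,
and it cannot distinguish the AArch32 case from an uninitialised vCPU.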

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a2e453919bb6..5a1dc7e628a5 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -295,6 +295,9 @@ 
 			EL0 is indicated by /sys/devices/system/cpu/aarch32_el0
 			and hot-unplug operations may be restricted.
 
+			See Documentation/arm64/asymmetric-32bit.rst for more
+			information.
+
 	amd_iommu=	[HW,X86-64]
 			Pass parameters to the AMD IOMMU driver in the system.
 			Possible values are:
diff --git a/Documentation/arm64/asymmetric-32bit.rst b/Documentation/arm64/asymmetric-32bit.rst
new file mode 100644
index 000000000000..a70a2b97e60b
--- /dev/null
+++ b/Documentation/arm64/asymmetric-32bit.rst
@@ -0,0 +1,154 @@ 
+======================
+Asymmetric 32-bit SoCs
+======================
+
+Author: Will Deacon <will@kernel.org>
+
+This document describes the impact of asymmetric 32-bit SoCs on the
+execution of 32-bit (``AArch32``) applications.
+
+Date: 2021-05-17
+
+Introduction
+============
+
+Some Armv9 SoCs suffer from a big.LITTLE misfeature where only a subset
+of the CPUs are capable of executing 32-bit user applications. On such
+a system, Linux by default treats the asymmetry as a "mismatch" and
+disables support for both the ``PER_LINUX32`` personality and
+``execve(2)`` of 32-bit ELF binaries, with the latter returning
+``-ENOEXEC``. If the mismatch is detected during late onlining of a
+64-bit-only CPU, then the onlining operation fails and the new CPU is
+unavailable for scheduling.
+
+Surprisingly, these SoCs have been produced with the intention of
+running legacy 32-bit binaries. Unsurprisingly, that doesn't work very
+well with the default behaviour of Linux.
+
+It seems inevitable that future SoCs will drop 32-bit support
+altogether, so if you're stuck in the unenviable position of needing to
+run 32-bit code on one of these transitionary platforms then you would
+be wise to consider alternatives such as recompilation, emulation or
+retirement. If none of those options is practical, then read on.
+
+Enabling kernel support
+=======================
+
+Since the kernel support is not completely transparent to userspace,
+allowing 32-bit tasks to run on an asymmetric 32-bit system requires an
+explicit "opt-in" and can be enabled by passing the
+``allow_mismatched_32bit_el0`` parameter on the kernel command-line.
+
+For the remainder of this document we will refer to an *asymmetric
+system* to mean an asymmetric 32-bit SoC running Linux with this kernel
+command-line option enabled.
+
+Userspace impact
+================
+
+32-bit tasks running on an asymmetric system behave in mostly the same
+way as on a homogeneous system, with a few key differences relating to
+CPU affinity.
+
+sysfs
+-----
+
+The subset of CPUs capable of running 32-bit tasks is described in
+``/sys/devices/system/cpu/aarch32_el0`` and is documented further in
+``Documentation/ABI/testing/sysfs-devices-system-cpu``.
+
+**Note:** CPUs are advertised by this file as they are detected and so
+late-onlining of 32-bit-capable CPUs can result in the file contents
+being modified by the kernel at runtime. Once advertised, CPUs are never
+removed from the file.
+
+``execve(2)``
+-------------
+
+On a homogeneous system, the CPU affinity of a task is preserved across
+``execve(2)``. This is not always possible on an asymmetric system,
+specifically when the new program being executed is 32-bit yet the
+affinity mask contains 64-bit-only CPUs. In this situation, the kernel
+determines the new affinity mask as follows:
+
+  1. If the 32-bit-capable subset of the affinity mask is not empty,
+     then the affinity is restricted to that subset and the old affinity
+     mask is saved. This saved mask is inherited over ``fork(2)`` and
+     preserved across ``execve(2)`` of 32-bit programs.
+
+     **Note:** This step does not apply to ``SCHED_DEADLINE`` tasks.
+     See `SCHED_DEADLINE`_.
+
+  2. Otherwise, the cpuset hierarchy of the task is walked until an
+     ancestor is found containing at least one 32-bit-capable CPU. The
+     affinity of the task is then changed to match the 32-bit-capable
+     subset of the cpuset determined by the walk.
+
+  3. On failure (i.e. out of memory), the affinity is changed to the set
+     of all 32-bit-capable CPUs of which the kernel is aware.
+
+A subsequent ``execve(2)`` of a 64-bit program by the 32-bit task will
+invalidate the affinity mask saved in (1) and attempt to restore the CPU
+affinity of the task using the saved mask if it was previously valid.
+This restoration may fail due to intervening changes to the deadline
+policy or cpuset hierarchy, in which case the ``execve(2)`` continues
+with the affinity unchanged.
+
+Calls to ``sched_setaffinity(2)`` for a 32-bit task will consider only
+the 32-bit-capable CPUs of the requested affinity mask. On success, the
+affinity for the task is updated and any saved mask from a prior
+``execve(2)`` is invalidated.
+
+``SCHED_DEADLINE``
+------------------
+
+Explicit admission of a 32-bit deadline task to the default root domain
+(e.g. by calling ``sched_setattr(2)``) is rejected on an asymmetric
+32-bit system unless admission control is disabled by writing -1 to
+``/proc/sys/kernel/sched_rt_runtime_us``.
+
+``execve(2)`` of a 32-bit program from a 64-bit deadline task will
+return ``-ENOEXEC`` if the root domain for the task contains any
+64-bit-only CPUs and admission control is enabled. Concurrent offlining
+of 32-bit-capable CPUs may still necessitate the procedure described in
+`execve(2)`_, in which case step (1) is skipped and a warning is
+emitted on the console.
+
+**Note:** It is recommended that a set of 32-bit-capable CPUs is placed
+into a separate root domain if ``SCHED_DEADLINE`` is to be used with
+32-bit tasks on an asymmetric system. Failure to do so is likely to
+result in missed deadlines.
+
+Cpusets
+-------
+
+The affinity of a 32-bit task on an asymmetric system may include CPUs
+that are not explicitly allowed by the cpuset to which it is attached.
+This can occur as a result of the following two situations:
+
+  - A 64-bit task attached to a cpuset which allows only 64-bit CPUs
+    executes a 32-bit program.
+
+  - All of the 32-bit-capable CPUs allowed by a cpuset containing a
+    32-bit task are offlined.
+
+In both of these cases, the new affinity is calculated according to step
+(2) of the process described in `execve(2)`_ and the cpuset hierarchy is
+unchanged irrespective of the cgroup version.
+
+CPU hotplug
+-----------
+
+On an asymmetric system, the first detected 32-bit-capable CPU is
+prevented from being offlined by userspace and any such attempt will
+return ``-EPERM``. Note that suspend is still permitted even if the
+primary CPU (i.e. CPU 0) is 64-bit-only.
+
+KVM
+---
+
+Although KVM will not advertise 32-bit EL0 support to any vCPUs on an
+asymmetric system, a broken guest at EL1 could still attempt to execute
+32-bit code at EL0. In this case, an exit from a vCPU thread in 32-bit
+mode will return to host userspace with an ``exit_reason`` of
+``KVM_EXIT_FAIL_ENTRY``.
diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
index 97d65ba12a35..4f840bac083e 100644
--- a/Documentation/arm64/index.rst
+++ b/Documentation/arm64/index.rst
@@ -10,6 +10,7 @@  ARM64 Architecture
     acpi_object_usage
     amu
     arm-acpi
+    asymmetric-32bit
     booting
     cpu-feature-registers
     elf_hwcaps
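
As a userspace illustration of the affinity rules described in
asymmetric-32bit.rst above, the sketch below parses a cpulist string of
the kind expected from /sys/devices/system/cpu/aarch32_el0 and models
steps (1) to (3) of the execve(2) procedure on plain 64-bit masks. It is
not the kernel implementation. Both function names, the 64-CPU limit and
the flattening of the cpuset hierarchy into an array of ancestor masks
are assumptions made for the example. The fallback to "all
32-bit-capable CPUs" stands in for the failure case in step (3).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a cpulist such as "0-3,6" (optionally newline-terminated) into
 * a bitmask. Only CPUs 0..63 are supported in this sketch; malformed or
 * out-of-range input yields 0.
 */
uint64_t parse_cpulist(const char *list)
{
	uint64_t mask = 0;

	assert(list != NULL);

	while (*list && *list != '\n') {
		char *end;
		unsigned long lo = strtoul(list, &end, 10);
		unsigned long hi = lo;

		if (end == list)
			return 0;		/* no digits where expected */
		if (*end == '-') {
			const char *p = end + 1;

			hi = strtoul(p, &end, 10);
			if (end == p)
				return 0;	/* "N-" without an upper bound */
		}
		if (hi < lo || hi > 63)
			return 0;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			mask |= UINT64_C(1) << cpu;

		if (*end == ',')
			end++;
		else if (*end && *end != '\n')
			return 0;		/* unexpected separator */
		list = end;
	}
	return mask;
}

/*
 * Model of the affinity chosen when a 32-bit program is executed.
 * ancestors[] lists cpuset masks from the task's own cpuset upwards.
 */
uint64_t exec32_affinity(uint64_t task_mask, const uint64_t *ancestors,
			 size_t nr_ancestors, uint64_t aarch32_mask)
{
	/* Step 1: keep the 32-bit-capable part of the current affinity. */
	if (task_mask & aarch32_mask)
		return task_mask & aarch32_mask;

	/* Step 2: walk up until a cpuset has a 32-bit-capable CPU. */
	for (size_t i = 0; i < nr_ancestors; i++)
		if (ancestors[i] & aarch32_mask)
			return ancestors[i] & aarch32_mask;

	/* Step 3: fall back to every known 32-bit-capable CPU. */
	return aarch32_mask;
}
```

A caller would read the sysfs file into a buffer, pass it to
parse_cpulist() and then use the resulting mask as aarch32_mask. The
saved-mask handling and the SCHED_DEADLINE exception are left out.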