diff mbox series

riscv: Fix vector state restore in rt_sigreturn()

Message ID 20240403072638.567446-1-bjorn@kernel.org (mailing list archive)
State Accepted
Commit c27fa53b858b4ee6552a719aa599c250cf98a586
Headers show
Series riscv: Fix vector state restore in rt_sigreturn() | expand

Checks

Context Check Description
conchuod/vmtest-for-next-PR success PR summary
conchuod/patch-1-test-1 success .github/scripts/patches/tests/build_rv32_defconfig.sh
conchuod/patch-1-test-2 success .github/scripts/patches/tests/build_rv64_clang_allmodconfig.sh
conchuod/patch-1-test-3 success .github/scripts/patches/tests/build_rv64_gcc_allmodconfig.sh
conchuod/patch-1-test-4 success .github/scripts/patches/tests/build_rv64_nommu_k210_defconfig.sh
conchuod/patch-1-test-5 success .github/scripts/patches/tests/build_rv64_nommu_virt_defconfig.sh
conchuod/patch-1-test-6 warning .github/scripts/patches/tests/checkpatch.sh
conchuod/patch-1-test-7 success .github/scripts/patches/tests/dtb_warn_rv64.sh
conchuod/patch-1-test-8 success .github/scripts/patches/tests/header_inline.sh
conchuod/patch-1-test-9 success .github/scripts/patches/tests/kdoc.sh
conchuod/patch-1-test-10 success .github/scripts/patches/tests/module_param.sh
conchuod/patch-1-test-11 success .github/scripts/patches/tests/verify_fixes.sh
conchuod/patch-1-test-12 success .github/scripts/patches/tests/verify_signedoff.sh

Commit Message

Björn Töpel April 3, 2024, 7:26 a.m. UTC
From: Björn Töpel <bjorn@rivosinc.com>

The RISC-V Vector specification states in "Appendix D: Calling
Convention for Vector State" [1] that "Executing a system call causes
all caller-saved vector registers (v0-v31, vl, vtype) and vstart to
become unspecified.". In the RISC-V kernel this is called "discarding
the vstate".

Returning from a signal handler via the rt_sigreturn() syscall, vector
discard is also performed. However, this is not an issue since the
vector state should be restored from the sigcontext, and therefore not
care about the vector discard.

The "live state" is the actual vector register in the running context,
and the "vstate" is the vector state of the task. A dirty live state,
means that the vstate and live state are not in synch.

When vectorized user_from_copy() was introduced, an bug sneaked in at
the restoration code, related to the discard of the live state.

An example when this go wrong:

  1. A userland application is executing vector code
  2. The application receives a signal, and the signal handler is
     entered.
  3. The application returns from the signal handler, using the
     rt_sigreturn() syscall.
  4. The live vector state is discarded upon entering the
     rt_sigreturn(), and the live state is marked as "dirty", indicating
     that the live state need to be synchronized with the current
     vstate.
  5. rt_sigreturn() restores the vstate, except the Vector registers,
     from the sigcontext
  6. rt_sigreturn() restores the Vector registers, from the sigcontext,
     and now the vectorized user_from_copy() is used. The dirty live
     state from the discard is saved to the vstate, making the vstate
     corrupt.
  7. rt_sigreturn() returns to the application, which crashes due to
     corrupted vstate.

Note that the vectorized user_from_copy() is invoked depending on the
value of CONFIG_RISCV_ISA_V_UCOPY_THRESHOLD. Default is 768, which
means that vlen has to be larger than 128b for this bug to trigger.

The fix is simply to mark the live state as non-dirty/clean prior
performing the vstate restore.

Link: https://github.com/riscv/riscv-isa-manual/releases/download/riscv-isa-release-8abdb41-2024-03-26/unpriv-isa-asciidoc.pdf # [1]
Reported-by: Charlie Jenkins <charlie@rivosinc.com>
Reported-by: Vineet Gupta <vgupta@kernel.org>
Fixes: c2a658d41924 ("riscv: lib: vectorize copy_to_user/copy_from_user")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
---
 arch/riscv/kernel/signal.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)


base-commit: 7115ff4a8bfed3b9294bad2e111744e6abeadf1a

Comments

Andy Chiu April 3, 2024, 10:12 a.m. UTC | #1
On Wed, Apr 3, 2024 at 3:27 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> From: Björn Töpel <bjorn@rivosinc.com>
>
> The RISC-V Vector specification states in "Appendix D: Calling
> Convention for Vector State" [1] that "Executing a system call causes
> all caller-saved vector registers (v0-v31, vl, vtype) and vstart to
> become unspecified.". In the RISC-V kernel this is called "discarding
> the vstate".
>
> Returning from a signal handler via the rt_sigreturn() syscall, vector
> discard is also performed. However, this is not an issue since the
> vector state should be restored from the sigcontext, and therefore not
> care about the vector discard.
>
> The "live state" is the actual vector register in the running context,
> and the "vstate" is the vector state of the task. A dirty live state,
> means that the vstate and live state are not in synch.
>
> When vectorized user_from_copy() was introduced, an bug sneaked in at
> the restoration code, related to the discard of the live state.
>
> An example when this go wrong:
>
>   1. A userland application is executing vector code
>   2. The application receives a signal, and the signal handler is
>      entered.
>   3. The application returns from the signal handler, using the
>      rt_sigreturn() syscall.
>   4. The live vector state is discarded upon entering the
>      rt_sigreturn(), and the live state is marked as "dirty", indicating
>      that the live state need to be synchronized with the current
>      vstate.
>   5. rt_sigreturn() restores the vstate, except the Vector registers,
>      from the sigcontext
>   6. rt_sigreturn() restores the Vector registers, from the sigcontext,
>      and now the vectorized user_from_copy() is used. The dirty live
>      state from the discard is saved to the vstate, making the vstate
>      corrupt.
>   7. rt_sigreturn() returns to the application, which crashes due to
>      corrupted vstate.
>
> Note that the vectorized user_from_copy() is invoked depending on the
> value of CONFIG_RISCV_ISA_V_UCOPY_THRESHOLD. Default is 768, which
> means that vlen has to be larger than 128b for this bug to trigger.
>
> The fix is simply to mark the live state as non-dirty/clean prior
> performing the vstate restore.
>
> Link: https://github.com/riscv/riscv-isa-manual/releases/download/riscv-isa-release-8abdb41-2024-03-26/unpriv-isa-asciidoc.pdf # [1]
> Reported-by: Charlie Jenkins <charlie@rivosinc.com>
> Reported-by: Vineet Gupta <vgupta@kernel.org>
> Fixes: c2a658d41924 ("riscv: lib: vectorize copy_to_user/copy_from_user")
> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>

Thanks for the findings!

Reviewed-by: Andy Chiu <andy.chiu@sifive.com>

> ---
>  arch/riscv/kernel/signal.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
> index 501e66debf69..5a2edd7f027e 100644
> --- a/arch/riscv/kernel/signal.c
> +++ b/arch/riscv/kernel/signal.c
> @@ -119,6 +119,13 @@ static long __restore_v_state(struct pt_regs *regs, void __user *sc_vec)
>         struct __sc_riscv_v_state __user *state = sc_vec;
>         void __user *datap;
>
> +       /*
> +        * Mark the vstate as clean prior performing the actual copy,
> +        * to avoid getting the vstate incorrectly clobbered by the
> +        *  discarded vector state.
> +        */
> +       riscv_v_vstate_set_restore(current, regs);
> +
>         /* Copy everything of __sc_riscv_v_state except datap. */
>         err = __copy_from_user(&current->thread.vstate, &state->v_state,
>                                offsetof(struct __riscv_v_ext_state, datap));
> @@ -133,13 +140,7 @@ static long __restore_v_state(struct pt_regs *regs, void __user *sc_vec)
>          * Copy the whole vector content from user space datap. Use
>          * copy_from_user to prevent information leak.
>          */
> -       err = copy_from_user(current->thread.vstate.datap, datap, riscv_v_vsize);
> -       if (unlikely(err))
> -               return err;
> -
> -       riscv_v_vstate_set_restore(current, regs);
> -
> -       return err;
> +       return copy_from_user(current->thread.vstate.datap, datap, riscv_v_vsize);
>  }
>  #else
>  #define save_v_state(task, regs) (0)
>
> base-commit: 7115ff4a8bfed3b9294bad2e111744e6abeadf1a
> --
> 2.40.1
>
Vineet Gupta April 3, 2024, 5:33 p.m. UTC | #2
On 4/3/24 00:26, Björn Töpel wrote:
> From: Björn Töpel <bjorn@rivosinc.com>
>
> The RISC-V Vector specification states in "Appendix D: Calling
> Convention for Vector State" [1] that "Executing a system call causes
> all caller-saved vector registers (v0-v31, vl, vtype) and vstart to
> become unspecified.". In the RISC-V kernel this is called "discarding
> the vstate".
>
> Returning from a signal handler via the rt_sigreturn() syscall, vector
> discard is also performed. However, this is not an issue since the
> vector state should be restored from the sigcontext, and therefore not
> care about the vector discard.
>
> The "live state" is the actual vector register in the running context,
> and the "vstate" is the vector state of the task. A dirty live state,
> means that the vstate and live state are not in synch.
>
> When vectorized user_from_copy() was introduced, an bug sneaked in at
> the restoration code, related to the discard of the live state.
>
> An example when this go wrong:
>
>   1. A userland application is executing vector code
>   2. The application receives a signal, and the signal handler is
>      entered.
>   3. The application returns from the signal handler, using the
>      rt_sigreturn() syscall.
>   4. The live vector state is discarded upon entering the
>      rt_sigreturn(), and the live state is marked as "dirty", indicating
>      that the live state need to be synchronized with the current
>      vstate.
>   5. rt_sigreturn() restores the vstate, except the Vector registers,
>      from the sigcontext
>   6. rt_sigreturn() restores the Vector registers, from the sigcontext,
>      and now the vectorized user_from_copy() is used. The dirty live
>      state from the discard is saved to the vstate, making the vstate
>      corrupt.
>   7. rt_sigreturn() returns to the application, which crashes due to
>      corrupted vstate.
>
> Note that the vectorized user_from_copy() is invoked depending on the
> value of CONFIG_RISCV_ISA_V_UCOPY_THRESHOLD. Default is 768, which
> means that vlen has to be larger than 128b for this bug to trigger.
>
> The fix is simply to mark the live state as non-dirty/clean prior
> performing the vstate restore.
>
> Link: https://github.com/riscv/riscv-isa-manual/releases/download/riscv-isa-release-8abdb41-2024-03-26/unpriv-isa-asciidoc.pdf # [1]
> Reported-by: Charlie Jenkins <charlie@rivosinc.com>
> Reported-by: Vineet Gupta <vgupta@kernel.org>
> Fixes: c2a658d41924 ("riscv: lib: vectorize copy_to_user/copy_from_user")
> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>

Tested-by: Vineet Gupta <vineetg@rivosinc.com>

For completeness (and fun)

1. The issue was triggered on dual core spike run with a seemingly
benign workload (the key is repeated fork/execve/exit with a little I/O)

    some-shell-script.sh

    #!/bin/bash

    (while true; do ls; done) &

    for i in $seq (1 20); do
       <long running job>
    done

2. The issue initially appears as follows: Vector store instruction,
before starting to run invalidates it's own context (page fault ->
preemption -> handle-signal -> sigreturn -> VILL / V-clobber), so when
it eventually runs, it takes an illegal instruction exception, taking
down the entire program.

Thx,
-Vineet
patchwork-bot+linux-riscv@kernel.org April 4, 2024, 7:40 p.m. UTC | #3
Hello:

This patch was applied to riscv/linux.git (fixes)
by Palmer Dabbelt <palmer@rivosinc.com>:

On Wed,  3 Apr 2024 09:26:38 +0200 you wrote:
> From: Björn Töpel <bjorn@rivosinc.com>
> 
> The RISC-V Vector specification states in "Appendix D: Calling
> Convention for Vector State" [1] that "Executing a system call causes
> all caller-saved vector registers (v0-v31, vl, vtype) and vstart to
> become unspecified.". In the RISC-V kernel this is called "discarding
> the vstate".
> 
> [...]

Here is the summary with links:
  - riscv: Fix vector state restore in rt_sigreturn()
    https://git.kernel.org/riscv/c/c27fa53b858b

You are awesome, thank you!
diff mbox series

Patch

diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
index 501e66debf69..5a2edd7f027e 100644
--- a/arch/riscv/kernel/signal.c
+++ b/arch/riscv/kernel/signal.c
@@ -119,6 +119,13 @@  static long __restore_v_state(struct pt_regs *regs, void __user *sc_vec)
 	struct __sc_riscv_v_state __user *state = sc_vec;
 	void __user *datap;
 
+	/*
+	 * Mark the vstate as clean prior performing the actual copy,
+	 * to avoid getting the vstate incorrectly clobbered by the
+	 *  discarded vector state.
+	 */
+	riscv_v_vstate_set_restore(current, regs);
+
 	/* Copy everything of __sc_riscv_v_state except datap. */
 	err = __copy_from_user(&current->thread.vstate, &state->v_state,
 			       offsetof(struct __riscv_v_ext_state, datap));
@@ -133,13 +140,7 @@  static long __restore_v_state(struct pt_regs *regs, void __user *sc_vec)
 	 * Copy the whole vector content from user space datap. Use
 	 * copy_from_user to prevent information leak.
 	 */
-	err = copy_from_user(current->thread.vstate.datap, datap, riscv_v_vsize);
-	if (unlikely(err))
-		return err;
-
-	riscv_v_vstate_set_restore(current, regs);
-
-	return err;
+	return copy_from_user(current->thread.vstate.datap, datap, riscv_v_vsize);
 }
 #else
 #define save_v_state(task, regs) (0)