diff mbox series

RISC-V: Clobber V registers on syscalls

Message ID 20230614163534.18668-1-palmer@rivosinc.com (mailing list archive)
State Superseded
Headers show
Series RISC-V: Clobber V registers on syscalls | expand

Checks

Context Check Description
conchuod/cover_letter success Single patches do not need cover letters
conchuod/tree_selection success Guessed tree name to be for-next at HEAD d5e45e810e0e
conchuod/fixes_present success Fixes tag not required for -next series
conchuod/maintainers_pattern success MAINTAINERS pattern errors before the patch: 6 and now 6
conchuod/verify_signedoff success Signed-off-by tag matches author and committer
conchuod/kdoc success Errors and warnings before: 0 this patch: 0
conchuod/build_rv64_clang_allmodconfig success Errors and warnings before: 8 this patch: 8
conchuod/module_param success Was 0 now: 0
conchuod/build_rv64_gcc_allmodconfig success Errors and warnings before: 8 this patch: 8
conchuod/build_rv32_defconfig success Build OK
conchuod/dtb_warn_rv64 success Errors and warnings before: 3 this patch: 3
conchuod/header_inline success No static functions without inline keyword in header files
conchuod/checkpatch success total: 0 errors, 0 warnings, 0 checks, 16 lines checked
conchuod/build_rv64_nommu_k210_defconfig success Build OK
conchuod/verify_fixes success No Fixes tag
conchuod/build_rv64_nommu_virt_defconfig success Build OK

Commit Message

Palmer Dabbelt June 14, 2023, 4:35 p.m. UTC
The V registers are clobbered by standard ABI functions, so userspace
probably doesn't have anything useful in them by the time we get to the
kernel.  So let's just document that they're clobbered by syscalls and
proactively clobber them.

Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
---
IIRC we'd talked about doing this, but I didn't see anything in the
docs.  I figure it's better to just proactively clobber the registers on
syscalls, as that way userspace can't end up accidentally depending on
them.
---
 Documentation/riscv/vector.rst | 5 +++++
 arch/riscv/kernel/traps.c      | 2 ++
 2 files changed, 7 insertions(+)

Comments

Rémi Denis-Courmont June 15, 2023, 5:36 p.m. UTC | #1
Le keskiviikkona 14. kesäkuuta 2023, 19.35.34 EEST Palmer Dabbelt a écrit :
> The V registers are clobbered by standard ABI functions, so userspace
> probably doesn't have anything useful in them by the time we get to the
> kernel.

Indeed, for your typical system call, wrapped by two or more layers of 
function calls inside libc, userspace will treat the registers as clobbered 
anyhow.

But AFAIU, other architectures don't gratuitiously clobber SIMD or vector 
registers, even those that are callee-clobbered by their respective function 
calling convention, or do they? FWIW, Arm is going the opposite direction with 
their higher privilege calls (newer versions of SMCCC define how to preserve 
SVE vectors).

The kernel cannot simply clobber registers, as that would likely cause data 
leakage from kernel to user mode. So it is unclear what the benefits would be 
here. And I fear that there will be less conventional use cases whence it 
makes sense to preserve registers on system calls.

For example an inline or compiler intrinsic implementation of C++20/C2X 
atomic-wait/atomic-notify, which would presumably invoke the futex() syscall 
on Linux, maybe??
Palmer Dabbelt June 15, 2023, 8:33 p.m. UTC | #2
On Thu, 15 Jun 2023 10:36:31 PDT (-0700), remi@remlab.net wrote:
> Le keskiviikkona 14. kesäkuuta 2023, 19.35.34 EEST Palmer Dabbelt a écrit :
>> The V registers are clobbered by standard ABI functions, so userspace
>> probably doesn't have anything useful in them by the time we get to the
>> kernel.
>
> Indeed, for your typical system call, wrapped by two or more layers of 
> function calls inside libc, userspace will treat the registers as clobbered 
> anyhow.
>
> But AFAIU, other architectures don't gratuitiously clobber SIMD or vector 
> registers, even those that are callee-clobbered by their respective function 
> calling convention, or do they?

IIUC arm64 has some similar code, at least that's what the comment says 
(and I got the clobbering V state from Arm)

    /*
     * As per the ABI exit SME streaming mode and clear the SVE state not
     * shared with FPSIMD on syscall entry.
     */
    static inline void fp_user_discard(void)

if we don't clobber on syscalls then we'll likely need some way for 
userspace to inform the kernel that V state can be discarded.

> FWIW, Arm is going the opposite direction with 
> their higher privilege calls (newer versions of SMCCC define how to preserve 
> SVE vectors).

That has a slightly different cost structure, though: in the kernel V 
would usually be off, so there's already a strong indication when the 
save/restore is useful.

> The kernel cannot simply clobber registers, as that would likely cause data 
> leakage from kernel to user mode. So it is unclear what the benefits would be 

What's the data leakage?  Unless I'm missing something setting the 
sstatus.vs=off will result in userspace trapping in any V state access, 
so if we're leaking something we're probably also at risk of leaking it 
for new/cloned processes.

That said, we do need to think about speculative side-channels: with the 
V crypto stuff there will be keys in V registers and other architectures 
have had exploitable issues related to lazy save/restore and 
speculation.  Maybe it's best to just wait on that, though?  We'd 
ideally want some canonical sequence in the ISA but the fastest way to 
do that is probably to just wait for an exploit to show up.

> here. And I fear that there will be less conventional use cases whence it 
> makes sense to preserve registers on system calls.
>
> For example an inline or compiler intrinsic implementation of C++20/C2X 
> atomic-wait/atomic-notify, which would presumably invoke the futex() syscall 
> on Linux, maybe??

It'd have to be a pretty special case: at least in libstdc++ and glibc 
the futex calls are behind function calls, so the V registers are 
already clobbered by the time the kernel has been entered (at least for 
anything following the standard ABIs).

>
> -- 
> 雷米‧德尼-库尔蒙
> http://www.remlab.net/
Björn Töpel June 16, 2023, 7:47 p.m. UTC | #3
Rémi Denis-Courmont <remi@remlab.net> writes:

> Le keskiviikkona 14. kesäkuuta 2023, 19.35.34 EEST Palmer Dabbelt a écrit :
>> The V registers are clobbered by standard ABI functions, so userspace
>> probably doesn't have anything useful in them by the time we get to the
>> kernel.
>
> Indeed, for your typical system call, wrapped by two or more layers of 
> function calls inside libc, userspace will treat the registers as clobbered 
> anyhow.
>
> But AFAIU, other architectures don't gratuitiously clobber SIMD or vector 
> registers, even those that are callee-clobbered by their respective function 
> calling convention, or do they? FWIW, Arm is going the opposite direction with 
> their higher privilege calls (newer versions of SMCCC define how to preserve 
> SVE vectors).

Actually, it's from the V spec:
    riscv-v-spec-1.0-4.pdf:
      Executing a system call causes all caller-saved vector registers
      (v0-v31, vl, vtype) and vstart to become unspecified.

AFAIU Arm's SVE/SME has that as well.


Björn
Rémi Denis-Courmont June 16, 2023, 7:58 p.m. UTC | #4
Le torstaina 15. kesäkuuta 2023, 23.33.44 EEST Palmer Dabbelt a écrit :
> > The kernel cannot simply clobber registers, as that would likely cause
> > data leakage from kernel to user mode. So it is unclear what the benefits
> > would be
> What's the data leakage?

Typically "clobbering" the register means that you are writing something else 
in them. If you don't restore them (or expressly reset them to zero or some 
other fixed value), then you leak daata.

Of course, if you don't actually use the register, then you don't leak 
anything in them. But then it's unclear what the benefit of marking them as 
clobbered is.

(...)
> It'd have to be a pretty special case: at least in libstdc++ and glibc
> the futex calls are behind function calls,

Traditionally, atomic variable methods are intrinsics, which result in either 
inline or outline C runtime calls (with some ad-hoc ABI that clobbers very 
little). They cannot be C functions, since they accept parameters of several 
different types.

atomic_notify_one, atomic_notify_all, and atomic_wait or however their 
standardised names end up, will presumably be outlines of the later type, that 
just happen to wrap futex() on Linux.

But anyway, if the spec says that registers are clobbered by system calls as 
Björn pointed out, then that's that.
Björn Töpel June 16, 2023, 8:12 p.m. UTC | #5
Palmer Dabbelt <palmer@rivosinc.com> writes:

> The V registers are clobbered by standard ABI functions, so userspace
> probably doesn't have anything useful in them by the time we get to the
> kernel.  So let's just document that they're clobbered by syscalls and
> proactively clobber them.
>
> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> ---
> IIRC we'd talked about doing this, but I didn't see anything in the
> docs.  I figure it's better to just proactively clobber the registers on
> syscalls, as that way userspace can't end up accidentally depending on
> them.
> ---
>  Documentation/riscv/vector.rst | 5 +++++
>  arch/riscv/kernel/traps.c      | 2 ++
>  2 files changed, 7 insertions(+)
>
> diff --git a/Documentation/riscv/vector.rst b/Documentation/riscv/vector.rst
> index 48f189d79e41..a4dfa954215b 100644
> --- a/Documentation/riscv/vector.rst
> +++ b/Documentation/riscv/vector.rst
> @@ -130,3 +130,8 @@ processes in form of sysctl knob:
>  
>      Modifying the system default enablement status does not affect the enablement
>      status of any existing process of thread that do not make an execve() call.
> +
> +3.  Vector Register State Across System Calls
> +---------------------------------------------
> +
> +Vector registers are clobbered by system calls.
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 05ffdcd1424e..bb99a6379b37 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -295,6 +295,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>  		regs->epc += 4;
>  		regs->orig_a0 = regs->a0;
>  
> +		riscv_v_vstate_off(regs);
> +

Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to
call? Something like:

static void vstate_discard(struct pt_regs *regs)
{
       if ((regs->status & SR_VS) == SR_VS_DIRTY)
               __riscv_v_vstate_clean(regs);
}

Complemented by a !V config variant.


Björn
Palmer Dabbelt June 19, 2023, 6:18 p.m. UTC | #6
On Fri, 16 Jun 2023 13:12:14 PDT (-0700), bjorn@kernel.org wrote:
> Palmer Dabbelt <palmer@rivosinc.com> writes:
>
>> The V registers are clobbered by standard ABI functions, so userspace
>> probably doesn't have anything useful in them by the time we get to the
>> kernel.  So let's just document that they're clobbered by syscalls and
>> proactively clobber them.
>>
>> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
>> ---
>> IIRC we'd talked about doing this, but I didn't see anything in the
>> docs.  I figure it's better to just proactively clobber the registers on
>> syscalls, as that way userspace can't end up accidentally depending on
>> them.
>> ---
>>  Documentation/riscv/vector.rst | 5 +++++
>>  arch/riscv/kernel/traps.c      | 2 ++
>>  2 files changed, 7 insertions(+)
>>
>> diff --git a/Documentation/riscv/vector.rst b/Documentation/riscv/vector.rst
>> index 48f189d79e41..a4dfa954215b 100644
>> --- a/Documentation/riscv/vector.rst
>> +++ b/Documentation/riscv/vector.rst
>> @@ -130,3 +130,8 @@ processes in form of sysctl knob:
>>  
>>      Modifying the system default enablement status does not affect the enablement
>>      status of any existing process of thread that do not make an execve() call.
>> +
>> +3.  Vector Register State Across System Calls
>> +---------------------------------------------
>> +
>> +Vector registers are clobbered by system calls.
>> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
>> index 05ffdcd1424e..bb99a6379b37 100644
>> --- a/arch/riscv/kernel/traps.c
>> +++ b/arch/riscv/kernel/traps.c
>> @@ -295,6 +295,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>>  		regs->epc += 4;
>>  		regs->orig_a0 = regs->a0;
>>  
>> +		riscv_v_vstate_off(regs);
>> +
>
> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to
> call? Something like:
>
> static void vstate_discard(struct pt_regs *regs)
> {
>        if ((regs->status & SR_VS) == SR_VS_DIRTY)
>                __riscv_v_vstate_clean(regs);
> }
>
> Complemented by a !V config variant.

I think it's just a question of what we're trying to do here: clean 
avoids the kernel V state save, but unless the kernel decides to use V 
during the syscall the register contents will still be usable by 
userspace.  Maybe that's fine and we can just rely on the ISA spec, 
though?  I sent another patch to just document it in Linux, even if it's 
in the ISA spec it seems worth having in the kernel as well.

That said, I think the right thing to do here might be to zero the V 
register state and set it to initial: that way we can prevent userspace 
from accidentally relying on the state save, but we can also avoid the 
trap that would come from turning it off.  That lets us give the 
hardware a nice clean indication when the V state isn't in use, which 
will hopefully help us avoid the save/restore performance issues that 
other ports have hit.

I think the issue with zeroing the registers in that it may be slow on 
some implementations, as it requires a bunch of V register writes and 
those could be multi-cycle.  I'd lean towards doing the zeroing now, as 
it'll make sure userspace respects the uABI and we don't have any HW to 
measure the performance on.  Maybe the zeroing will be enough to get HW 
to make that fast, if not we can always roll it back when HW starts 
showing up.

There's also some questions as to whether or not HW is going to bother 
respecting the intermediate states, as IIRC it's pretty common for HW to 
ignore them for the F/D extensions (at least the old SiFive cores do).  
I think there's just not a whole lot we can do there, HW that 
inaccurately tracks the metadata will just end up with more 
save/restore time.

> Björn
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Björn Töpel June 19, 2023, 7:01 p.m. UTC | #7
Palmer Dabbelt <palmer@rivosinc.com> writes:

[...]

>>> +		riscv_v_vstate_off(regs);
>>> +
>>
>> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to
>> call? Something like:
>>
>> static void vstate_discard(struct pt_regs *regs)
>> {
>>        if ((regs->status & SR_VS) == SR_VS_DIRTY)
>>                __riscv_v_vstate_clean(regs);
>> }
>>
>> Complemented by a !V config variant.
>
> I think it's just a question of what we're trying to do here: clean 
> avoids the kernel V state save, but unless the kernel decides to use V 
> during the syscall the register contents will still be usable by 
> userspace.  Maybe that's fine and we can just rely on the ISA spec, 
> though?  I sent another patch to just document it in Linux, even if it's 
> in the ISA spec it seems worth having in the kernel as well.
>
> That said, I think the right thing to do here might be to zero the V 
> register state and set it to initial: that way we can prevent userspace 
> from accidentally relying on the state save, but we can also avoid the 
> trap that would come from turning it off.  That lets us give the 
> hardware a nice clean indication when the V state isn't in use, which 
> will hopefully help us avoid the save/restore performance issues that 
> other ports have hit.

FWIW, I think that's a much better idea than turning V off. I also like
that it'll preventing userland to rely on pre-ecall state.


Björn
Palmer Dabbelt June 19, 2023, 7:05 p.m. UTC | #8
On Mon, 19 Jun 2023 12:01:20 PDT (-0700), bjorn@kernel.org wrote:
> Palmer Dabbelt <palmer@rivosinc.com> writes:
>
> [...]
>
>>>> +		riscv_v_vstate_off(regs);
>>>> +
>>>
>>> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to
>>> call? Something like:
>>>
>>> static void vstate_discard(struct pt_regs *regs)
>>> {
>>>        if ((regs->status & SR_VS) == SR_VS_DIRTY)
>>>                __riscv_v_vstate_clean(regs);
>>> }
>>>
>>> Complemented by a !V config variant.
>>
>> I think it's just a question of what we're trying to do here: clean 
>> avoids the kernel V state save, but unless the kernel decides to use V 
>> during the syscall the register contents will still be usable by 
>> userspace.  Maybe that's fine and we can just rely on the ISA spec, 
>> though?  I sent another patch to just document it in Linux, even if it's 
>> in the ISA spec it seems worth having in the kernel as well.
>>
>> That said, I think the right thing to do here might be to zero the V 
>> register state and set it to initial: that way we can prevent userspace 
>> from accidentally relying on the state save, but we can also avoid the 
>> trap that would come from turning it off.  That lets us give the 
>> hardware a nice clean indication when the V state isn't in use, which 
>> will hopefully help us avoid the save/restore performance issues that 
>> other ports have hit.
>
> FWIW, I think that's a much better idea than turning V off. I also like
> that it'll preventing userland to rely on pre-ecall state.

OK, anyone else opposed?

We're kind of in the weeds on performance, I think we'd need HW to know 
for sure if either is an issue.  Seems best to just play it safe WRT the 
uABI for now, we can always deal with any performance issues if the 
exist.

> Björn
Björn Töpel June 21, 2023, 2:26 p.m. UTC | #9
Palmer Dabbelt <palmer@rivosinc.com> writes:

> On Mon, 19 Jun 2023 12:01:20 PDT (-0700), bjorn@kernel.org wrote:
>> Palmer Dabbelt <palmer@rivosinc.com> writes:
>>
>> [...]
>>
>>>>> +		riscv_v_vstate_off(regs);
>>>>> +
>>>>
>>>> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to
>>>> call? Something like:
>>>>
>>>> static void vstate_discard(struct pt_regs *regs)
>>>> {
>>>>        if ((regs->status & SR_VS) == SR_VS_DIRTY)
>>>>                __riscv_v_vstate_clean(regs);
>>>> }
>>>>
>>>> Complemented by a !V config variant.
>>>
>>> I think it's just a question of what we're trying to do here: clean 
>>> avoids the kernel V state save, but unless the kernel decides to use V 
>>> during the syscall the register contents will still be usable by 
>>> userspace.  Maybe that's fine and we can just rely on the ISA spec, 
>>> though?  I sent another patch to just document it in Linux, even if it's 
>>> in the ISA spec it seems worth having in the kernel as well.
>>>
>>> That said, I think the right thing to do here might be to zero the V 
>>> register state and set it to initial: that way we can prevent userspace 
>>> from accidentally relying on the state save, but we can also avoid the 
>>> trap that would come from turning it off.  That lets us give the 
>>> hardware a nice clean indication when the V state isn't in use, which 
>>> will hopefully help us avoid the save/restore performance issues that 
>>> other ports have hit.
>>
>> FWIW, I think that's a much better idea than turning V off. I also like
>> that it'll preventing userland to rely on pre-ecall state.
>
> OK, anyone else opposed?
>
> We're kind of in the weeds on performance, I think we'd need HW to know 
> for sure if either is an issue.  Seems best to just play it safe WRT the 
> uABI for now, we can always deal with any performance issues if the 
> exist.

Here's the patch you mentioned at the PW synchup; I've kept the Subject
and such if you wan't to apply it. LMK if you'd like a proper one.

--

Subject: [PATCH] riscv: Discard vector state on syscalls
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The RISC-V vector specification states:
  Executing a system call causes all caller-saved vector registers
  (v0-v31, vl, vtype) and vstart to become unspecified.

The vector status is set to Initial, and the vector state is
explicitly zeroed. That way we can prevent userspace from accidentally
relying on the stated save.

Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
---
arch/riscv/include/asm/vector.h | 24 ++++++++++++++++++++++++
 arch/riscv/kernel/traps.c       |  2 ++
 2 files changed, 26 insertions(+)

diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 04c0b07bf6cd..b3020d064f42 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -163,6 +163,29 @@ static inline void __switch_to_vector(struct task_struct *prev,
 void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
 bool riscv_v_vstate_ctrl_user_allowed(void);
 
+static inline void riscv_v_vstate_discard(struct pt_regs *regs)
+{
+	unsigned long vl;
+
+	if (!riscv_v_vstate_query(regs))
+		return;
+
+	riscv_v_vstate_on(regs);
+
+	riscv_v_enable();
+	asm volatile (
+		".option push\n\t"
+		".option arch, +v\n\t"
+		"vsetvli	%0, x0, e8, m8, ta, ma\n\t"
+		"vmv.v.i	v0, 0\n\t"
+		"vmv.v.i	v8, 0\n\t"
+		"vmv.v.i	v16, 0\n\t"
+		"vmv.v.i	v24, 0\n\t"
+		".option pop\n\t"
+		: "=&r" (vl) : : "memory");
+	riscv_v_disable();
+}
+
 #else /* ! CONFIG_RISCV_ISA_V  */
 
 struct pt_regs;
@@ -178,6 +201,7 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
 #define __switch_to_vector(__prev, __next)	do {} while (0)
 #define riscv_v_vstate_off(regs)		do {} while (0)
 #define riscv_v_vstate_on(regs)			do {} while (0)
+#define riscv_v_vstate_discard(regs)		do {} while (0)
 
 #endif /* CONFIG_RISCV_ISA_V */
 
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 05ffdcd1424e..00c68b57ff88 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -295,6 +295,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
 		regs->epc += 4;
 		regs->orig_a0 = regs->a0;
 
+		riscv_v_vstate_discard(regs);
+
 		syscall = syscall_enter_from_user_mode(regs, syscall);
 
 		if (syscall < NR_syscalls)

base-commit: abd6152d6046ddc4be1040b6206bee2e025e8a79
Darius Rad June 21, 2023, 2:44 p.m. UTC | #10
On Wed, Jun 21, 2023 at 04:26:14PM +0200, Björn Töpel wrote:
> Palmer Dabbelt <palmer@rivosinc.com> writes:
> 
> > On Mon, 19 Jun 2023 12:01:20 PDT (-0700), bjorn@kernel.org wrote:
> >> Palmer Dabbelt <palmer@rivosinc.com> writes:
> >>
> >> [...]
> >>
> >>>>> +		riscv_v_vstate_off(regs);
> >>>>> +
> >>>>
> >>>> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to
> >>>> call? Something like:
> >>>>
> >>>> static void vstate_discard(struct pt_regs *regs)
> >>>> {
> >>>>        if ((regs->status & SR_VS) == SR_VS_DIRTY)
> >>>>                __riscv_v_vstate_clean(regs);
> >>>> }
> >>>>
> >>>> Complemented by a !V config variant.
> >>>
> >>> I think it's just a question of what we're trying to do here: clean 
> >>> avoids the kernel V state save, but unless the kernel decides to use V 
> >>> during the syscall the register contents will still be usable by 
> >>> userspace.  Maybe that's fine and we can just rely on the ISA spec, 
> >>> though?  I sent another patch to just document it in Linux, even if it's 
> >>> in the ISA spec it seems worth having in the kernel as well.
> >>>
> >>> That said, I think the right thing to do here might be to zero the V 
> >>> register state and set it to initial: that way we can prevent userspace 
> >>> from accidentally relying on the state save, but we can also avoid the 
> >>> trap that would come from turning it off.  That lets us give the 
> >>> hardware a nice clean indication when the V state isn't in use, which 
> >>> will hopefully help us avoid the save/restore performance issues that 
> >>> other ports have hit.
> >>
> >> FWIW, I think that's a much better idea than turning V off. I also like
> >> that it'll preventing userland to rely on pre-ecall state.
> >
> > OK, anyone else opposed?
> >
> > We're kind of in the weeds on performance, I think we'd need HW to know 
> > for sure if either is an issue.  Seems best to just play it safe WRT the 
> > uABI for now, we can always deal with any performance issues if the 
> > exist.
> 
> Here's the patch you mentioned at the PW synchup; I've kept the Subject
> and such if you wan't to apply it. LMK if you'd like a proper one.
> 
> --
> 
> Subject: [PATCH] riscv: Discard vector state on syscalls
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> The RISC-V vector specification states:
>   Executing a system call causes all caller-saved vector registers
>   (v0-v31, vl, vtype) and vstart to become unspecified.
> 
> The vector status is set to Initial, and the vector state is
> explicitly zeroed. That way we can prevent userspace from accidentally
> relying on the stated save.

Is it worth clobbering with all 1s, rather than zero, for consistency with
other vector behavior (i.e., tail/mask agnostic) and for the reasons given
in the vector spec for not doing so with zero?

> 
> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
> ---
> arch/riscv/include/asm/vector.h | 24 ++++++++++++++++++++++++
>  arch/riscv/kernel/traps.c       |  2 ++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
> index 04c0b07bf6cd..b3020d064f42 100644
> --- a/arch/riscv/include/asm/vector.h
> +++ b/arch/riscv/include/asm/vector.h
> @@ -163,6 +163,29 @@ static inline void __switch_to_vector(struct task_struct *prev,
>  void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
>  bool riscv_v_vstate_ctrl_user_allowed(void);
>  
> +static inline void riscv_v_vstate_discard(struct pt_regs *regs)
> +{
> +	unsigned long vl;
> +
> +	if (!riscv_v_vstate_query(regs))
> +		return;
> +
> +	riscv_v_vstate_on(regs);
> +
> +	riscv_v_enable();
> +	asm volatile (
> +		".option push\n\t"
> +		".option arch, +v\n\t"
> +		"vsetvli	%0, x0, e8, m8, ta, ma\n\t"
> +		"vmv.v.i	v0, 0\n\t"
> +		"vmv.v.i	v8, 0\n\t"
> +		"vmv.v.i	v16, 0\n\t"
> +		"vmv.v.i	v24, 0\n\t"
> +		".option pop\n\t"
> +		: "=&r" (vl) : : "memory");
> +	riscv_v_disable();
> +}
> +
>  #else /* ! CONFIG_RISCV_ISA_V  */
>  
>  struct pt_regs;
> @@ -178,6 +201,7 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
>  #define __switch_to_vector(__prev, __next)	do {} while (0)
>  #define riscv_v_vstate_off(regs)		do {} while (0)
>  #define riscv_v_vstate_on(regs)			do {} while (0)
> +#define riscv_v_vstate_discard(regs)		do {} while (0)
>  
>  #endif /* CONFIG_RISCV_ISA_V */
>  
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 05ffdcd1424e..00c68b57ff88 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -295,6 +295,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>  		regs->epc += 4;
>  		regs->orig_a0 = regs->a0;
>  
> +		riscv_v_vstate_discard(regs);
> +
>  		syscall = syscall_enter_from_user_mode(regs, syscall);
>  
>  		if (syscall < NR_syscalls)
> 
> base-commit: abd6152d6046ddc4be1040b6206bee2e025e8a79
> -- 
> 2.39.2
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Andy Chiu June 21, 2023, 2:50 p.m. UTC | #11
On Wed, Jun 21, 2023 at 10:26 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> Palmer Dabbelt <palmer@rivosinc.com> writes:
>
> > On Mon, 19 Jun 2023 12:01:20 PDT (-0700), bjorn@kernel.org wrote:
> >> Palmer Dabbelt <palmer@rivosinc.com> writes:
> >>
> >> [...]
> >>
> >>>>> +         riscv_v_vstate_off(regs);
> >>>>> +
> >>>>
> >>>> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to
> >>>> call? Something like:
> >>>>
> >>>> static void vstate_discard(struct pt_regs *regs)
> >>>> {
> >>>>        if ((regs->status & SR_VS) == SR_VS_DIRTY)
> >>>>                __riscv_v_vstate_clean(regs);
> >>>> }
> >>>>
> >>>> Complemented by a !V config variant.
> >>>
> >>> I think it's just a question of what we're trying to do here: clean
> >>> avoids the kernel V state save, but unless the kernel decides to use V
> >>> during the syscall the register contents will still be usable by
> >>> userspace.  Maybe that's fine and we can just rely on the ISA spec,
> >>> though?  I sent another patch to just document it in Linux, even if it's
> >>> in the ISA spec it seems worth having in the kernel as well.
> >>>
> >>> That said, I think the right thing to do here might be to zero the V
> >>> register state and set it to initial: that way we can prevent userspace
> >>> from accidentally relying on the state save, but we can also avoid the
> >>> trap that would come from turning it off.  That lets us give the
> >>> hardware a nice clean indication when the V state isn't in use, which
> >>> will hopefully help us avoid the save/restore performance issues that
> >>> other ports have hit.
> >>
> >> FWIW, I think that's a much better idea than turning V off. I also like
> >> that it'll preventing userland to rely on pre-ecall state.
> >
> > OK, anyone else opposed?
> >
> > We're kind of in the weeds on performance, I think we'd need HW to know
> > for sure if either is an issue.  Seems best to just play it safe WRT the
> > uABI for now, we can always deal with any performance issues if the
> > exist.
>
> Here's the patch you mentioned at the PW synchup; I've kept the Subject
> and such if you wan't to apply it. LMK if you'd like a proper one.
>
> --
>
> Subject: [PATCH] riscv: Discard vector state on syscalls
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> The RISC-V vector specification states:
>   Executing a system call causes all caller-saved vector registers
>   (v0-v31, vl, vtype) and vstart to become unspecified.
>
> The vector status is set to Initial, and the vector state is
> explicitly zeroed. That way we can prevent userspace from accidentally
> relying on the stated save.
>
> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
> ---
> arch/riscv/include/asm/vector.h | 24 ++++++++++++++++++++++++
>  arch/riscv/kernel/traps.c       |  2 ++
>  2 files changed, 26 insertions(+)
>
> diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
> index 04c0b07bf6cd..b3020d064f42 100644
> --- a/arch/riscv/include/asm/vector.h
> +++ b/arch/riscv/include/asm/vector.h
> @@ -163,6 +163,29 @@ static inline void __switch_to_vector(struct task_struct *prev,
>  void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
>  bool riscv_v_vstate_ctrl_user_allowed(void);
>
> +static inline void riscv_v_vstate_discard(struct pt_regs *regs)
> +{
> +       unsigned long vl;
> +
> +       if (!riscv_v_vstate_query(regs))
> +               return;
> +
> +       riscv_v_vstate_on(regs);

Do we need this riscv_v_vstate_on()?  If it is not on we'd return
early in the previous "if" statement, right?

> +
> +       riscv_v_enable();
> +       asm volatile (
> +               ".option push\n\t"
> +               ".option arch, +v\n\t"
> +               "vsetvli        %0, x0, e8, m8, ta, ma\n\t"
> +               "vmv.v.i        v0, 0\n\t"
> +               "vmv.v.i        v8, 0\n\t"
> +               "vmv.v.i        v16, 0\n\t"
> +               "vmv.v.i        v24, 0\n\t"
> +               ".option pop\n\t"
> +               : "=&r" (vl) : : "memory");
> +       riscv_v_disable();

Maybe consider cleaning the vstate (status.vs) here. As such we don't
have to save V during context switch. Or, maybe we could set vstate as
off during syscall and discard V-reg + restore status.VS when
returning back to userspace?

> +}
> +
>  #else /* ! CONFIG_RISCV_ISA_V  */
>
>  struct pt_regs;
> @@ -178,6 +201,7 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
>  #define __switch_to_vector(__prev, __next)     do {} while (0)
>  #define riscv_v_vstate_off(regs)               do {} while (0)
>  #define riscv_v_vstate_on(regs)                        do {} while (0)
> +#define riscv_v_vstate_discard(regs)           do {} while (0)
>
>  #endif /* CONFIG_RISCV_ISA_V */
>
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 05ffdcd1424e..00c68b57ff88 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -295,6 +295,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>                 regs->epc += 4;
>                 regs->orig_a0 = regs->a0;
>
> +               riscv_v_vstate_discard(regs);
> +
>                 syscall = syscall_enter_from_user_mode(regs, syscall);
>
>                 if (syscall < NR_syscalls)
>
> base-commit: abd6152d6046ddc4be1040b6206bee2e025e8a79
> --
> 2.39.2

Agree. It is better to clean V registers instead of turning off Vector.

Regards,
Andy
Rémi Denis-Courmont June 21, 2023, 4:47 p.m. UTC | #12
Le keskiviikkona 21. kesäkuuta 2023, 17.26.14 EEST Björn Töpel a écrit :
> Palmer Dabbelt <palmer@rivosinc.com> writes:
> > On Mon, 19 Jun 2023 12:01:20 PDT (-0700), bjorn@kernel.org wrote:
> >> Palmer Dabbelt <palmer@rivosinc.com> writes:
> >> 
> >> [...]
> >> 
> >>>>> +		riscv_v_vstate_off(regs);
> >>>>> +
> >>>> 
> >>>> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to
> >>>> call? Something like:
> >>>> 
> >>>> static void vstate_discard(struct pt_regs *regs)
> >>>> {
> >>>> 
> >>>>        if ((regs->status & SR_VS) == SR_VS_DIRTY)
> >>>>        
> >>>>                __riscv_v_vstate_clean(regs);
> >>>> 
> >>>> }
> >>>> 
> >>>> Complemented by a !V config variant.
> >>> 
> >>> I think it's just a question of what we're trying to do here: clean
> >>> avoids the kernel V state save, but unless the kernel decides to use V
> >>> during the syscall the register contents will still be usable by
> >>> userspace.  Maybe that's fine and we can just rely on the ISA spec,
> >>> though?  I sent another patch to just document it in Linux, even if it's
> >>> in the ISA spec it seems worth having in the kernel as well.
> >>> 
> >>> That said, I think the right thing to do here might be to zero the V
> >>> register state and set it to initial: that way we can prevent userspace
> >>> from accidentally relying on the state save, but we can also avoid the
> >>> trap that would come from turning it off.  That lets us give the
> >>> hardware a nice clean indication when the V state isn't in use, which
> >>> will hopefully help us avoid the save/restore performance issues that
> >>> other ports have hit.
> >> 
> >> FWIW, I think that's a much better idea than turning V off. I also like
> >> that it'll preventing userland to rely on pre-ecall state.
> > 
> > OK, anyone else opposed?
> > 
> > We're kind of in the weeds on performance, I think we'd need HW to know
> > for sure if either is an issue.  Seems best to just play it safe WRT the
> > uABI for now, we can always deal with any performance issues if the
> > exist.
> 
> Here's the patch you mentioned at the PW synchup; I've kept the Subject
> and such if you wan't to apply it. LMK if you'd like a proper one.
> 
> --
> 
> Subject: [PATCH] riscv: Discard vector state on syscalls
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> The RISC-V vector specification states:
>   Executing a system call causes all caller-saved vector registers
>   (v0-v31, vl, vtype) and vstart to become unspecified.
> 
> The vector status is set to Initial, and the vector state is
> explicitly zeroed. That way we can prevent userspace from accidentally
> relying on the stated save.
> 
> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
> ---
> arch/riscv/include/asm/vector.h | 24 ++++++++++++++++++++++++
>  arch/riscv/kernel/traps.c       |  2 ++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/arch/riscv/include/asm/vector.h
> b/arch/riscv/include/asm/vector.h index 04c0b07bf6cd..b3020d064f42 100644
> --- a/arch/riscv/include/asm/vector.h
> +++ b/arch/riscv/include/asm/vector.h
> @@ -163,6 +163,29 @@ static inline void __switch_to_vector(struct
> task_struct *prev, void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
>  bool riscv_v_vstate_ctrl_user_allowed(void);
> 
> +static inline void riscv_v_vstate_discard(struct pt_regs *regs)
> +{
> +	unsigned long vl;
> +
> +	if (!riscv_v_vstate_query(regs))
> +		return;
> +
> +	riscv_v_vstate_on(regs);
> +
> +	riscv_v_enable();
> +	asm volatile (
> +		".option push\n\t"
> +		".option arch, +v\n\t"
> +		"vsetvli	%0, x0, e8, m8, ta, ma\n\t"
> +		"vmv.v.i	v0, 0\n\t"
> +		"vmv.v.i	v8, 0\n\t"
> +		"vmv.v.i	v16, 0\n\t"
> +		"vmv.v.i	v24, 0\n\t"
> +		".option pop\n\t"
> +		: "=&r" (vl) : : "memory");
> +	riscv_v_disable();

Shouldn't this also set `vill` to 1 using `vsetvl`?

In fact, a faster alternative may yet be to *only* set an invalid vector 
configuration. It's rather unlikely that user-space code would set a valid 
configuration and use vectors without loading them first. If it ever does, then 
it's so broken that the kernel probably doesn't need to care.
Palmer Dabbelt June 21, 2023, 6:16 p.m. UTC | #13
On Wed, 21 Jun 2023 09:47:37 PDT (-0700), remi@remlab.net wrote:
> Le keskiviikkona 21. kesäkuuta 2023, 17.26.14 EEST Björn Töpel a écrit :
>> Palmer Dabbelt <palmer@rivosinc.com> writes:
>> > On Mon, 19 Jun 2023 12:01:20 PDT (-0700), bjorn@kernel.org wrote:
>> >> Palmer Dabbelt <palmer@rivosinc.com> writes:
>> >> 
>> >> [...]
>> >> 
>> >>>>> +		riscv_v_vstate_off(regs);
>> >>>>> +
>> >>>> 
>> >>>> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to
>> >>>> call? Something like:
>> >>>> 
>> >>>> static void vstate_discard(struct pt_regs *regs)
>> >>>> {
>> >>>> 
>> >>>>        if ((regs->status & SR_VS) == SR_VS_DIRTY)
>> >>>>        
>> >>>>                __riscv_v_vstate_clean(regs);
>> >>>> 
>> >>>> }
>> >>>> 
>> >>>> Complemented by a !V config variant.
>> >>> 
>> >>> I think it's just a question of what we're trying to do here: clean
>> >>> avoids the kernel V state save, but unless the kernel decides to use V
>> >>> during the syscall the register contents will still be usable by
>> >>> userspace.  Maybe that's fine and we can just rely on the ISA spec,
>> >>> though?  I sent another patch to just document it in Linux, even if it's
>> >>> in the ISA spec it seems worth having in the kernel as well.
>> >>> 
>> >>> That said, I think the right thing to do here might be to zero the V
>> >>> register state and set it to initial: that way we can prevent userspace
>> >>> from accidentally relying on the state save, but we can also avoid the
>> >>> trap that would come from turning it off.  That lets us give the
>> >>> hardware a nice clean indication when the V state isn't in use, which
>> >>> will hopefully help us avoid the save/restore performance issues that
>> >>> other ports have hit.
>> >> 
>> >> FWIW, I think that's a much better idea than turning V off. I also like
>> >> that it'll preventing userland to rely on pre-ecall state.
>> > 
>> > OK, anyone else opposed?
>> > 
>> > We're kind of in the weeds on performance, I think we'd need HW to know
>> > for sure if either is an issue.  Seems best to just play it safe WRT the
>> > uABI for now, we can always deal with any performance issues if the
>> > exist.
>> 
>> Here's the patch you mentioned at the PW synchup; I've kept the Subject
>> and such if you wan't to apply it. LMK if you'd like a proper one.
>> 
>> --
>> 
>> Subject: [PATCH] riscv: Discard vector state on syscalls
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset=UTF-8
>> Content-Transfer-Encoding: 8bit
>> 
>> The RISC-V vector specification states:
>>   Executing a system call causes all caller-saved vector registers
>>   (v0-v31, vl, vtype) and vstart to become unspecified.
>> 
>> The vector status is set to Initial, and the vector state is
>> explicitly zeroed. That way we can prevent userspace from accidentally
>> relying on the stated save.
>> 
>> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
>> ---
>> arch/riscv/include/asm/vector.h | 24 ++++++++++++++++++++++++
>>  arch/riscv/kernel/traps.c       |  2 ++
>>  2 files changed, 26 insertions(+)
>> 
>> diff --git a/arch/riscv/include/asm/vector.h
>> b/arch/riscv/include/asm/vector.h index 04c0b07bf6cd..b3020d064f42 100644
>> --- a/arch/riscv/include/asm/vector.h
>> +++ b/arch/riscv/include/asm/vector.h
>> @@ -163,6 +163,29 @@ static inline void __switch_to_vector(struct
>> task_struct *prev, void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
>>  bool riscv_v_vstate_ctrl_user_allowed(void);
>> 
>> +static inline void riscv_v_vstate_discard(struct pt_regs *regs)
>> +{
>> +	unsigned long vl;
>> +
>> +	if (!riscv_v_vstate_query(regs))
>> +		return;
>> +
>> +	riscv_v_vstate_on(regs);
>> +
>> +	riscv_v_enable();
>> +	asm volatile (
>> +		".option push\n\t"
>> +		".option arch, +v\n\t"
>> +		"vsetvli	%0, x0, e8, m8, ta, ma\n\t"
>> +		"vmv.v.i	v0, 0\n\t"
>> +		"vmv.v.i	v8, 0\n\t"
>> +		"vmv.v.i	v16, 0\n\t"
>> +		"vmv.v.i	v24, 0\n\t"
>> +		".option pop\n\t"
>> +		: "=&r" (vl) : : "memory");
>> +	riscv_v_disable();
>
> Shouldn't this also set `vill` to 1 using `vsetvl`?

That seems reasonable to me.

> In fact, a faster alternative may yet be to *only* set an invalid vector 
> configuration. It's rather unlikely that user-space code would set a valid 
> configuration and use vectors without loading them first. If it ever does, then 
> it's so broken that the kernel probably doesn't need to care.

I think that's sufficient to force userspace to trap on a bad value?  
Most of the unsupported value writes in RISC-V are just WARL, but as far 
as I can tell the V spec requires vill handling.  Specifically

    Implementations must consider all bits of the vtype value to 
    determine if the configuration is supported. An unsupported value in 
    any location within the vtype value must result in vill being set.

which seems pretty concrete about this being required.  That's from the 
current draft of the V spec, the wording in 1.0 isn't quite as clear: it 
sort of allows for the WARL-type behavior, but that's probably splitting 
hairs.

That said, it provides a slightly different cost curve: we'd need to 
save/restore the V registers on non-syscall traps even when vill is set 
in userspace, as they've still got state in them (userspace could be in 
the middle of some probing routine, for example).

Also from Darius' fork of the thread: IIUC there's nothing saying 0 is 
initial, or that initial even needs to work.  So I think we're just 
splitting hairs here, as long as we clobber enough state that userspace 
doesn't accidentally depend on is fine with me.

> -- 
> 雷米‧德尼-库尔蒙
> http://www.remlab.net/
Palmer Dabbelt June 21, 2023, 6:16 p.m. UTC | #14
On Wed, 21 Jun 2023 07:44:51 PDT (-0700), Darius Rad wrote:
> On Wed, Jun 21, 2023 at 04:26:14PM +0200, Björn Töpel wrote:
>> Palmer Dabbelt <palmer@rivosinc.com> writes:
>>
>> > On Mon, 19 Jun 2023 12:01:20 PDT (-0700), bjorn@kernel.org wrote:
>> >> Palmer Dabbelt <palmer@rivosinc.com> writes:
>> >>
>> >> [...]
>> >>
>> >>>>> +		riscv_v_vstate_off(regs);
>> >>>>> +
>> >>>>
>> >>>> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to
>> >>>> call? Something like:
>> >>>>
>> >>>> static void vstate_discard(struct pt_regs *regs)
>> >>>> {
>> >>>>        if ((regs->status & SR_VS) == SR_VS_DIRTY)
>> >>>>                __riscv_v_vstate_clean(regs);
>> >>>> }
>> >>>>
>> >>>> Complemented by a !V config variant.
>> >>>
>> >>> I think it's just a question of what we're trying to do here: clean
>> >>> avoids the kernel V state save, but unless the kernel decides to use V
>> >>> during the syscall the register contents will still be usable by
>> >>> userspace.  Maybe that's fine and we can just rely on the ISA spec,
>> >>> though?  I sent another patch to just document it in Linux, even if it's
>> >>> in the ISA spec it seems worth having in the kernel as well.
>> >>>
>> >>> That said, I think the right thing to do here might be to zero the V
>> >>> register state and set it to initial: that way we can prevent userspace
>> >>> from accidentally relying on the state save, but we can also avoid the
>> >>> trap that would come from turning it off.  That lets us give the
>> >>> hardware a nice clean indication when the V state isn't in use, which
>> >>> will hopefully help us avoid the save/restore performance issues that
>> >>> other ports have hit.
>> >>
>> >> FWIW, I think that's a much better idea than turning V off. I also like
>> >> that it'll preventing userland to rely on pre-ecall state.
>> >
>> > OK, anyone else opposed?
>> >
>> > We're kind of in the weeds on performance, I think we'd need HW to know
>> > for sure if either is an issue.  Seems best to just play it safe WRT the
>> > uABI for now, we can always deal with any performance issues if the
>> > exist.
>>
>> Here's the patch you mentioned at the PW synchup; I've kept the Subject
>> and such if you wan't to apply it. LMK if you'd like a proper one.
>>
>> --
>>
>> Subject: [PATCH] riscv: Discard vector state on syscalls
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset=UTF-8
>> Content-Transfer-Encoding: 8bit
>>
>> The RISC-V vector specification states:
>>   Executing a system call causes all caller-saved vector registers
>>   (v0-v31, vl, vtype) and vstart to become unspecified.
>>
>> The vector status is set to Initial, and the vector state is
>> explicitly zeroed. That way we can prevent userspace from accidentally
>> relying on the stated save.
>
> Is it worth clobbering with all 1s, rather than zero, for consistency with
> other vector behavior (i.e., tail/mask agnostic) and for the reasons given
> in the vector spec for not doing so with zero?

Might be.  I guess the assumption was that vs==initial means all 0's, 
but unless I'm missing something there's no rules for what initial means 
in the spec.

>
>>
>> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
>> ---
>> arch/riscv/include/asm/vector.h | 24 ++++++++++++++++++++++++
>>  arch/riscv/kernel/traps.c       |  2 ++
>>  2 files changed, 26 insertions(+)
>>
>> diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
>> index 04c0b07bf6cd..b3020d064f42 100644
>> --- a/arch/riscv/include/asm/vector.h
>> +++ b/arch/riscv/include/asm/vector.h
>> @@ -163,6 +163,29 @@ static inline void __switch_to_vector(struct task_struct *prev,
>>  void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
>>  bool riscv_v_vstate_ctrl_user_allowed(void);
>>
>> +static inline void riscv_v_vstate_discard(struct pt_regs *regs)
>> +{
>> +	unsigned long vl;
>> +
>> +	if (!riscv_v_vstate_query(regs))
>> +		return;
>> +
>> +	riscv_v_vstate_on(regs);
>> +
>> +	riscv_v_enable();
>> +	asm volatile (
>> +		".option push\n\t"
>> +		".option arch, +v\n\t"
>> +		"vsetvli	%0, x0, e8, m8, ta, ma\n\t"
>> +		"vmv.v.i	v0, 0\n\t"
>> +		"vmv.v.i	v8, 0\n\t"
>> +		"vmv.v.i	v16, 0\n\t"
>> +		"vmv.v.i	v24, 0\n\t"
>> +		".option pop\n\t"
>> +		: "=&r" (vl) : : "memory");
>> +	riscv_v_disable();
>> +}
>> +
>>  #else /* ! CONFIG_RISCV_ISA_V  */
>>
>>  struct pt_regs;
>> @@ -178,6 +201,7 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
>>  #define __switch_to_vector(__prev, __next)	do {} while (0)
>>  #define riscv_v_vstate_off(regs)		do {} while (0)
>>  #define riscv_v_vstate_on(regs)			do {} while (0)
>> +#define riscv_v_vstate_discard(regs)		do {} while (0)
>>
>>  #endif /* CONFIG_RISCV_ISA_V */
>>
>> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
>> index 05ffdcd1424e..00c68b57ff88 100644
>> --- a/arch/riscv/kernel/traps.c
>> +++ b/arch/riscv/kernel/traps.c
>> @@ -295,6 +295,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>>  		regs->epc += 4;
>>  		regs->orig_a0 = regs->a0;
>>
>> +		riscv_v_vstate_discard(regs);
>> +
>>  		syscall = syscall_enter_from_user_mode(regs, syscall);
>>
>>  		if (syscall < NR_syscalls)
>>
>> base-commit: abd6152d6046ddc4be1040b6206bee2e025e8a79
>> --
>> 2.39.2
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv
Björn Töpel June 21, 2023, 9:40 p.m. UTC | #15
Andy Chiu <andy.chiu@sifive.com> writes:

>> +static inline void riscv_v_vstate_discard(struct pt_regs *regs)
>> +{
>> +       unsigned long vl;
>> +
>> +       if (!riscv_v_vstate_query(regs))
>> +               return;
>> +
>> +       riscv_v_vstate_on(regs);
>
> Do we need this riscv_v_vstate_on()?  If it is not on we'd return
> early in the previous "if" statement, right?

riscv_v_vstate_on() just set the state to Initial, right? Or do you mean
that riscv_v_vstate_query() is too much, and we should only check if the
state is dirty?

>> +
>> +       riscv_v_enable();
>> +       asm volatile (
>> +               ".option push\n\t"
>> +               ".option arch, +v\n\t"
>> +               "vsetvli        %0, x0, e8, m8, ta, ma\n\t"
>> +               "vmv.v.i        v0, 0\n\t"
>> +               "vmv.v.i        v8, 0\n\t"
>> +               "vmv.v.i        v16, 0\n\t"
>> +               "vmv.v.i        v24, 0\n\t"
>> +               ".option pop\n\t"
>> +               : "=&r" (vl) : : "memory");
>> +       riscv_v_disable();
>
> Maybe consider cleaning the vstate (status.vs) here. As such we don't
> have to save V during context switch. 

It's late, and I'm slower than usual. The regs are cleared, and the
state is Initial. No save on context switch, but restore, right?

> Or, maybe we could set vstate as off during syscall and discard V-reg
> + restore status.VS when returning back to userspace?

Hmm, interesting. We need to track the status.VS to restore somewhere...


Björn
Björn Töpel June 21, 2023, 9:42 p.m. UTC | #16
Palmer Dabbelt <palmer@rivosinc.com> writes:

> On Wed, 21 Jun 2023 09:47:37 PDT (-0700), remi@remlab.net wrote:
>> Le keskiviikkona 21. kesäkuuta 2023, 17.26.14 EEST Björn Töpel a écrit :
>>> Palmer Dabbelt <palmer@rivosinc.com> writes:
>>> > On Mon, 19 Jun 2023 12:01:20 PDT (-0700), bjorn@kernel.org wrote:
>>> >> Palmer Dabbelt <palmer@rivosinc.com> writes:
>>> >> 
>>> >> [...]
>>> >> 
>>> >>>>> +		riscv_v_vstate_off(regs);
>>> >>>>> +
>>> >>>> 
>>> >>>> Not off, right? Isn't it __riscv_v_vstate_clean() that you'd like to
>>> >>>> call? Something like:
>>> >>>> 
>>> >>>> static void vstate_discard(struct pt_regs *regs)
>>> >>>> {
>>> >>>> 
>>> >>>>        if ((regs->status & SR_VS) == SR_VS_DIRTY)
>>> >>>>        
>>> >>>>                __riscv_v_vstate_clean(regs);
>>> >>>> 
>>> >>>> }
>>> >>>> 
>>> >>>> Complemented by a !V config variant.
>>> >>> 
>>> >>> I think it's just a question of what we're trying to do here: clean
>>> >>> avoids the kernel V state save, but unless the kernel decides to use V
>>> >>> during the syscall the register contents will still be usable by
>>> >>> userspace.  Maybe that's fine and we can just rely on the ISA spec,
>>> >>> though?  I sent another patch to just document it in Linux, even if it's
>>> >>> in the ISA spec it seems worth having in the kernel as well.
>>> >>> 
>>> >>> That said, I think the right thing to do here might be to zero the V
>>> >>> register state and set it to initial: that way we can prevent userspace
>>> >>> from accidentally relying on the state save, but we can also avoid the
>>> >>> trap that would come from turning it off.  That lets us give the
>>> >>> hardware a nice clean indication when the V state isn't in use, which
>>> >>> will hopefully help us avoid the save/restore performance issues that
>>> >>> other ports have hit.
>>> >> 
>>> >> FWIW, I think that's a much better idea than turning V off. I also like
>>> >> that it'll preventing userland to rely on pre-ecall state.
>>> > 
>>> > OK, anyone else opposed?
>>> > 
>>> > We're kind of in the weeds on performance, I think we'd need HW to know
>>> > for sure if either is an issue.  Seems best to just play it safe WRT the
>>> > uABI for now, we can always deal with any performance issues if the
>>> > exist.
>>> 
>>> Here's the patch you mentioned at the PW synchup; I've kept the Subject
>>> and such if you wan't to apply it. LMK if you'd like a proper one.
>>> 
>>> --
>>> 
>>> Subject: [PATCH] riscv: Discard vector state on syscalls
>>> MIME-Version: 1.0
>>> Content-Type: text/plain; charset=UTF-8
>>> Content-Transfer-Encoding: 8bit
>>> 
>>> The RISC-V vector specification states:
>>>   Executing a system call causes all caller-saved vector registers
>>>   (v0-v31, vl, vtype) and vstart to become unspecified.
>>> 
>>> The vector status is set to Initial, and the vector state is
>>> explicitly zeroed. That way we can prevent userspace from accidentally
>>> relying on the stated save.
>>> 
>>> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
>>> ---
>>> arch/riscv/include/asm/vector.h | 24 ++++++++++++++++++++++++
>>>  arch/riscv/kernel/traps.c       |  2 ++
>>>  2 files changed, 26 insertions(+)
>>> 
>>> diff --git a/arch/riscv/include/asm/vector.h
>>> b/arch/riscv/include/asm/vector.h index 04c0b07bf6cd..b3020d064f42 100644
>>> --- a/arch/riscv/include/asm/vector.h
>>> +++ b/arch/riscv/include/asm/vector.h
>>> @@ -163,6 +163,29 @@ static inline void __switch_to_vector(struct
>>> task_struct *prev, void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
>>>  bool riscv_v_vstate_ctrl_user_allowed(void);
>>> 
>>> +static inline void riscv_v_vstate_discard(struct pt_regs *regs)
>>> +{
>>> +	unsigned long vl;
>>> +
>>> +	if (!riscv_v_vstate_query(regs))
>>> +		return;
>>> +
>>> +	riscv_v_vstate_on(regs);
>>> +
>>> +	riscv_v_enable();
>>> +	asm volatile (
>>> +		".option push\n\t"
>>> +		".option arch, +v\n\t"
>>> +		"vsetvli	%0, x0, e8, m8, ta, ma\n\t"
>>> +		"vmv.v.i	v0, 0\n\t"
>>> +		"vmv.v.i	v8, 0\n\t"
>>> +		"vmv.v.i	v16, 0\n\t"
>>> +		"vmv.v.i	v24, 0\n\t"
>>> +		".option pop\n\t"
>>> +		: "=&r" (vl) : : "memory");
>>> +	riscv_v_disable();
>>
>> Shouldn't this also set `vill` to 1 using `vsetvl`?
>
> That seems reasonable to me.

Something like this?
---
diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index b3020d064f42..d5f7853936d5 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -165,7 +165,7 @@ bool riscv_v_vstate_ctrl_user_allowed(void);
 
 static inline void riscv_v_vstate_discard(struct pt_regs *regs)
 {
-	unsigned long vl;
+	unsigned long vl, vtype_inval = 1UL << (BITS_PER_LONG - 1);
 
 	if (!riscv_v_vstate_query(regs))
 		return;
@@ -181,8 +181,9 @@ static inline void riscv_v_vstate_discard(struct pt_regs *regs)
 		"vmv.v.i	v8, 0\n\t"
 		"vmv.v.i	v16, 0\n\t"
 		"vmv.v.i	v24, 0\n\t"
+		"vsetvl		%0, x0, %1\n\t"
 		".option pop\n\t"
-		: "=&r" (vl) : : "memory");
+		: "=&r" (vl) : "r" (vtype_inval) : "memory");
 	riscv_v_disable();
 }
--- 


Björn
Andy Chiu June 22, 2023, 3:47 p.m. UTC | #17
On Thu, Jun 22, 2023 at 5:40 AM Björn Töpel <bjorn@kernel.org> wrote:
>
> Andy Chiu <andy.chiu@sifive.com> writes:
>
> >> +static inline void riscv_v_vstate_discard(struct pt_regs *regs)
> >> +{
> >> +       unsigned long vl;
> >> +
> >> +       if (!riscv_v_vstate_query(regs))
> >> +               return;
> >> +
> >> +       riscv_v_vstate_on(regs);
> >
> > Do we need this riscv_v_vstate_on()?  If it is not on we'd return
> > early in the previous "if" statement, right?
>
> riscv_v_vstate_on() just set the state to Initial, right? Or do you mean
> that riscv_v_vstate_query() is too much, and we should only check if the
> state is dirty?
>
> >> +
> >> +       riscv_v_enable();
> >> +       asm volatile (
> >> +               ".option push\n\t"
> >> +               ".option arch, +v\n\t"
> >> +               "vsetvli        %0, x0, e8, m8, ta, ma\n\t"
> >> +               "vmv.v.i        v0, 0\n\t"
> >> +               "vmv.v.i        v8, 0\n\t"
> >> +               "vmv.v.i        v16, 0\n\t"
> >> +               "vmv.v.i        v24, 0\n\t"
> >> +               ".option pop\n\t"
> >> +               : "=&r" (vl) : : "memory");
> >> +       riscv_v_disable();
> >
> > Maybe consider cleaning the vstate (status.vs) here. As such we don't
> > have to save V during context switch.
>
> It's late, and I'm slower than usual. The regs are cleared, and the
> state is Initial. No save on context switch, but restore, right?

Yes, it's my bad, you are right. I sometime messed around the "real"
status.VS with the one in the userspace context :P

>
> > Or, maybe we could set vstate as off during syscall and discard V-reg
> > + restore status.VS when returning back to userspace?
>
> Hmm, interesting. We need to track the status.VS to restore somewhere...

Maybe something like this?

diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 04c0b07bf6cd..79de9ca83391 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -43,6 +43,11 @@ static inline void riscv_v_vstate_on(struct pt_regs *regs)
 	regs->status = (regs->status & ~SR_VS) | SR_VS_INITIAL;
 }
 
+static inline void riscv_v_vstate_dirty(struct pt_regs *regs)
+{
+	regs->status = (regs->status & ~SR_VS) | SR_VS_DIRTY;
+}
+
 static inline bool riscv_v_vstate_query(struct pt_regs *regs)
 {
 	return (regs->status & SR_VS) != 0;
@@ -163,6 +168,24 @@ static inline void __switch_to_vector(struct task_struct *prev,
 void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
 bool riscv_v_vstate_ctrl_user_allowed(void);
 
+static inline void riscv_v_vstate_discard(struct pt_regs *regs)
+{
+	unsigned long vl;
+
+	riscv_v_enable();
+	asm volatile (
+			".option push\n\t"
+			".option arch, +v\n\t"
+			"vsetvli        %0, x0, e8, m8, ta, ma\n\t"
+			"vmv.v.i        v0, 0\n\t"
+			"vmv.v.i        v8, 0\n\t"
+			"vmv.v.i        v16, 0\n\t"
+			"vmv.v.i        v24, 0\n\t"
+			".option pop\n\t"
+			: "=&r" (vl) : : "memory");
+	riscv_v_disable();
+}
+
 #else /* ! CONFIG_RISCV_ISA_V  */
 
 struct pt_regs;
@@ -178,6 +201,8 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
 #define __switch_to_vector(__prev, __next)	do {} while (0)
 #define riscv_v_vstate_off(regs)		do {} while (0)
 #define riscv_v_vstate_on(regs)			do {} while (0)
+#define riscv_v_vstate_dirty(regs)		do {} while (0)
+#define riscv_v_vstate_discard(regs)		do {} while (0)
 
 #endif /* CONFIG_RISCV_ISA_V */
 
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 24d309c6ab8d..e36b69c9b07f 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -291,10 +291,14 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
 {
 	if (user_mode(regs)) {
 		ulong syscall = regs->a7;
+		bool v_is_on;
 
 		regs->epc += 4;
 		regs->orig_a0 = regs->a0;
 
+		v_is_on = riscv_v_vstate_query(regs);
+		riscv_v_vstate_off(regs);
+
 		syscall = syscall_enter_from_user_mode(regs, syscall);
 
 		if (syscall < NR_syscalls)
@@ -303,6 +307,10 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
 			regs->a0 = -ENOSYS;
 
 		syscall_exit_to_user_mode(regs);
+		if (v_is_on) {
+			riscv_v_vstate_discard(regs);
+			riscv_v_vstate_dirty(regs);
+		}
 	} else {
 		irqentry_state_t state = irqentry_nmi_enter(regs);
 
>
>
> Björn

Thanks,
Andy
Björn Töpel June 22, 2023, 4:38 p.m. UTC | #18
Andy Chiu <andy.chiu@sifive.com> writes:

> On Thu, Jun 22, 2023 at 5:40 AM Björn Töpel <bjorn@kernel.org> wrote:
>>
>> Andy Chiu <andy.chiu@sifive.com> writes:
>>
>> >> +static inline void riscv_v_vstate_discard(struct pt_regs *regs)
>> >> +{
>> >> +       unsigned long vl;
>> >> +
>> >> +       if (!riscv_v_vstate_query(regs))
>> >> +               return;
>> >> +
>> >> +       riscv_v_vstate_on(regs);
>> >
>> > Do we need this riscv_v_vstate_on()?  If it is not on we'd return
>> > early in the previous "if" statement, right?
>>
>> riscv_v_vstate_on() just set the state to Initial, right? Or do you mean
>> that riscv_v_vstate_query() is too much, and we should only check if the
>> state is dirty?
>>
>> >> +
>> >> +       riscv_v_enable();
>> >> +       asm volatile (
>> >> +               ".option push\n\t"
>> >> +               ".option arch, +v\n\t"
>> >> +               "vsetvli        %0, x0, e8, m8, ta, ma\n\t"
>> >> +               "vmv.v.i        v0, 0\n\t"
>> >> +               "vmv.v.i        v8, 0\n\t"
>> >> +               "vmv.v.i        v16, 0\n\t"
>> >> +               "vmv.v.i        v24, 0\n\t"
>> >> +               ".option pop\n\t"
>> >> +               : "=&r" (vl) : : "memory");
>> >> +       riscv_v_disable();
>> >
>> > Maybe consider cleaning the vstate (status.vs) here. As such we don't
>> > have to save V during context switch.
>>
>> It's late, and I'm slower than usual. The regs are cleared, and the
>> state is Initial. No save on context switch, but restore, right?
>
> Yes, it's my bad, you are right. I sometime messed around the "real"
> status.VS with the one in the userspace context :P
>
>>
>> > Or, maybe we could set vstate as off during syscall and discard V-reg
>> > + restore status.VS when returning back to userspace?
>>
>> Hmm, interesting. We need to track the status.VS to restore somewhere...
>
> Maybe something like this?
>
> diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
> index 04c0b07bf6cd..79de9ca83391 100644
> --- a/arch/riscv/include/asm/vector.h
> +++ b/arch/riscv/include/asm/vector.h
> @@ -43,6 +43,11 @@ static inline void riscv_v_vstate_on(struct pt_regs *regs)
>  	regs->status = (regs->status & ~SR_VS) | SR_VS_INITIAL;
>  }
>  
> +static inline void riscv_v_vstate_dirty(struct pt_regs *regs)
> +{
> +	regs->status = (regs->status & ~SR_VS) | SR_VS_DIRTY;
> +}
> +
>  static inline bool riscv_v_vstate_query(struct pt_regs *regs)
>  {
>  	return (regs->status & SR_VS) != 0;
> @@ -163,6 +168,24 @@ static inline void __switch_to_vector(struct task_struct *prev,
>  void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
>  bool riscv_v_vstate_ctrl_user_allowed(void);
>  
> +static inline void riscv_v_vstate_discard(struct pt_regs *regs)
> +{
> +	unsigned long vl;
> +
> +	riscv_v_enable();
> +	asm volatile (
> +			".option push\n\t"
> +			".option arch, +v\n\t"
> +			"vsetvli        %0, x0, e8, m8, ta, ma\n\t"
> +			"vmv.v.i        v0, 0\n\t"
> +			"vmv.v.i        v8, 0\n\t"
> +			"vmv.v.i        v16, 0\n\t"
> +			"vmv.v.i        v24, 0\n\t"
> +			".option pop\n\t"
> +			: "=&r" (vl) : : "memory");
> +	riscv_v_disable();
> +}
> +
>  #else /* ! CONFIG_RISCV_ISA_V  */
>  
>  struct pt_regs;
> @@ -178,6 +201,8 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
>  #define __switch_to_vector(__prev, __next)	do {} while (0)
>  #define riscv_v_vstate_off(regs)		do {} while (0)
>  #define riscv_v_vstate_on(regs)			do {} while (0)
> +#define riscv_v_vstate_dirty(regs)		do {} while (0)
> +#define riscv_v_vstate_discard(regs)		do {} while (0)
>  
>  #endif /* CONFIG_RISCV_ISA_V */
>  
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 24d309c6ab8d..e36b69c9b07f 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -291,10 +291,14 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>  {
>  	if (user_mode(regs)) {
>  		ulong syscall = regs->a7;
> +		bool v_is_on;
>  
>  		regs->epc += 4;
>  		regs->orig_a0 = regs->a0;
>  
> +		v_is_on = riscv_v_vstate_query(regs);
> +		riscv_v_vstate_off(regs);
> +
>  		syscall = syscall_enter_from_user_mode(regs, syscall);
>  
>  		if (syscall < NR_syscalls)
> @@ -303,6 +307,10 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>  			regs->a0 = -ENOSYS;
>  
>  		syscall_exit_to_user_mode(regs);
> +		if (v_is_on) {
> +			riscv_v_vstate_discard(regs);
> +			riscv_v_vstate_dirty(regs);

Ah! Neat! Why dirty, instead of just keeping the "set to Initial" from
my diff?

This flow does avoid some context switch costs, but I wonder if this is
some that can be added later, when we can more reliable measure the
overhead. Premature optimization, and all that. ;-)


Björn
Andy Chiu June 24, 2023, 6:54 a.m. UTC | #19
On Fri, Jun 23, 2023 at 12:38 AM Björn Töpel <bjorn@kernel.org> wrote:
>
> Andy Chiu <andy.chiu@sifive.com> writes:
>
> > On Thu, Jun 22, 2023 at 5:40 AM Björn Töpel <bjorn@kernel.org> wrote:
> >>
> >> Andy Chiu <andy.chiu@sifive.com> writes:
> >>
> >> >> +static inline void riscv_v_vstate_discard(struct pt_regs *regs)
> >> >> +{
> >> >> +       unsigned long vl;
> >> >> +
> >> >> +       if (!riscv_v_vstate_query(regs))
> >> >> +               return;
> >> >> +
> >> >> +       riscv_v_vstate_on(regs);
> >> >
> >> > Do we need this riscv_v_vstate_on()?  If it is not on we'd return
> >> > early in the previous "if" statement, right?
> >>
> >> riscv_v_vstate_on() just set the state to Initial, right? Or do you mean
> >> that riscv_v_vstate_query() is too much, and we should only check if the
> >> state is dirty?
> >>
> >> >> +
> >> >> +       riscv_v_enable();
> >> >> +       asm volatile (
> >> >> +               ".option push\n\t"
> >> >> +               ".option arch, +v\n\t"
> >> >> +               "vsetvli        %0, x0, e8, m8, ta, ma\n\t"
> >> >> +               "vmv.v.i        v0, 0\n\t"
> >> >> +               "vmv.v.i        v8, 0\n\t"
> >> >> +               "vmv.v.i        v16, 0\n\t"
> >> >> +               "vmv.v.i        v24, 0\n\t"
> >> >> +               ".option pop\n\t"
> >> >> +               : "=&r" (vl) : : "memory");
> >> >> +       riscv_v_disable();
> >> >
> >> > Maybe consider cleaning the vstate (status.vs) here. As such we don't
> >> > have to save V during context switch.
> >>
> >> It's late, and I'm slower than usual. The regs are cleared, and the
> >> state is Initial. No save on context switch, but restore, right?
> >
> > Yes, it's my bad, you are right. I sometime messed around the "real"
> > status.VS with the one in the userspace context :P
> >
> >>
> >> > Or, maybe we could set vstate as off during syscall and discard V-reg
> >> > + restore status.VS when returning back to userspace?
> >>
> >> Hmm, interesting. We need to track the status.VS to restore somewhere...
> >
> > Maybe something like this?
> >
> > diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
> > index 04c0b07bf6cd..79de9ca83391 100644
> > --- a/arch/riscv/include/asm/vector.h
> > +++ b/arch/riscv/include/asm/vector.h
> > @@ -43,6 +43,11 @@ static inline void riscv_v_vstate_on(struct pt_regs *regs)
> >       regs->status = (regs->status & ~SR_VS) | SR_VS_INITIAL;
> >  }
> >
> > +static inline void riscv_v_vstate_dirty(struct pt_regs *regs)
> > +{
> > +     regs->status = (regs->status & ~SR_VS) | SR_VS_DIRTY;
> > +}
> > +
> >  static inline bool riscv_v_vstate_query(struct pt_regs *regs)
> >  {
> >       return (regs->status & SR_VS) != 0;
> > @@ -163,6 +168,24 @@ static inline void __switch_to_vector(struct task_struct *prev,
> >  void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
> >  bool riscv_v_vstate_ctrl_user_allowed(void);
> >
> > +static inline void riscv_v_vstate_discard(struct pt_regs *regs)
> > +{
> > +     unsigned long vl;
> > +
> > +     riscv_v_enable();
> > +     asm volatile (
> > +                     ".option push\n\t"
> > +                     ".option arch, +v\n\t"
> > +                     "vsetvli        %0, x0, e8, m8, ta, ma\n\t"
> > +                     "vmv.v.i        v0, 0\n\t"
> > +                     "vmv.v.i        v8, 0\n\t"
> > +                     "vmv.v.i        v16, 0\n\t"
> > +                     "vmv.v.i        v24, 0\n\t"
> > +                     ".option pop\n\t"
> > +                     : "=&r" (vl) : : "memory");
> > +     riscv_v_disable();
> > +}
> > +
> >  #else /* ! CONFIG_RISCV_ISA_V  */
> >
> >  struct pt_regs;
> > @@ -178,6 +201,8 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
> >  #define __switch_to_vector(__prev, __next)   do {} while (0)
> >  #define riscv_v_vstate_off(regs)             do {} while (0)
> >  #define riscv_v_vstate_on(regs)                      do {} while (0)
> > +#define riscv_v_vstate_dirty(regs)           do {} while (0)
> > +#define riscv_v_vstate_discard(regs)         do {} while (0)
> >
> >  #endif /* CONFIG_RISCV_ISA_V */
> >
> > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> > index 24d309c6ab8d..e36b69c9b07f 100644
> > --- a/arch/riscv/kernel/traps.c
> > +++ b/arch/riscv/kernel/traps.c
> > @@ -291,10 +291,14 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
> >  {
> >       if (user_mode(regs)) {
> >               ulong syscall = regs->a7;
> > +             bool v_is_on;
> >
> >               regs->epc += 4;
> >               regs->orig_a0 = regs->a0;
> >
> > +             v_is_on = riscv_v_vstate_query(regs);
> > +             riscv_v_vstate_off(regs);
> > +
> >               syscall = syscall_enter_from_user_mode(regs, syscall);
> >
> >               if (syscall < NR_syscalls)
> > @@ -303,6 +307,10 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
> >                       regs->a0 = -ENOSYS;
> >
> >               syscall_exit_to_user_mode(regs);
> > +             if (v_is_on) {
> > +                     riscv_v_vstate_discard(regs);
> > +                     riscv_v_vstate_dirty(regs);
>
> Ah! Neat! Why dirty, instead of just keeping the "set to Initial" from
> my diff?

Both work, I think. But here if we set it to "on" after discarding
V-regs, then take a context switch before executing any V instructions
in user space (does not change future vstate to dirty). Then we will
leak V-regs previously set into its vstate.datap after switching back,
because we only save V context if vstate is dirty. So, I think setting
vstate to dirty is a safer option.

In your diff case, V-regs may be restored back to the previously-saved
state if the syscall caused a context switch.

I have not had a chance to test it yet because we are having a
vacation in Taiwan, and I have some other stuff to keep me busy :)
Please correct me if my thinking was wrong and I forgot some important
idea again...

>
> This flow does avoid some context switch costs, but I wonder if this is
> some that can be added later, when we can more reliable measure the
> overhead. Premature optimization, and all that. ;-)
>
>
> Björn

Thanks,
Andy
Andy Chiu June 24, 2023, 8:41 a.m. UTC | #20
On Fri, Jun 23, 2023 at 12:38 AM Björn Töpel <bjorn@kernel.org> wrote:
> This flow does avoid some context switch costs, but I wonder if this is
> some that can be added later, when we can more reliable measure the
> overhead. Premature optimization, and all that. ;-)
>

Sure, do you suggest any kinds of measurement, experiment, or
benchmarking that could give out a figure on how things are different?

>
> Björn

Thanks,
Andy
Björn Töpel June 26, 2023, 2:54 p.m. UTC | #21
Andy Chiu <andy.chiu@sifive.com> writes:

> On Fri, Jun 23, 2023 at 12:38 AM Björn Töpel <bjorn@kernel.org> wrote:
>> This flow does avoid some context switch costs, but I wonder if this is
>> some that can be added later, when we can more reliable measure the
>> overhead. Premature optimization, and all that. ;-)
>>
>
> Sure, do you suggest any kinds of measurement, experiment, or
> benchmarking that could give out a figure on how things are different?

My take was; If you have access to actual V 1.0 hardware, and just not
Qemu, then we could do some actual real tests, measuring context switch
costs etc!


Björn
Björn Töpel June 26, 2023, 3:36 p.m. UTC | #22
Andy Chiu <andy.chiu@sifive.com> writes:

>> > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
>> > index 24d309c6ab8d..e36b69c9b07f 100644
>> > --- a/arch/riscv/kernel/traps.c
>> > +++ b/arch/riscv/kernel/traps.c
>> > @@ -291,10 +291,14 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>> >  {
>> >       if (user_mode(regs)) {
>> >               ulong syscall = regs->a7;
>> > +             bool v_is_on;
>> >
>> >               regs->epc += 4;
>> >               regs->orig_a0 = regs->a0;
>> >
>> > +             v_is_on = riscv_v_vstate_query(regs);
>> > +             riscv_v_vstate_off(regs);
>> > +
>> >               syscall = syscall_enter_from_user_mode(regs, syscall);
>> >
>> >               if (syscall < NR_syscalls)
>> > @@ -303,6 +307,10 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>> >                       regs->a0 = -ENOSYS;
>> >
>> >               syscall_exit_to_user_mode(regs);
>> > +             if (v_is_on) {
>> > +                     riscv_v_vstate_discard(regs);
>> > +                     riscv_v_vstate_dirty(regs);
>>
>> Ah! Neat! Why dirty, instead of just keeping the "set to Initial" from
>> my diff?
>
> Both work, I think. But here if we set it to "on" after discarding
> V-regs, then take a context switch before executing any V instructions
> in user space (does not change future vstate to dirty). Then we will
> leak V-regs previously set into its vstate.datap after switching back,
> because we only save V context if vstate is dirty. So, I think setting
> vstate to dirty is a safer option.

Ah, yes, good point. An alternative variant is this:

---
diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 04c0b07bf6cd..32b6115a54a5 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -139,15 +139,51 @@ static inline void riscv_v_vstate_save(struct task_struct *task,
 	}
 }
 
+static inline void __riscv_v_vstate_discard(void)
+{
+	unsigned long vl, vtype_inval = 1UL << (BITS_PER_LONG - 1);
+
+	riscv_v_enable();
+	asm volatile (
+		".option push\n\t"
+		".option arch, +v\n\t"
+		"vsetvli	%0, x0, e8, m8, ta, ma\n\t"
+		"vmv.v.i	v0, 0\n\t"
+		"vmv.v.i	v8, 0\n\t"
+		"vmv.v.i	v16, 0\n\t"
+		"vmv.v.i	v24, 0\n\t"
+		"vsetvl		%0, x0, %1\n\t"
+		".option pop\n\t"
+		: "=&r" (vl) : "r" (vtype_inval) : "memory");
+	riscv_v_disable();
+}
+
+static inline void riscv_v_vstate_discard(struct pt_regs *regs)
+{
+	if (!riscv_v_vstate_query(regs))
+		return;
+
+	__riscv_v_vstate_discard();
+	riscv_v_vstate_on(regs);
+}
+
 static inline void riscv_v_vstate_restore(struct task_struct *task,
 					  struct pt_regs *regs)
 {
-	if ((regs->status & SR_VS) != SR_VS_OFF) {
+	unsigned long status = regs->status & SR_VS;
+
+	WARN_ON(status == SR_VS_DIRTY);
+
+	if (status == SR_VS_CLEAN) {
 		struct __riscv_v_ext_state *vstate = &task->thread.vstate;
 
 		__riscv_v_vstate_restore(vstate, vstate->datap);
 		__riscv_v_vstate_clean(regs);
+		return;
 	}
+
+	if (status == SR_VS_INITIAL)
+		__riscv_v_vstate_discard();
 }
 
 static inline void __switch_to_vector(struct task_struct *prev,
@@ -178,6 +214,7 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
 #define __switch_to_vector(__prev, __next)	do {} while (0)
 #define riscv_v_vstate_off(regs)		do {} while (0)
 #define riscv_v_vstate_on(regs)			do {} while (0)
+#define riscv_v_vstate_discard(regs)		do {} while (0)
 
 #endif /* CONFIG_RISCV_ISA_V */
 
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 5158961ea977..5ff63a784a6d 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -296,6 +296,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
 		regs->epc += 4;
 		regs->orig_a0 = regs->a0;
 
+		riscv_v_vstate_discard(regs);
+
 		syscall = syscall_enter_from_user_mode(regs, syscall);
 
 		if (syscall < NR_syscalls)

---


Here, we simply discard the regs if the state is Initial. Thoughts?


Björn
Andy Chiu June 27, 2023, 1:07 a.m. UTC | #23
On Mon, Jun 26, 2023 at 11:36 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> Andy Chiu <andy.chiu@sifive.com> writes:
>
> >> > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> >> > index 24d309c6ab8d..e36b69c9b07f 100644
> >> > --- a/arch/riscv/kernel/traps.c
> >> > +++ b/arch/riscv/kernel/traps.c
> >> > @@ -291,10 +291,14 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
> >> >  {
> >> >       if (user_mode(regs)) {
> >> >               ulong syscall = regs->a7;
> >> > +             bool v_is_on;
> >> >
> >> >               regs->epc += 4;
> >> >               regs->orig_a0 = regs->a0;
> >> >
> >> > +             v_is_on = riscv_v_vstate_query(regs);
> >> > +             riscv_v_vstate_off(regs);
> >> > +
> >> >               syscall = syscall_enter_from_user_mode(regs, syscall);
> >> >
> >> >               if (syscall < NR_syscalls)
> >> > @@ -303,6 +307,10 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
> >> >                       regs->a0 = -ENOSYS;
> >> >
> >> >               syscall_exit_to_user_mode(regs);
> >> > +             if (v_is_on) {
> >> > +                     riscv_v_vstate_discard(regs);
> >> > +                     riscv_v_vstate_dirty(regs);
> >>
> >> Ah! Neat! Why dirty, instead of just keeping the "set to Initial" from
> >> my diff?
> >
> > Both work, I think. But here if we set it to "on" after discarding
> > V-regs, then take a context switch before executing any V instructions
> > in user space (does not change future vstate to dirty). Then we will
> > leak V-regs previously set into its vstate.datap after switching back,
> > because we only save V context if vstate is dirty. So, I think setting
> > vstate to dirty is a safer option.
>
> Ah, yes, good point. An alternative variant is this:
>
> ---
> diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
> index 04c0b07bf6cd..32b6115a54a5 100644
> --- a/arch/riscv/include/asm/vector.h
> +++ b/arch/riscv/include/asm/vector.h
> @@ -139,15 +139,51 @@ static inline void riscv_v_vstate_save(struct task_struct *task,
>         }
>  }
>
> +static inline void __riscv_v_vstate_discard(void)
> +{
> +       unsigned long vl, vtype_inval = 1UL << (BITS_PER_LONG - 1);
> +
> +       riscv_v_enable();
> +       asm volatile (
> +               ".option push\n\t"
> +               ".option arch, +v\n\t"
> +               "vsetvli        %0, x0, e8, m8, ta, ma\n\t"
> +               "vmv.v.i        v0, 0\n\t"
> +               "vmv.v.i        v8, 0\n\t"
> +               "vmv.v.i        v16, 0\n\t"
> +               "vmv.v.i        v24, 0\n\t"
> +               "vsetvl         %0, x0, %1\n\t"
> +               ".option pop\n\t"
> +               : "=&r" (vl) : "r" (vtype_inval) : "memory");
> +       riscv_v_disable();
> +}
> +
> +static inline void riscv_v_vstate_discard(struct pt_regs *regs)
> +{
> +       if (!riscv_v_vstate_query(regs))
> +               return;
> +
> +       __riscv_v_vstate_discard();
> +       riscv_v_vstate_on(regs);
> +}
> +
>  static inline void riscv_v_vstate_restore(struct task_struct *task,
>                                           struct pt_regs *regs)
>  {
> -       if ((regs->status & SR_VS) != SR_VS_OFF) {
> +       unsigned long status = regs->status & SR_VS;
> +
> +       WARN_ON(status == SR_VS_DIRTY);
> +
> +       if (status == SR_VS_CLEAN) {
>                 struct __riscv_v_ext_state *vstate = &task->thread.vstate;
>
>                 __riscv_v_vstate_restore(vstate, vstate->datap);
>                 __riscv_v_vstate_clean(regs);
> +               return;
>         }
> +
> +       if (status == SR_VS_INITIAL)
> +               __riscv_v_vstate_discard();
>  }
>
>  static inline void __switch_to_vector(struct task_struct *prev,
> @@ -178,6 +214,7 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
>  #define __switch_to_vector(__prev, __next)     do {} while (0)
>  #define riscv_v_vstate_off(regs)               do {} while (0)
>  #define riscv_v_vstate_on(regs)                        do {} while (0)
> +#define riscv_v_vstate_discard(regs)           do {} while (0)
>
>  #endif /* CONFIG_RISCV_ISA_V */
>
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 5158961ea977..5ff63a784a6d 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -296,6 +296,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>                 regs->epc += 4;
>                 regs->orig_a0 = regs->a0;
>
> +               riscv_v_vstate_discard(regs);
> +
>                 syscall = syscall_enter_from_user_mode(regs, syscall);
>
>                 if (syscall < NR_syscalls)
>
> ---
>
>
> Here, we simply discard the regs if the state is Initial. Thoughts?
>
>
> Björn

Yes, it makes sense to me to handle the initial state in vstate_restore.

Thanks,
Andy
Björn Töpel June 27, 2023, 6:33 a.m. UTC | #24
Andy Chiu <andy.chiu@sifive.com> writes:

> On Mon, Jun 26, 2023 at 11:36 PM Björn Töpel <bjorn@kernel.org> wrote:
>>
>> Andy Chiu <andy.chiu@sifive.com> writes:
>>
>> >> > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
>> >> > index 24d309c6ab8d..e36b69c9b07f 100644
>> >> > --- a/arch/riscv/kernel/traps.c
>> >> > +++ b/arch/riscv/kernel/traps.c
>> >> > @@ -291,10 +291,14 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>> >> >  {
>> >> >       if (user_mode(regs)) {
>> >> >               ulong syscall = regs->a7;
>> >> > +             bool v_is_on;
>> >> >
>> >> >               regs->epc += 4;
>> >> >               regs->orig_a0 = regs->a0;
>> >> >
>> >> > +             v_is_on = riscv_v_vstate_query(regs);
>> >> > +             riscv_v_vstate_off(regs);
>> >> > +
>> >> >               syscall = syscall_enter_from_user_mode(regs, syscall);
>> >> >
>> >> >               if (syscall < NR_syscalls)
>> >> > @@ -303,6 +307,10 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>> >> >                       regs->a0 = -ENOSYS;
>> >> >
>> >> >               syscall_exit_to_user_mode(regs);
>> >> > +             if (v_is_on) {
>> >> > +                     riscv_v_vstate_discard(regs);
>> >> > +                     riscv_v_vstate_dirty(regs);
>> >>
>> >> Ah! Neat! Why dirty, instead of just keeping the "set to Initial" from
>> >> my diff?
>> >
>> > Both work, I think. But here if we set it to "on" after discarding
>> > V-regs, then take a context switch before executing any V instructions
>> > in user space (does not change future vstate to dirty). Then we will
>> > leak V-regs previously set into its vstate.datap after switching back,
>> > because we only save V context if vstate is dirty. So, I think setting
>> > vstate to dirty is a safer option.
>>
>> Ah, yes, good point. An alternative variant is this:
>>
>> ---
>> diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
>> index 04c0b07bf6cd..32b6115a54a5 100644
>> --- a/arch/riscv/include/asm/vector.h
>> +++ b/arch/riscv/include/asm/vector.h
>> @@ -139,15 +139,51 @@ static inline void riscv_v_vstate_save(struct task_struct *task,
>>         }
>>  }
>>
>> +static inline void __riscv_v_vstate_discard(void)
>> +{
>> +       unsigned long vl, vtype_inval = 1UL << (BITS_PER_LONG - 1);
>> +
>> +       riscv_v_enable();
>> +       asm volatile (
>> +               ".option push\n\t"
>> +               ".option arch, +v\n\t"
>> +               "vsetvli        %0, x0, e8, m8, ta, ma\n\t"
>> +               "vmv.v.i        v0, 0\n\t"
>> +               "vmv.v.i        v8, 0\n\t"
>> +               "vmv.v.i        v16, 0\n\t"
>> +               "vmv.v.i        v24, 0\n\t"
>> +               "vsetvl         %0, x0, %1\n\t"
>> +               ".option pop\n\t"
>> +               : "=&r" (vl) : "r" (vtype_inval) : "memory");
>> +       riscv_v_disable();
>> +}
>> +
>> +static inline void riscv_v_vstate_discard(struct pt_regs *regs)
>> +{
>> +       if (!riscv_v_vstate_query(regs))
>> +               return;
>> +
>> +       __riscv_v_vstate_discard();
>> +       riscv_v_vstate_on(regs);
>> +}
>> +
>>  static inline void riscv_v_vstate_restore(struct task_struct *task,
>>                                           struct pt_regs *regs)
>>  {
>> -       if ((regs->status & SR_VS) != SR_VS_OFF) {
>> +       unsigned long status = regs->status & SR_VS;
>> +
>> +       WARN_ON(status == SR_VS_DIRTY);
>> +
>> +       if (status == SR_VS_CLEAN) {
>>                 struct __riscv_v_ext_state *vstate = &task->thread.vstate;
>>
>>                 __riscv_v_vstate_restore(vstate, vstate->datap);
>>                 __riscv_v_vstate_clean(regs);
>> +               return;
>>         }
>> +
>> +       if (status == SR_VS_INITIAL)
>> +               __riscv_v_vstate_discard();
>>  }
>>
>>  static inline void __switch_to_vector(struct task_struct *prev,
>> @@ -178,6 +214,7 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
>>  #define __switch_to_vector(__prev, __next)     do {} while (0)
>>  #define riscv_v_vstate_off(regs)               do {} while (0)
>>  #define riscv_v_vstate_on(regs)                        do {} while (0)
>> +#define riscv_v_vstate_discard(regs)           do {} while (0)
>>
>>  #endif /* CONFIG_RISCV_ISA_V */
>>
>> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
>> index 5158961ea977..5ff63a784a6d 100644
>> --- a/arch/riscv/kernel/traps.c
>> +++ b/arch/riscv/kernel/traps.c
>> @@ -296,6 +296,8 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>>                 regs->epc += 4;
>>                 regs->orig_a0 = regs->a0;
>>
>> +               riscv_v_vstate_discard(regs);
>> +
>>                 syscall = syscall_enter_from_user_mode(regs, syscall);
>>
>>                 if (syscall < NR_syscalls)
>>
>> ---
>>
>>
>> Here, we simply discard the regs if the state is Initial. Thoughts?
>>
>>
>> Björn
>
> Yes, it makes sense to me to handle the initial state in vstate_restore.

Ok! I sent out a proper v2, but without the WARN_ON to match the
behavior of the the original code.

PTAL, and let me know what you think.


Björn
diff mbox series

Patch

diff --git a/Documentation/riscv/vector.rst b/Documentation/riscv/vector.rst
index 48f189d79e41..a4dfa954215b 100644
--- a/Documentation/riscv/vector.rst
+++ b/Documentation/riscv/vector.rst
@@ -130,3 +130,8 @@  processes in form of sysctl knob:
 
     Modifying the system default enablement status does not affect the enablement
     status of any existing process of thread that do not make an execve() call.
+
+3.  Vector Register State Across System Calls
+---------------------------------------------
+
+Vector registers are clobbered by system calls.
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 05ffdcd1424e..bb99a6379b37 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -295,6 +295,8 @@  asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
 		regs->epc += 4;
 		regs->orig_a0 = regs->a0;
 
+		riscv_v_vstate_off(regs);
+
 		syscall = syscall_enter_from_user_mode(regs, syscall);
 
 		if (syscall < NR_syscalls)