[v5,06/30] arm64: context switch POR_EL0 register

Message ID	20240822151113.1479789-7-joey.gouly@arm.com (mailing list archive)
State	New
Headers	show Return-Path: <owner-linux-mm@kvack.org> From: Joey Gouly <joey.gouly@arm.com> To: linux-arm-kernel@lists.infradead.org Cc: nd@arm.com, akpm@linux-foundation.org, aneesh.kumar@kernel.org, aneesh.kumar@linux.ibm.com, anshuman.khandual@arm.com, bp@alien8.de, broonie@kernel.org, catalin.marinas@arm.com, christophe.leroy@csgroup.eu, dave.hansen@linux.intel.com, hpa@zytor.com, joey.gouly@arm.com, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, maz@kernel.org, mingo@redhat.com, mpe@ellerman.id.au, naveen.n.rao@linux.ibm.com, npiggin@gmail.com, oliver.upton@linux.dev, shuah@kernel.org, skhan@linuxfoundation.org, szabolcs.nagy@arm.com, tglx@linutronix.de, will@kernel.org, x86@kernel.org, kvmarm@lists.linux.dev, linux-kselftest@vger.kernel.org Subject: [PATCH v5 06/30] arm64: context switch POR_EL0 register Date: Thu, 22 Aug 2024 16:10:49 +0100 Message-Id: <20240822151113.1479789-7-joey.gouly@arm.com> In-Reply-To: <20240822151113.1479789-1-joey.gouly@arm.com> References: <20240822151113.1479789-1-joey.gouly@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	Permission Overlay Extension \| expand [v5,00/30] Permission Overlay Extension [v5,01/30] powerpc/mm: add ARCH_PKEY_BITS to Kconfig [v5,02/30] x86/mm: add ARCH_PKEY_BITS to Kconfig [v5,03/30] mm: use ARCH_PKEY_BITS to define VM_PKEY_BITN [v5,04/30] arm64: disable trapping of POR_EL0 to EL2 [v5,05/30] arm64: cpufeature: add Permission Overlay Extension cpucap [v5,06/30] arm64: context switch POR_EL0 register [v5,07/30] KVM: arm64: Save/restore POE registers [v5,08/30] KVM: arm64: make kvm_at() take an OP_AT_* [v5,09/30] KVM: arm64: use `at s1e1a` for POE [v5,10/30] KVM: arm64: Sanitise ID_AA64MMFR3_EL1 [v5,11/30] arm64: enable the Permission Overlay Extension for EL0 [v5,12/30] arm64: re-order MTE VM_ flags [v5,13/30] arm64: add POIndex defines [v5,14/30] arm64: convert protection key into vm_flags and pgprot values [v5,15/30] arm64: mask out POIndex when modifying a PTE [v5,16/30] arm64: handle PKEY/POE faults [v5,17/30] arm64: add pte_access_permitted_no_overlay() [v5,18/30] arm64: implement PKEYS support [v5,19/30] arm64: add POE signal support [v5,20/30] arm64/ptrace: add support for FEAT_POE [v5,21/30] arm64: enable POE and PIE to coexist [v5,22/30] arm64: enable PKEY support for CPUs with S1POE [v5,23/30] arm64: add Permission Overlay Extension Kconfig [v5,24/30] kselftest/arm64: move get_header() [v5,25/30] selftests: mm: move fpregs printing [v5,26/30] selftests: mm: make protection_keys test work on arm64 [v5,27/30] kselftest/arm64: add HWCAP test for FEAT_S1POE [v5,28/30] kselftest/arm64: parse POE_MAGIC in a signal frame [v5,29/30] kselftest/arm64: Add test case for POR_EL0 signal frame records [v5,30/30] KVM: selftests: get-reg-list: add Permission Overlay registers

Joey Gouly Aug. 22, 2024, 3:10 p.m. UTC

POR_EL0 is a register that can be modified by userspace directly,
so it must be context switched.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/include/asm/cpufeature.h |  6 ++++++
 arch/arm64/include/asm/processor.h  |  1 +
 arch/arm64/include/asm/sysreg.h     |  3 +++
 arch/arm64/kernel/process.c         | 28 ++++++++++++++++++++++++++++
 4 files changed, 38 insertions(+)

Will Deacon Aug. 23, 2024, 2:45 p.m. UTC | #1

On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> POR_EL0 is a register that can be modified by userspace directly,
> so it must be context switched.
> 
> Signed-off-by: Joey Gouly <joey.gouly@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  arch/arm64/include/asm/cpufeature.h |  6 ++++++
>  arch/arm64/include/asm/processor.h  |  1 +
>  arch/arm64/include/asm/sysreg.h     |  3 +++
>  arch/arm64/kernel/process.c         | 28 ++++++++++++++++++++++++++++
>  4 files changed, 38 insertions(+)

[...]

> +static void permission_overlay_switch(struct task_struct *next)
> +{
> +	if (!system_supports_poe())
> +		return;
> +
> +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> +	if (current->thread.por_el0 != next->thread.por_el0) {
> +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */

nit: typo "chaning".

But more substantially, is this just to prevent spurious faults in the
context of a new thread using a stale value for POR_EL0?

Will

Catalin Marinas Aug. 23, 2024, 4:41 p.m. UTC | #2

On Fri, Aug 23, 2024 at 03:45:32PM +0100, Will Deacon wrote:
> On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> > +static void permission_overlay_switch(struct task_struct *next)
> > +{
> > +	if (!system_supports_poe())
> > +		return;
> > +
> > +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > +	if (current->thread.por_el0 != next->thread.por_el0) {
> > +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */
> 
> nit: typo "chaning".
> 
> But more substantially, is this just to prevent spurious faults in the
> context of a new thread using a stale value for POR_EL0?

Not just prevent faults but enforce the permissions from the new
thread's POR_EL0. The kernel may continue with a uaccess routine from
here, we can't tell.

Will Deacon Aug. 23, 2024, 5:08 p.m. UTC | #3

On Fri, Aug 23, 2024 at 05:41:06PM +0100, Catalin Marinas wrote:
> On Fri, Aug 23, 2024 at 03:45:32PM +0100, Will Deacon wrote:
> > On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> > > +static void permission_overlay_switch(struct task_struct *next)
> > > +{
> > > +	if (!system_supports_poe())
> > > +		return;
> > > +
> > > +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > +	if (current->thread.por_el0 != next->thread.por_el0) {
> > > +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > > +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > 
> > nit: typo "chaning".
> > 
> > But more substantially, is this just to prevent spurious faults in the
> > context of a new thread using a stale value for POR_EL0?
> 
> Not just prevent faults but enforce the permissions from the new
> thread's POR_EL0. The kernel may continue with a uaccess routine from
> here, we can't tell.

Hmm, I wondered if that was the case. It's a bit weird though, because:

  - There's a window between switch_mm() and switch_to() where you might
    reasonably expect to be able to execute uaccess routines

  - kthread_use_mm() doesn't/can't look at this at all

  - GUP obviously doesn't care

So what do we actually gain by having the uaccess routines honour this?

Will

Catalin Marinas Aug. 23, 2024, 6:40 p.m. UTC | #4

On Fri, Aug 23, 2024 at 06:08:36PM +0100, Will Deacon wrote:
> On Fri, Aug 23, 2024 at 05:41:06PM +0100, Catalin Marinas wrote:
> > On Fri, Aug 23, 2024 at 03:45:32PM +0100, Will Deacon wrote:
> > > On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> > > > +static void permission_overlay_switch(struct task_struct *next)
> > > > +{
> > > > +	if (!system_supports_poe())
> > > > +		return;
> > > > +
> > > > +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > > +	if (current->thread.por_el0 != next->thread.por_el0) {
> > > > +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > > > +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > > 
> > > nit: typo "chaning".
> > > 
> > > But more substantially, is this just to prevent spurious faults in the
> > > context of a new thread using a stale value for POR_EL0?
> > 
> > Not just prevent faults but enforce the permissions from the new
> > thread's POR_EL0. The kernel may continue with a uaccess routine from
> > here, we can't tell.
> 
> Hmm, I wondered if that was the case. It's a bit weird though, because:
> 
>   - There's a window between switch_mm() and switch_to() where you might
>     reasonably expect to be able to execute uaccess routines

I don't think we can have any uaccess between these two switches (a
uaccess could fault, that's a pretty weird state between these two).

>   - kthread_use_mm() doesn't/can't look at this at all

No, but a kthread would have it's own, most permissive, POR_EL0.

>   - GUP obviously doesn't care
> 
> So what do we actually gain by having the uaccess routines honour this?

I guess where it matters is more like not accidentally faulting because
the previous thread had more restrictive permissions.

Will Deacon Aug. 27, 2024, 11:38 a.m. UTC | #5

On Fri, Aug 23, 2024 at 07:40:52PM +0100, Catalin Marinas wrote:
> On Fri, Aug 23, 2024 at 06:08:36PM +0100, Will Deacon wrote:
> > On Fri, Aug 23, 2024 at 05:41:06PM +0100, Catalin Marinas wrote:
> > > On Fri, Aug 23, 2024 at 03:45:32PM +0100, Will Deacon wrote:
> > > > On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> > > > > +static void permission_overlay_switch(struct task_struct *next)
> > > > > +{
> > > > > +	if (!system_supports_poe())
> > > > > +		return;
> > > > > +
> > > > > +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > > > +	if (current->thread.por_el0 != next->thread.por_el0) {
> > > > > +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > > > > +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > > > 
> > > > nit: typo "chaning".
> > > > 
> > > > But more substantially, is this just to prevent spurious faults in the
> > > > context of a new thread using a stale value for POR_EL0?
> > > 
> > > Not just prevent faults but enforce the permissions from the new
> > > thread's POR_EL0. The kernel may continue with a uaccess routine from
> > > here, we can't tell.
> > 
> > Hmm, I wondered if that was the case. It's a bit weird though, because:
> > 
> >   - There's a window between switch_mm() and switch_to() where you might
> >     reasonably expect to be able to execute uaccess routines
> 
> I don't think we can have any uaccess between these two switches (a
> uaccess could fault, that's a pretty weird state between these two).
> 
> >   - kthread_use_mm() doesn't/can't look at this at all
> 
> No, but a kthread would have it's own, most permissive, POR_EL0.
> 
> >   - GUP obviously doesn't care
> > 
> > So what do we actually gain by having the uaccess routines honour this?
> 
> I guess where it matters is more like not accidentally faulting because
> the previous thread had more restrictive permissions.

That's what I wondered initially, but won't the fault handler retry in
that case?

Will

Catalin Marinas Sept. 2, 2024, 7:08 p.m. UTC | #6

On Tue, Aug 27, 2024 at 12:38:04PM +0100, Will Deacon wrote:
> On Fri, Aug 23, 2024 at 07:40:52PM +0100, Catalin Marinas wrote:
> > On Fri, Aug 23, 2024 at 06:08:36PM +0100, Will Deacon wrote:
> > > On Fri, Aug 23, 2024 at 05:41:06PM +0100, Catalin Marinas wrote:
> > > > On Fri, Aug 23, 2024 at 03:45:32PM +0100, Will Deacon wrote:
> > > > > On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> > > > > > +static void permission_overlay_switch(struct task_struct *next)
> > > > > > +{
> > > > > > +	if (!system_supports_poe())
> > > > > > +		return;
> > > > > > +
> > > > > > +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > > > > +	if (current->thread.por_el0 != next->thread.por_el0) {
> > > > > > +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > > > > > +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > > > > 
> > > > > nit: typo "chaning".
> > > > > 
> > > > > But more substantially, is this just to prevent spurious faults in the
> > > > > context of a new thread using a stale value for POR_EL0?
> > > > 
> > > > Not just prevent faults but enforce the permissions from the new
> > > > thread's POR_EL0. The kernel may continue with a uaccess routine from
> > > > here, we can't tell.
[...]
> > > So what do we actually gain by having the uaccess routines honour this?
> > 
> > I guess where it matters is more like not accidentally faulting because
> > the previous thread had more restrictive permissions.
> 
> That's what I wondered initially, but won't the fault handler retry in
> that case?

Yes, it will retry and this should be fine (I assume you are only
talking about the dropping ISB in the context switch).

For the case of running with a more permissive stale POR_EL0, arguably it's
slightly more predictable for the user but, OTOH, some syscalls like
readv() could be routed through GUP with no checks. As with MTE, we
don't guarantee uaccesses honour the user permissions.

That said, at some point we should sanitise this path anyway and have a
single ISB at the end. In the meantime, I'm fine with dropping the ISB
here.

Joey Gouly Sept. 3, 2024, 2:54 p.m. UTC | #7

On Mon, Sep 02, 2024 at 08:08:08PM +0100, Catalin Marinas wrote:
> On Tue, Aug 27, 2024 at 12:38:04PM +0100, Will Deacon wrote:
> > On Fri, Aug 23, 2024 at 07:40:52PM +0100, Catalin Marinas wrote:
> > > On Fri, Aug 23, 2024 at 06:08:36PM +0100, Will Deacon wrote:
> > > > On Fri, Aug 23, 2024 at 05:41:06PM +0100, Catalin Marinas wrote:
> > > > > On Fri, Aug 23, 2024 at 03:45:32PM +0100, Will Deacon wrote:
> > > > > > On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> > > > > > > +static void permission_overlay_switch(struct task_struct *next)
> > > > > > > +{
> > > > > > > +	if (!system_supports_poe())
> > > > > > > +		return;
> > > > > > > +
> > > > > > > +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > > > > > +	if (current->thread.por_el0 != next->thread.por_el0) {
> > > > > > > +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > > > > > > +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > > > > >
> > > > > > nit: typo "chaning".
> > > > > >
> > > > > > But more substantially, is this just to prevent spurious faults in the
> > > > > > context of a new thread using a stale value for POR_EL0?
> > > > >
> > > > > Not just prevent faults but enforce the permissions from the new
> > > > > thread's POR_EL0. The kernel may continue with a uaccess routine from
> > > > > here, we can't tell.
> [...]
> > > > So what do we actually gain by having the uaccess routines honour this?
> > >
> > > I guess where it matters is more like not accidentally faulting because
> > > the previous thread had more restrictive permissions.
> >
> > That's what I wondered initially, but won't the fault handler retry in
> > that case?
>
> Yes, it will retry and this should be fine (I assume you are only
> talking about the dropping ISB in the context switch).
>
> For the case of running with a more permissive stale POR_EL0, arguably it's
> slightly more predictable for the user but, OTOH, some syscalls like
> readv() could be routed through GUP with no checks. As with MTE, we
> don't guarantee uaccesses honour the user permissions.
>
> That said, at some point we should sanitise this path anyway and have a
> single ISB at the end. In the meantime, I'm fine with dropping the ISB
> here.
>

commit 3141fb86bee8d48ae47cab1594dad54f974a8899
Author: Joey Gouly <joey.gouly@arm.com>
Date:   Tue Sep 3 15:47:26 2024 +0100

    fixup! arm64: context switch POR_EL0 register

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index a3a61ecdb165..c224b0955f1a 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -515,11 +515,8 @@ static void permission_overlay_switch(struct task_struct *next)
                return;

        current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
-       if (current->thread.por_el0 != next->thread.por_el0) {
+       if (current->thread.por_el0 != next->thread.por_el0)
                write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
-               /* ISB required for kernel uaccess routines when chaning POR_EL0 */
-               isb();
-       }
 }

 /*

Will, do you want me to re-send the series with this and the permissions diff from the other thread [1],
or you ok with applying them when you pull it in?

Thanks,
Joey

[1] https://lore.kernel.org/all/20240903144823.GA3669886@e124191.cambridge.arm.com/

Will Deacon Sept. 4, 2024, 10:22 a.m. UTC | #8

On Tue, Sep 03, 2024 at 03:54:13PM +0100, Joey Gouly wrote:
> On Mon, Sep 02, 2024 at 08:08:08PM +0100, Catalin Marinas wrote:
> > On Tue, Aug 27, 2024 at 12:38:04PM +0100, Will Deacon wrote:
> > > On Fri, Aug 23, 2024 at 07:40:52PM +0100, Catalin Marinas wrote:
> > > > On Fri, Aug 23, 2024 at 06:08:36PM +0100, Will Deacon wrote:
> > > > > On Fri, Aug 23, 2024 at 05:41:06PM +0100, Catalin Marinas wrote:
> > > > > > On Fri, Aug 23, 2024 at 03:45:32PM +0100, Will Deacon wrote:
> > > > > > > On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> > > > > > > > +static void permission_overlay_switch(struct task_struct *next)
> > > > > > > > +{
> > > > > > > > +	if (!system_supports_poe())
> > > > > > > > +		return;
> > > > > > > > +
> > > > > > > > +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > > > > > > +	if (current->thread.por_el0 != next->thread.por_el0) {
> > > > > > > > +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > > > > > > > +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > > > > > >
> > > > > > > nit: typo "chaning".
> > > > > > >
> > > > > > > But more substantially, is this just to prevent spurious faults in the
> > > > > > > context of a new thread using a stale value for POR_EL0?
> > > > > >
> > > > > > Not just prevent faults but enforce the permissions from the new
> > > > > > thread's POR_EL0. The kernel may continue with a uaccess routine from
> > > > > > here, we can't tell.
> > [...]
> > > > > So what do we actually gain by having the uaccess routines honour this?
> > > >
> > > > I guess where it matters is more like not accidentally faulting because
> > > > the previous thread had more restrictive permissions.
> > >
> > > That's what I wondered initially, but won't the fault handler retry in
> > > that case?
> >
> > Yes, it will retry and this should be fine (I assume you are only
> > talking about the dropping ISB in the context switch).
> >
> > For the case of running with a more permissive stale POR_EL0, arguably it's
> > slightly more predictable for the user but, OTOH, some syscalls like
> > readv() could be routed through GUP with no checks. As with MTE, we
> > don't guarantee uaccesses honour the user permissions.
> >
> > That said, at some point we should sanitise this path anyway and have a
> > single ISB at the end. In the meantime, I'm fine with dropping the ISB
> > here.
> >
> 
> commit 3141fb86bee8d48ae47cab1594dad54f974a8899
> Author: Joey Gouly <joey.gouly@arm.com>
> Date:   Tue Sep 3 15:47:26 2024 +0100
> 
>     fixup! arm64: context switch POR_EL0 register
> 
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index a3a61ecdb165..c224b0955f1a 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -515,11 +515,8 @@ static void permission_overlay_switch(struct task_struct *next)
>                 return;
> 
>         current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> -       if (current->thread.por_el0 != next->thread.por_el0) {
> +       if (current->thread.por_el0 != next->thread.por_el0)
>                 write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> -               /* ISB required for kernel uaccess routines when chaning POR_EL0 */
> -               isb();
> -       }
>  }

What about the one in flush_poe()? I'm inclined to drop that as well.

> Will, do you want me to re-send the series with this and the permissions
> diff from the other thread [1],
> or you ok with applying them when you pull it in?

I'll have a crack now, but if it fails miserably then I'll let you know.

Will

Joey Gouly Sept. 4, 2024, 11:32 a.m. UTC | #9

On Wed, Sep 04, 2024 at 11:22:54AM +0100, Will Deacon wrote:
> On Tue, Sep 03, 2024 at 03:54:13PM +0100, Joey Gouly wrote:
> > On Mon, Sep 02, 2024 at 08:08:08PM +0100, Catalin Marinas wrote:
> > > On Tue, Aug 27, 2024 at 12:38:04PM +0100, Will Deacon wrote:
> > > > On Fri, Aug 23, 2024 at 07:40:52PM +0100, Catalin Marinas wrote:
> > > > > On Fri, Aug 23, 2024 at 06:08:36PM +0100, Will Deacon wrote:
> > > > > > On Fri, Aug 23, 2024 at 05:41:06PM +0100, Catalin Marinas wrote:
> > > > > > > On Fri, Aug 23, 2024 at 03:45:32PM +0100, Will Deacon wrote:
> > > > > > > > On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> > > > > > > > > +static void permission_overlay_switch(struct task_struct *next)
> > > > > > > > > +{
> > > > > > > > > +	if (!system_supports_poe())
> > > > > > > > > +		return;
> > > > > > > > > +
> > > > > > > > > +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > > > > > > > +	if (current->thread.por_el0 != next->thread.por_el0) {
> > > > > > > > > +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > > > > > > > > +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > > > > > > >
> > > > > > > > nit: typo "chaning".
> > > > > > > >
> > > > > > > > But more substantially, is this just to prevent spurious faults in the
> > > > > > > > context of a new thread using a stale value for POR_EL0?
> > > > > > >
> > > > > > > Not just prevent faults but enforce the permissions from the new
> > > > > > > thread's POR_EL0. The kernel may continue with a uaccess routine from
> > > > > > > here, we can't tell.
> > > [...]
> > > > > > So what do we actually gain by having the uaccess routines honour this?
> > > > >
> > > > > I guess where it matters is more like not accidentally faulting because
> > > > > the previous thread had more restrictive permissions.
> > > >
> > > > That's what I wondered initially, but won't the fault handler retry in
> > > > that case?
> > >
> > > Yes, it will retry and this should be fine (I assume you are only
> > > talking about the dropping ISB in the context switch).
> > >
> > > For the case of running with a more permissive stale POR_EL0, arguably it's
> > > slightly more predictable for the user but, OTOH, some syscalls like
> > > readv() could be routed through GUP with no checks. As with MTE, we
> > > don't guarantee uaccesses honour the user permissions.
> > >
> > > That said, at some point we should sanitise this path anyway and have a
> > > single ISB at the end. In the meantime, I'm fine with dropping the ISB
> > > here.
> > >
> > 
> > commit 3141fb86bee8d48ae47cab1594dad54f974a8899
> > Author: Joey Gouly <joey.gouly@arm.com>
> > Date:   Tue Sep 3 15:47:26 2024 +0100
> > 
> >     fixup! arm64: context switch POR_EL0 register
> > 
> > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > index a3a61ecdb165..c224b0955f1a 100644
> > --- a/arch/arm64/kernel/process.c
> > +++ b/arch/arm64/kernel/process.c
> > @@ -515,11 +515,8 @@ static void permission_overlay_switch(struct task_struct *next)
> >                 return;
> > 
> >         current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > -       if (current->thread.por_el0 != next->thread.por_el0) {
> > +       if (current->thread.por_el0 != next->thread.por_el0)
> >                 write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > -               /* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > -               isb();
> > -       }
> >  }
> 
> What about the one in flush_poe()? I'm inclined to drop that as well.

Yes I guess that one can be removed too. Catalin any comments?

> 
> > Will, do you want me to re-send the series with this and the permissions
> > diff from the other thread [1],
> > or you ok with applying them when you pull it in?
> 
> I'll have a crack now, but if it fails miserably then I'll let you know.

Thanks! Just to make sure, you should pick the patch up from

	https://lore.kernel.org/linux-arm-kernel/20240903152937.GA3768522@e124191.cambridge.arm.com/

Not the one I linked to in [1] in my previous e-mail.

Thanks,
Joey

Catalin Marinas Sept. 4, 2024, 11:38 a.m. UTC | #10

On Wed, Sep 04, 2024 at 11:22:54AM +0100, Will Deacon wrote:
> On Tue, Sep 03, 2024 at 03:54:13PM +0100, Joey Gouly wrote:
> > commit 3141fb86bee8d48ae47cab1594dad54f974a8899
> > Author: Joey Gouly <joey.gouly@arm.com>
> > Date:   Tue Sep 3 15:47:26 2024 +0100
> > 
> >     fixup! arm64: context switch POR_EL0 register
> > 
> > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > index a3a61ecdb165..c224b0955f1a 100644
> > --- a/arch/arm64/kernel/process.c
> > +++ b/arch/arm64/kernel/process.c
> > @@ -515,11 +515,8 @@ static void permission_overlay_switch(struct task_struct *next)
> >                 return;
> > 
> >         current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > -       if (current->thread.por_el0 != next->thread.por_el0) {
> > +       if (current->thread.por_el0 != next->thread.por_el0)
> >                 write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > -               /* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > -               isb();
> > -       }
> >  }
> 
> What about the one in flush_poe()? I'm inclined to drop that as well.

Yes, that's a similar thing.

Will Deacon Sept. 4, 2024, 11:43 a.m. UTC | #11

On Wed, Sep 04, 2024 at 12:32:21PM +0100, Joey Gouly wrote:
> On Wed, Sep 04, 2024 at 11:22:54AM +0100, Will Deacon wrote:
> > On Tue, Sep 03, 2024 at 03:54:13PM +0100, Joey Gouly wrote:
> > > On Mon, Sep 02, 2024 at 08:08:08PM +0100, Catalin Marinas wrote:
> > > > On Tue, Aug 27, 2024 at 12:38:04PM +0100, Will Deacon wrote:
> > > > > On Fri, Aug 23, 2024 at 07:40:52PM +0100, Catalin Marinas wrote:
> > > > > > On Fri, Aug 23, 2024 at 06:08:36PM +0100, Will Deacon wrote:
> > > > > > > On Fri, Aug 23, 2024 at 05:41:06PM +0100, Catalin Marinas wrote:
> > > > > > > > On Fri, Aug 23, 2024 at 03:45:32PM +0100, Will Deacon wrote:
> > > > > > > > > On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> > > > > > > > > > +static void permission_overlay_switch(struct task_struct *next)
> > > > > > > > > > +{
> > > > > > > > > > +	if (!system_supports_poe())
> > > > > > > > > > +		return;
> > > > > > > > > > +
> > > > > > > > > > +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > > > > > > > > +	if (current->thread.por_el0 != next->thread.por_el0) {
> > > > > > > > > > +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > > > > > > > > > +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > > > > > > > >
> > > > > > > > > nit: typo "chaning".
> > > > > > > > >
> > > > > > > > > But more substantially, is this just to prevent spurious faults in the
> > > > > > > > > context of a new thread using a stale value for POR_EL0?
> > > > > > > >
> > > > > > > > Not just prevent faults but enforce the permissions from the new
> > > > > > > > thread's POR_EL0. The kernel may continue with a uaccess routine from
> > > > > > > > here, we can't tell.
> > > > [...]
> > > > > > > So what do we actually gain by having the uaccess routines honour this?
> > > > > >
> > > > > > I guess where it matters is more like not accidentally faulting because
> > > > > > the previous thread had more restrictive permissions.
> > > > >
> > > > > That's what I wondered initially, but won't the fault handler retry in
> > > > > that case?
> > > >
> > > > Yes, it will retry and this should be fine (I assume you are only
> > > > talking about the dropping ISB in the context switch).
> > > >
> > > > For the case of running with a more permissive stale POR_EL0, arguably it's
> > > > slightly more predictable for the user but, OTOH, some syscalls like
> > > > readv() could be routed through GUP with no checks. As with MTE, we
> > > > don't guarantee uaccesses honour the user permissions.
> > > >
> > > > That said, at some point we should sanitise this path anyway and have a
> > > > single ISB at the end. In the meantime, I'm fine with dropping the ISB
> > > > here.
> > > >
> > > 
> > > commit 3141fb86bee8d48ae47cab1594dad54f974a8899
> > > Author: Joey Gouly <joey.gouly@arm.com>
> > > Date:   Tue Sep 3 15:47:26 2024 +0100
> > > 
> > >     fixup! arm64: context switch POR_EL0 register
> > > 
> > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > index a3a61ecdb165..c224b0955f1a 100644
> > > --- a/arch/arm64/kernel/process.c
> > > +++ b/arch/arm64/kernel/process.c
> > > @@ -515,11 +515,8 @@ static void permission_overlay_switch(struct task_struct *next)
> > >                 return;
> > > 
> > >         current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > -       if (current->thread.por_el0 != next->thread.por_el0) {
> > > +       if (current->thread.por_el0 != next->thread.por_el0)
> > >                 write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > > -               /* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > > -               isb();
> > > -       }
> > >  }
> > 
> > What about the one in flush_poe()? I'm inclined to drop that as well.
> 
> Yes I guess that one can be removed too. Catalin any comments?
> 
> > 
> > > Will, do you want me to re-send the series with this and the permissions
> > > diff from the other thread [1],
> > > or you ok with applying them when you pull it in?
> > 
> > I'll have a crack now, but if it fails miserably then I'll let you know.
> 
> Thanks! Just to make sure, you should pick the patch up from
> 
> 	https://lore.kernel.org/linux-arm-kernel/20240903152937.GA3768522@e124191.cambridge.arm.com/
> 
> Not the one I linked to in [1] in my previous e-mail.

Right, there's quite a lot I need to do:

- Uncorrupt your patches
- Fix the conflict in the kvm selftests
- Drop the unnecessary ISBs
- Fix the ESR checking
- Fix the el2_setup labels
- Reorder the patches
- Drop the patch that is already in kvmarm

Working on it...

Will

Joey Gouly Sept. 4, 2024, 12:55 p.m. UTC | #12

On Wed, Sep 04, 2024 at 12:43:02PM +0100, Will Deacon wrote:
> On Wed, Sep 04, 2024 at 12:32:21PM +0100, Joey Gouly wrote:
> > On Wed, Sep 04, 2024 at 11:22:54AM +0100, Will Deacon wrote:
> > > On Tue, Sep 03, 2024 at 03:54:13PM +0100, Joey Gouly wrote:
> > > > On Mon, Sep 02, 2024 at 08:08:08PM +0100, Catalin Marinas wrote:
> > > > > On Tue, Aug 27, 2024 at 12:38:04PM +0100, Will Deacon wrote:
> > > > > > On Fri, Aug 23, 2024 at 07:40:52PM +0100, Catalin Marinas wrote:
> > > > > > > On Fri, Aug 23, 2024 at 06:08:36PM +0100, Will Deacon wrote:
> > > > > > > > On Fri, Aug 23, 2024 at 05:41:06PM +0100, Catalin Marinas wrote:
> > > > > > > > > On Fri, Aug 23, 2024 at 03:45:32PM +0100, Will Deacon wrote:
> > > > > > > > > > On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> > > > > > > > > > > +static void permission_overlay_switch(struct task_struct *next)
> > > > > > > > > > > +{
> > > > > > > > > > > +	if (!system_supports_poe())
> > > > > > > > > > > +		return;
> > > > > > > > > > > +
> > > > > > > > > > > +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > > > > > > > > > +	if (current->thread.por_el0 != next->thread.por_el0) {
> > > > > > > > > > > +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > > > > > > > > > > +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > > > > > > > > >
> > > > > > > > > > nit: typo "chaning".
> > > > > > > > > >
> > > > > > > > > > But more substantially, is this just to prevent spurious faults in the
> > > > > > > > > > context of a new thread using a stale value for POR_EL0?
> > > > > > > > >
> > > > > > > > > Not just prevent faults but enforce the permissions from the new
> > > > > > > > > thread's POR_EL0. The kernel may continue with a uaccess routine from
> > > > > > > > > here, we can't tell.
> > > > > [...]
> > > > > > > > So what do we actually gain by having the uaccess routines honour this?
> > > > > > >
> > > > > > > I guess where it matters is more like not accidentally faulting because
> > > > > > > the previous thread had more restrictive permissions.
> > > > > >
> > > > > > That's what I wondered initially, but won't the fault handler retry in
> > > > > > that case?
> > > > >
> > > > > Yes, it will retry and this should be fine (I assume you are only
> > > > > talking about the dropping ISB in the context switch).
> > > > >
> > > > > For the case of running with a more permissive stale POR_EL0, arguably it's
> > > > > slightly more predictable for the user but, OTOH, some syscalls like
> > > > > readv() could be routed through GUP with no checks. As with MTE, we
> > > > > don't guarantee uaccesses honour the user permissions.
> > > > >
> > > > > That said, at some point we should sanitise this path anyway and have a
> > > > > single ISB at the end. In the meantime, I'm fine with dropping the ISB
> > > > > here.
> > > > >
> > > > 
> > > > commit 3141fb86bee8d48ae47cab1594dad54f974a8899
> > > > Author: Joey Gouly <joey.gouly@arm.com>
> > > > Date:   Tue Sep 3 15:47:26 2024 +0100
> > > > 
> > > >     fixup! arm64: context switch POR_EL0 register
> > > > 
> > > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > > index a3a61ecdb165..c224b0955f1a 100644
> > > > --- a/arch/arm64/kernel/process.c
> > > > +++ b/arch/arm64/kernel/process.c
> > > > @@ -515,11 +515,8 @@ static void permission_overlay_switch(struct task_struct *next)
> > > >                 return;
> > > > 
> > > >         current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > > -       if (current->thread.por_el0 != next->thread.por_el0) {
> > > > +       if (current->thread.por_el0 != next->thread.por_el0)
> > > >                 write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > > > -               /* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > > > -               isb();
> > > > -       }
> > > >  }
> > > 
> > > What about the one in flush_poe()? I'm inclined to drop that as well.
> > 
> > Yes I guess that one can be removed too. Catalin any comments?
> > 
> > > 
> > > > Will, do you want me to re-send the series with this and the permissions
> > > > diff from the other thread [1],
> > > > or you ok with applying them when you pull it in?
> > > 
> > > I'll have a crack now, but if it fails miserably then I'll let you know.
> > 
> > Thanks! Just to make sure, you should pick the patch up from
> > 
> > 	https://lore.kernel.org/linux-arm-kernel/20240903152937.GA3768522@e124191.cambridge.arm.com/
> > 
> > Not the one I linked to in [1] in my previous e-mail.
> 
> Right, there's quite a lot I need to do:
> 
> - Uncorrupt your patches
> - Fix the conflict in the kvm selftests
> - Drop the unnecessary ISBs
> - Fix the ESR checking
> - Fix the el2_setup labels
> - Reorder the patches
> - Drop the patch that is already in kvmarm
> 
> Working on it...

Sorry! I'm happy to rebase onto some arm64 branch if that will help, just let me know.

> 
> Will
>

Will Deacon Sept. 4, 2024, 4:17 p.m. UTC | #13

On Wed, Sep 04, 2024 at 01:55:03PM +0100, Joey Gouly wrote:
> On Wed, Sep 04, 2024 at 12:43:02PM +0100, Will Deacon wrote:
> > Right, there's quite a lot I need to do:
> > 
> > - Uncorrupt your patches
> > - Fix the conflict in the kvm selftests
> > - Drop the unnecessary ISBs
> > - Fix the ESR checking
> > - Fix the el2_setup labels
> > - Reorder the patches
> > - Drop the patch that is already in kvmarm
> > 
> > Working on it...
> 
> Sorry! I'm happy to rebase onto some arm64 branch if that will help, just let me know.

Please have a look at for-next/poe (also merged into for-next/core and
for-kernelci) and let me know what I got wrong!

For Marc: I reordered the series so the KVM bits (and deps) are all the
beginning, should you need them. The branch is based on a merge of the
shared branch you created previously.

Cheers,

Will

Marc Zyngier Sept. 4, 2024, 5:05 p.m. UTC | #14

On Wed, 04 Sep 2024 17:17:58 +0100,
Will Deacon <will@kernel.org> wrote:
> 
> On Wed, Sep 04, 2024 at 01:55:03PM +0100, Joey Gouly wrote:
> > On Wed, Sep 04, 2024 at 12:43:02PM +0100, Will Deacon wrote:
> > > Right, there's quite a lot I need to do:
> > > 
> > > - Uncorrupt your patches
> > > - Fix the conflict in the kvm selftests
> > > - Drop the unnecessary ISBs
> > > - Fix the ESR checking
> > > - Fix the el2_setup labels
> > > - Reorder the patches
> > > - Drop the patch that is already in kvmarm
> > > 
> > > Working on it...
> > 
> > Sorry! I'm happy to rebase onto some arm64 branch if that will help, just let me know.
> 
> Please have a look at for-next/poe (also merged into for-next/core and
> for-kernelci) and let me know what I got wrong!
> 
> For Marc: I reordered the series so the KVM bits (and deps) are all the
> beginning, should you need them. The branch is based on a merge of the
> shared branch you created previously.

I just had a quick check, and while there is a small conflict with
kvmarm/next, it is extremely minor (small clash in the vcpu_sysreg,
for which the resolving order doesn't matter), and not worth dragging
additional patches in the shared branch.

However, if KVM's own S1PIE series [1] ends up being merged (which I'd
really like), I will definitely have to pull the prefix in, as this is
a bit more involved conflict wise.

Thanks,

	M.

[1] http://lore.kernel.org/all/20240903153834.1909472-1-maz@kernel.org

Joey Gouly Sept. 5, 2024, 10:36 a.m. UTC | #15

On Wed, Sep 04, 2024 at 05:17:58PM +0100, Will Deacon wrote:
> On Wed, Sep 04, 2024 at 01:55:03PM +0100, Joey Gouly wrote:
> > On Wed, Sep 04, 2024 at 12:43:02PM +0100, Will Deacon wrote:
> > > Right, there's quite a lot I need to do:
> > > 
> > > - Uncorrupt your patches
> > > - Fix the conflict in the kvm selftests
> > > - Drop the unnecessary ISBs
> > > - Fix the ESR checking
> > > - Fix the el2_setup labels
> > > - Reorder the patches
> > > - Drop the patch that is already in kvmarm
> > > 
> > > Working on it...
> > 
> > Sorry! I'm happy to rebase onto some arm64 branch if that will help, just let me know.
> 
> Please have a look at for-next/poe (also merged into for-next/core and
> for-kernelci) and let me know what I got wrong!

I pulled for-next/poe and ran the test and it works fine. Also looked at the
diff of my branch against your branch, and it looks fine too.

Thanks for your work to get this merged!

> 
> For Marc: I reordered the series so the KVM bits (and deps) are all the
> beginning, should you need them. The branch is based on a merge of the
> shared branch you created previously.
> 
> Cheers,
> 
> Will
>

Kevin Brodsky Sept. 11, 2024, 3:01 p.m. UTC | #16

On 22/08/2024 17:10, Joey Gouly wrote:
> @@ -371,6 +382,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>  		if (system_supports_tpidr2())
>  			p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
>  
> +		if (system_supports_poe())
> +			p->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);

Here we are only reloading POR_EL0's value if the target is a user
thread. However, as this series stands, POR_EL0 is also relevant to
kthreads, because any uaccess or GUP done from a kthread will also be
checked against POR_EL0. This is especially important in cases like the
io_uring kthread, which accesses the memory of the user process that
spawned it. To prevent such a kthread from inheriting a stale value of
POR_EL0, it seems that we should reload POR_EL0's value in all cases
(user and kernel thread).

Other approaches could also be considered (e.g. resetting POR_EL0 to
unrestricted when creating a kthread), see my reply on v4 [1].

Kevin

[1]
https://lore.kernel.org/linux-arm-kernel/b4f8b351-4c83-43b4-bfbe-8f67f3f56fb9@arm.com/

Dave Hansen Sept. 11, 2024, 3:33 p.m. UTC | #17

On 9/11/24 08:01, Kevin Brodsky wrote:
> On 22/08/2024 17:10, Joey Gouly wrote:
>> @@ -371,6 +382,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>>  		if (system_supports_tpidr2())
>>  			p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
>>  
>> +		if (system_supports_poe())
>> +			p->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> Here we are only reloading POR_EL0's value if the target is a user
> thread. However, as this series stands, POR_EL0 is also relevant to
> kthreads, because any uaccess or GUP done from a kthread will also be
> checked against POR_EL0. This is especially important in cases like the
> io_uring kthread, which accesses the memory of the user process that
> spawned it. To prevent such a kthread from inheriting a stale value of
> POR_EL0, it seems that we should reload POR_EL0's value in all cases
> (user and kernel thread).

The problem with this is trying to figure out which POR_EL0 to use.  The
kthread could have been spawned ages ago and might not have a POR_EL0
which is very different from the current value of any of the threads in
the process right now.

There's also no great way for a kthread to reach out and grab an updated
value.  It's all completely inherently racy.

> Other approaches could also be considered (e.g. resetting POR_EL0 to
> unrestricted when creating a kthread), see my reply on v4 [1].

I kinda think this is the only way to go.  It's the only sensible,
predictable way.  I _think_ it's what x86 will end up doing with PKRU,
but there's been enough churn there that I'd need to go double check
what happens in practice.

Either way, it would be nice to get an io_uring test in here that
actually spawns kthreads:

	tools/testing/selftests/mm/protection_keys.c

Will Deacon Sept. 12, 2024, 10:50 a.m. UTC | #18

Hi Dave,

On Wed, Sep 11, 2024 at 08:33:54AM -0700, Dave Hansen wrote:
> On 9/11/24 08:01, Kevin Brodsky wrote:
> > On 22/08/2024 17:10, Joey Gouly wrote:
> >> @@ -371,6 +382,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> >>  		if (system_supports_tpidr2())
> >>  			p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
> >>  
> >> +		if (system_supports_poe())
> >> +			p->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > Here we are only reloading POR_EL0's value if the target is a user
> > thread. However, as this series stands, POR_EL0 is also relevant to
> > kthreads, because any uaccess or GUP done from a kthread will also be
> > checked against POR_EL0. This is especially important in cases like the
> > io_uring kthread, which accesses the memory of the user process that
> > spawned it. To prevent such a kthread from inheriting a stale value of
> > POR_EL0, it seems that we should reload POR_EL0's value in all cases
> > (user and kernel thread).
> 
> The problem with this is trying to figure out which POR_EL0 to use.  The
> kthread could have been spawned ages ago and might not have a POR_EL0
> which is very different from the current value of any of the threads in
> the process right now.
> 
> There's also no great way for a kthread to reach out and grab an updated
> value.  It's all completely inherently racy.
> 
> > Other approaches could also be considered (e.g. resetting POR_EL0 to
> > unrestricted when creating a kthread), see my reply on v4 [1].
> 
> I kinda think this is the only way to go.  It's the only sensible,
> predictable way.  I _think_ it's what x86 will end up doing with PKRU,
> but there's been enough churn there that I'd need to go double check
> what happens in practice.

I agree.

> Either way, it would be nice to get an io_uring test in here that
> actually spawns kthreads:
> 
> 	tools/testing/selftests/mm/protection_keys.c

It would be good to update Documentation/core-api/protection-keys.rst
as well, since the example with read() raises more questions than it
answers!

Kevin, Joey -- I've got this series queued in arm64 as-is, so perhaps
you could send some patches on top so we can iron this out in time for
6.12? I'll also be at LPC next week if you're about.

Cheers,

Will

Joey Gouly Sept. 12, 2024, 12:48 p.m. UTC | #19

On Thu, Sep 12, 2024 at 11:50:18AM +0100, Will Deacon wrote:
> Hi Dave,
> 
> On Wed, Sep 11, 2024 at 08:33:54AM -0700, Dave Hansen wrote:
> > On 9/11/24 08:01, Kevin Brodsky wrote:
> > > On 22/08/2024 17:10, Joey Gouly wrote:
> > >> @@ -371,6 +382,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > >>  		if (system_supports_tpidr2())
> > >>  			p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
> > >>  
> > >> +		if (system_supports_poe())
> > >> +			p->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > Here we are only reloading POR_EL0's value if the target is a user
> > > thread. However, as this series stands, POR_EL0 is also relevant to
> > > kthreads, because any uaccess or GUP done from a kthread will also be
> > > checked against POR_EL0. This is especially important in cases like the
> > > io_uring kthread, which accesses the memory of the user process that
> > > spawned it. To prevent such a kthread from inheriting a stale value of
> > > POR_EL0, it seems that we should reload POR_EL0's value in all cases
> > > (user and kernel thread).
> > 
> > The problem with this is trying to figure out which POR_EL0 to use.  The
> > kthread could have been spawned ages ago and might not have a POR_EL0
> > which is very different from the current value of any of the threads in
> > the process right now.
> > 
> > There's also no great way for a kthread to reach out and grab an updated
> > value.  It's all completely inherently racy.
> > 
> > > Other approaches could also be considered (e.g. resetting POR_EL0 to
> > > unrestricted when creating a kthread), see my reply on v4 [1].
> > 
> > I kinda think this is the only way to go.  It's the only sensible,
> > predictable way.  I _think_ it's what x86 will end up doing with PKRU,
> > but there's been enough churn there that I'd need to go double check
> > what happens in practice.
> 
> I agree.
> 
> > Either way, it would be nice to get an io_uring test in here that
> > actually spawns kthreads:
> > 
> > 	tools/testing/selftests/mm/protection_keys.c
> 
> It would be good to update Documentation/core-api/protection-keys.rst
> as well, since the example with read() raises more questions than it
> answers!
> 
> Kevin, Joey -- I've got this series queued in arm64 as-is, so perhaps
> you could send some patches on top so we can iron this out in time for
> 6.12? I'll also be at LPC next week if you're about.

I found the code in arch/x86 that does this, I must have missed this previously.

arch/x86/kernel/process.c: int copy_thread()                                                                                                                   

        /* Kernel thread ? */                                                                                                                                                                  
        if (unlikely(p->flags & PF_KTHREAD)) {                                                                                                                                                 
                p->thread.pkru = pkru_get_init_value();                                                                                                                                        
                memset(childregs, 0, sizeof(struct pt_regs));                                                                                                                                  
                kthread_frame_init(frame, args->fn, args->fn_arg);                                                                                                                             
                return 0;                                                                                                                                                                      
        }

I can send a similar patch for arm64.  I have no idea how to write io_uring
code, so looking for examples I can work with to get a test written. Might just
send the arm64 fix first, if that's fine?

Thanks,
Joey

Will Deacon Sept. 13, 2024, 3:14 p.m. UTC | #20

On Thu, Sep 12, 2024 at 01:48:35PM +0100, Joey Gouly wrote:
> On Thu, Sep 12, 2024 at 11:50:18AM +0100, Will Deacon wrote:
> > On Wed, Sep 11, 2024 at 08:33:54AM -0700, Dave Hansen wrote:
> > > On 9/11/24 08:01, Kevin Brodsky wrote:
> > > > On 22/08/2024 17:10, Joey Gouly wrote:
> > > >> @@ -371,6 +382,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > > >>  		if (system_supports_tpidr2())
> > > >>  			p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
> > > >>  
> > > >> +		if (system_supports_poe())
> > > >> +			p->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > > Here we are only reloading POR_EL0's value if the target is a user
> > > > thread. However, as this series stands, POR_EL0 is also relevant to
> > > > kthreads, because any uaccess or GUP done from a kthread will also be
> > > > checked against POR_EL0. This is especially important in cases like the
> > > > io_uring kthread, which accesses the memory of the user process that
> > > > spawned it. To prevent such a kthread from inheriting a stale value of
> > > > POR_EL0, it seems that we should reload POR_EL0's value in all cases
> > > > (user and kernel thread).
> > > 
> > > The problem with this is trying to figure out which POR_EL0 to use.  The
> > > kthread could have been spawned ages ago and might not have a POR_EL0
> > > which is very different from the current value of any of the threads in
> > > the process right now.
> > > 
> > > There's also no great way for a kthread to reach out and grab an updated
> > > value.  It's all completely inherently racy.
> > > 
> > > > Other approaches could also be considered (e.g. resetting POR_EL0 to
> > > > unrestricted when creating a kthread), see my reply on v4 [1].
> > > 
> > > I kinda think this is the only way to go.  It's the only sensible,
> > > predictable way.  I _think_ it's what x86 will end up doing with PKRU,
> > > but there's been enough churn there that I'd need to go double check
> > > what happens in practice.
> > 
> > I agree.
> > 
> > > Either way, it would be nice to get an io_uring test in here that
> > > actually spawns kthreads:
> > > 
> > > 	tools/testing/selftests/mm/protection_keys.c
> > 
> > It would be good to update Documentation/core-api/protection-keys.rst
> > as well, since the example with read() raises more questions than it
> > answers!
> > 
> > Kevin, Joey -- I've got this series queued in arm64 as-is, so perhaps
> > you could send some patches on top so we can iron this out in time for
> > 6.12? I'll also be at LPC next week if you're about.
> 
> I found the code in arch/x86 that does this, I must have missed this previously.
> 
> arch/x86/kernel/process.c: int copy_thread()                                                                                                                   
> 
>         /* Kernel thread ? */                                                                                                                                                                  
>         if (unlikely(p->flags & PF_KTHREAD)) {                                                                                                                                                 
>                 p->thread.pkru = pkru_get_init_value();                                                                                                                                        
>                 memset(childregs, 0, sizeof(struct pt_regs));                                                                                                                                  
>                 kthread_frame_init(frame, args->fn, args->fn_arg);                                                                                                                             
>                 return 0;                                                                                                                                                                      
>         }
> 
> I can send a similar patch for arm64.  I have no idea how to write io_uring
> code, so looking for examples I can work with to get a test written. Might just
> send the arm64 fix first, if that's fine?

I think fix + documentation is what we need before 6.12, but you've still
got plenty of time after the merge window.

Cheers,

Will

Aneesh Kumar K.V Sept. 22, 2024, 5:49 a.m. UTC | #21

Dave Hansen <dave.hansen@intel.com> writes:

> On 9/11/24 08:01, Kevin Brodsky wrote:
>> On 22/08/2024 17:10, Joey Gouly wrote:
>>> @@ -371,6 +382,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>>>  		if (system_supports_tpidr2())
>>>  			p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
>>>  
>>> +		if (system_supports_poe())
>>> +			p->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
>> Here we are only reloading POR_EL0's value if the target is a user
>> thread. However, as this series stands, POR_EL0 is also relevant to
>> kthreads, because any uaccess or GUP done from a kthread will also be
>> checked against POR_EL0. This is especially important in cases like the
>> io_uring kthread, which accesses the memory of the user process that
>> spawned it. To prevent such a kthread from inheriting a stale value of
>> POR_EL0, it seems that we should reload POR_EL0's value in all cases
>> (user and kernel thread).
>
> The problem with this is trying to figure out which POR_EL0 to use.  The
> kthread could have been spawned ages ago and might not have a POR_EL0
> which is very different from the current value of any of the threads in
> the process right now.
>
> There's also no great way for a kthread to reach out and grab an updated
> value.  It's all completely inherently racy.
>
>> Other approaches could also be considered (e.g. resetting POR_EL0 to
>> unrestricted when creating a kthread), see my reply on v4 [1].
>
> I kinda think this is the only way to go.  It's the only sensible,
> predictable way.  I _think_ it's what x86 will end up doing with PKRU,
> but there's been enough churn there that I'd need to go double check
> what happens in practice.
>


that is also what powerpc does. 

/* usage of kthread_use_mm() should inherit the
 * AMR value of the operating address space. But, the AMR value is
 * thread-specific and we inherit the address space and not thread
 * access restrictions. Because of this ignore AMR value when accessing
 * userspace via kernel thread.
 */
static __always_inline u64 current_thread_amr(void)
{
	if (current->thread.regs)
		return current->thread.regs->amr;
	return default_amr;
}


-aneesh

[v5,06/30] arm64: context switch POR_EL0 register

Commit Message

Comments

Patch