[RFC,5/7] x86/mm/fault: hook up SCI verification

Message ID	1556228754-12996-6-git-send-email-rppt@linux.ibm.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-security-module-owner@kernel.org> Gateway: Authorized Use Only! Violators will be prosecuted for <linux-security-module@vger.kernel.org> from <rppt@linux.ibm.com>; Thu, 25 Apr 2019 22:46:26 +0100 Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Thu, 25 Apr 2019 22:46:20 +0100 From: Mike Rapoport <rppt@linux.ibm.com> To: linux-kernel@vger.kernel.org Cc: Alexandre Chartre <alexandre.chartre@oracle.com>, Andy Lutomirski <luto@kernel.org>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, "H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>, James Bottomley <James.Bottomley@hansenpartnership.com>, Jonathan Adams <jwadams@google.com>, Kees Cook <keescook@chromium.org>, Paul Turner <pjt@google.com>, Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@linutronix.de>, linux-mm@kvack.org, linux-security-module@vger.kernel.org, x86@kernel.org, Mike Rapoport <rppt@linux.ibm.com> Subject: [RFC PATCH 5/7] x86/mm/fault: hook up SCI verification Date: Fri, 26 Apr 2019 00:45:52 +0300 In-Reply-To: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com> References: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com> Message-Id: <1556228754-12996-6-git-send-email-rppt@linux.ibm.com> Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk
Series	x86: introduce system calls addess space isolation \| expand [RFC,0/7] x86: introduce system calls addess space isolation [RFC,1/7] x86/cpufeatures: add X86_FEATURE_SCI [RFC,2/7] x86/sci: add core implementation for system call isolation [RFC,3/7] x86/entry/64: add infrastructure for switching to isolated syscall context [RFC,4/7] x86/sci: hook up isolated system call entry and exit [RFC,5/7] x86/mm/fault: hook up SCI verification [RFC,6/7] security: enable system call isolation in kernel config [RFC,7/7] sci: add example system calls to exercse SCI

Message ID

1556228754-12996-6-git-send-email-rppt@linux.ibm.com (mailing list archive)

State

New, archived

Headers

From: Mike Rapoport <rppt@linux.ibm.com>
To: linux-kernel@vger.kernel.org
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>,
        Andy Lutomirski <luto@kernel.org>,
        Borislav Petkov <bp@alien8.de>,
        Dave Hansen <dave.hansen@linux.intel.com>,
        "H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
        James Bottomley <James.Bottomley@hansenpartnership.com>,
        Jonathan Adams <jwadams@google.com>,
        Kees Cook <keescook@chromium.org>,
        Paul Turner <pjt@google.com>,
        Peter Zijlstra <peterz@infradead.org>,
        Thomas Gleixner <tglx@linutronix.de>, linux-mm@kvack.org,
        linux-security-module@vger.kernel.org, x86@kernel.org,
        Mike Rapoport <rppt@linux.ibm.com>
Subject: [RFC PATCH 5/7] x86/mm/fault: hook up SCI verification
Date: Fri, 26 Apr 2019 00:45:52 +0300
In-Reply-To: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com>
References: <1556228754-12996-1-git-send-email-rppt@linux.ibm.com>
Message-Id: <1556228754-12996-6-git-send-email-rppt@linux.ibm.com>
Sender: owner-linux-security-module@vger.kernel.org
Precedence: bulk

Series

x86: introduce system calls addess space isolation | expand

Commit Message

Mike Rapoport April 25, 2019, 9:45 p.m. UTC

If a system call runs in isolated context, it's accesses to kernel code and
data will be verified by SCI susbsytem.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 arch/x86/mm/fault.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

Comments

Peter Zijlstra April 26, 2019, 7:42 a.m. UTC | #1

On Fri, Apr 26, 2019 at 12:45:52AM +0300, Mike Rapoport wrote:
> If a system call runs in isolated context, it's accesses to kernel code and
> data will be verified by SCI susbsytem.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/x86/mm/fault.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)

There's a distinct lack of touching do_double_fault(). It appears to me
that you'll instantly trigger #DF when you #PF, because the #PF handler
itself will not be able to run.

And then obviously you have to be very careful to make sure #DF can,
_at_all_times_ run, otherwise you'll tripple-fault and we all know what
that does.

Mike Rapoport April 28, 2019, 5:47 a.m. UTC | #2

On Fri, Apr 26, 2019 at 09:42:23AM +0200, Peter Zijlstra wrote:
> On Fri, Apr 26, 2019 at 12:45:52AM +0300, Mike Rapoport wrote:
> > If a system call runs in isolated context, it's accesses to kernel code and
> > data will be verified by SCI susbsytem.
> > 
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> >  arch/x86/mm/fault.c | 28 ++++++++++++++++++++++++++++
> >  1 file changed, 28 insertions(+)
> 
> There's a distinct lack of touching do_double_fault(). It appears to me
> that you'll instantly trigger #DF when you #PF, because the #PF handler
> itself will not be able to run.

The #PF handler is able to run. On interrupt/error entry the cr3 is
switched to the full kernel page tables, pretty much like PTI does for
user <-> kernel transitions. It's in the patch 3.
 
> And then obviously you have to be very careful to make sure #DF can,
> _at_all_times_ run, otherwise you'll tripple-fault and we all know what
> that does.
>

Andy Lutomirski April 30, 2019, 4:44 p.m. UTC | #3

On Sat, Apr 27, 2019 at 10:47 PM Mike Rapoport <rppt@linux.ibm.com> wrote:
>
> On Fri, Apr 26, 2019 at 09:42:23AM +0200, Peter Zijlstra wrote:
> > On Fri, Apr 26, 2019 at 12:45:52AM +0300, Mike Rapoport wrote:
> > > If a system call runs in isolated context, it's accesses to kernel code and
> > > data will be verified by SCI susbsytem.
> > >
> > > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > > ---
> > >  arch/x86/mm/fault.c | 28 ++++++++++++++++++++++++++++
> > >  1 file changed, 28 insertions(+)
> >
> > There's a distinct lack of touching do_double_fault(). It appears to me
> > that you'll instantly trigger #DF when you #PF, because the #PF handler
> > itself will not be able to run.
>
> The #PF handler is able to run. On interrupt/error entry the cr3 is
> switched to the full kernel page tables, pretty much like PTI does for
> user <-> kernel transitions. It's in the patch 3.
>
>

PeterZ meant page_fault, not do_page_fault.  In your patch, page_fault
and some of error_entry run before that magic switchover happens.  If
they're not in the page tables, you double-fault.

And don't even try to do SCI magic in the double-fault handler.  As I
understand it, the SDM and APM aren't kidding when they say that #DF
is an abort, not a fault.  There is a single case in the kernel where
we recover from #DF, and it was vetted by microcode people.

Mike Rapoport May 1, 2019, 5:39 a.m. UTC | #4

On Tue, Apr 30, 2019 at 09:44:09AM -0700, Andy Lutomirski wrote:
> On Sat, Apr 27, 2019 at 10:47 PM Mike Rapoport <rppt@linux.ibm.com> wrote:
> >
> > On Fri, Apr 26, 2019 at 09:42:23AM +0200, Peter Zijlstra wrote:
> > > On Fri, Apr 26, 2019 at 12:45:52AM +0300, Mike Rapoport wrote:
> > > > If a system call runs in isolated context, it's accesses to kernel code and
> > > > data will be verified by SCI susbsytem.
> > > >
> > > > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > > > ---
> > > >  arch/x86/mm/fault.c | 28 ++++++++++++++++++++++++++++
> > > >  1 file changed, 28 insertions(+)
> > >
> > > There's a distinct lack of touching do_double_fault(). It appears to me
> > > that you'll instantly trigger #DF when you #PF, because the #PF handler
> > > itself will not be able to run.
> >
> > The #PF handler is able to run. On interrupt/error entry the cr3 is
> > switched to the full kernel page tables, pretty much like PTI does for
> > user <-> kernel transitions. It's in the patch 3.
> >
> >
> 
> PeterZ meant page_fault, not do_page_fault.  In your patch, page_fault
> and some of error_entry run before that magic switchover happens.  If
> they're not in the page tables, you double-fault.

The entry code is in sci page tables, just like in user-space page tables
with PTI.
 
> And don't even try to do SCI magic in the double-fault handler.  As I
> understand it, the SDM and APM aren't kidding when they say that #DF
> is an abort, not a fault.  There is a single case in the kernel where
> we recover from #DF, and it was vetted by microcode people.
>

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9d5c75f..baa2a2f 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -18,6 +18,7 @@ 
 #include <linux/uaccess.h>		/* faulthandler_disabled()	*/
 #include <linux/efi.h>			/* efi_recover_from_page_fault()*/
 #include <linux/mm_types.h>
+#include <linux/sci.h>			/* sci_verify_and_map()		*/
 
 #include <asm/cpufeature.h>		/* boot_cpu_has, ...		*/
 #include <asm/traps.h>			/* dotraplinkage, ...		*/
@@ -1254,6 +1255,30 @@  static int fault_in_kernel_space(unsigned long address)
 	return address >= TASK_SIZE_MAX;
 }
 
+#ifdef CONFIG_SYSCALL_ISOLATION
+static int sci_fault(struct pt_regs *regs, unsigned long hw_error_code,
+		     unsigned long address)
+{
+	struct task_struct *tsk = current;
+
+	if (!tsk->in_isolated_syscall)
+		return 0;
+
+	if (!sci_verify_and_map(regs, address, hw_error_code)) {
+		this_cpu_write(cpu_sci.sci_syscall, 0);
+		no_context(regs, hw_error_code, address, SIGKILL, 0);
+	}
+
+	return 1;
+}
+#else
+static inline int sci_fault(struct pt_regs *regs, unsigned long hw_error_code,
+			    unsigned long address)
+{
+	return 0;
+}
+#endif
+
 /*
  * Called for all faults where 'address' is part of the kernel address
  * space.  Might get called for faults that originate from *code* that
@@ -1301,6 +1326,9 @@  do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
 	if (kprobes_fault(regs))
 		return;
 
+	if (sci_fault(regs, hw_error_code, address))
+		return;
+
 	/*
 	 * Note, despite being a "bad area", there are quite a few
 	 * acceptable reasons to get here, such as erratum fixups

[RFC,5/7] x86/mm/fault: hook up SCI verification

Commit Message

Comments

Patch