[v33,03/21] x86/mm: x86/sgx: Signal SIGSEGV with PF_SGX

Message ID	20200617220844.57423-4-jarkko.sakkinen@linux.intel.com (mailing list archive)
State	Rejected
Headers	From: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
To: x86@kernel.org, linux-sgx@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Sean Christopherson <sean.j.christopherson@intel.com>, Jethro Beekman <jethro@fortanix.com>, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>, akpm@linux-foundation.org, andriy.shevchenko@linux.intel.com, asapek@google.com, bp@alien8.de, cedric.xing@intel.com, chenalexchen@google.com, conradparker@google.com, cyhanish@google.com, dave.hansen@intel.com, haitao.huang@intel.com, josh@joshtriplett.org, kai.huang@intel.com, kai.svahn@intel.com, kmoy@google.com, ludloff@google.com, luto@kernel.org, nhorman@redhat.com, npmccallum@redhat.com, puiterwijk@redhat.com, rientjes@google.com, tglx@linutronix.de, yaozhangx@google.com
Subject: [PATCH v33 03/21] x86/mm: x86/sgx: Signal SIGSEGV with PF_SGX
Date: Thu, 18 Jun 2020 01:08:25 +0300
In-Reply-To: <20200617220844.57423-1-jarkko.sakkinen@linux.intel.com>
Series	Intel SGX foundations
[v33,00/21] Intel SGX foundations
[v33,01/21] x86/cpufeatures: x86/msr: Add Intel SGX hardware bits
[v33,02/21] x86/cpufeatures: x86/msr: Add Intel SGX Launch Control hardware bits
[v33,03/21] x86/mm: x86/sgx: Signal SIGSEGV with PF_SGX
[v33,04/21] x86/sgx: Add SGX microarchitectural data structures
[v33,05/21] x86/sgx: Add wrappers for ENCLS leaf functions
[v33,06/21] x86/cpu/intel: Detect SGX support
[v33,07/21] x86/cpu/intel: Add nosgx kernel parameter
[v33,08/21] x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections
[v33,09/21] x86/sgx: Add __sgx_alloc_epc_page() and sgx_free_epc_page()
[v33,10/21] mm: Introduce vm_ops->may_mprotect()
[v33,11/21] x86/sgx: Linux Enclave Driver
[v33,12/21] x86/sgx: Allow a limited use of ATTRIBUTE.PROVISIONKEY for attestation
[v33,13/21] x86/sgx: Add a page reclaimer
[v33,14/21] x86/sgx: ptrace() support for the SGX driver
[v33,15/21] x86/vdso: Add support for exception fixup in vDSO functions
[v33,16/21] x86/fault: Add helper function to sanitize error code
[v33,17/21] x86/traps: Attempt to fixup exceptions in vDSO before signaling
[v33,18/21] x86/vdso: Implement a vDSO for Intel SGX enclave call
[v33,19/21] selftests/x86: Add a selftest for SGX
[v33,20/21] docs: x86/sgx: Document SGX micro architecture and kernel internals
[v33,21/21] x86/sgx: Update MAINTAINERS

Jarkko Sakkinen June 17, 2020, 10:08 p.m. UTC

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add the SGX bit to the #PF error codes and signal SIGSEGV with PF_SGX
when a #PF with the SGX bit set occurs.

The CPU raises a #PF with the SGX bit set in the event of an Enclave Page
Cache Map (EPCM) conflict. The EPCM is a CPU-internal table that
describes the properties of each enclave page. Enclaves are measured and
signed software entities, which SGX hosts. [1]

Although the primary purpose of the EPCM conflict checks is to prevent
malicious accesses to an enclave, an illegitimate access can also happen
for legitimate reasons.

All SGX reserved memory, including the EPCM, is encrypted with a
transient key that does not survive a power transition. Signaling
SIGSEGV allows user space software to react when this happens (e.g. to
recreate the enclave, which was invalidated).

[1] Intel SDM: 36.5.1 Enclave Page Cache Map (EPCM)

Acked-by: Jethro Beekman <jethro@fortanix.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 arch/x86/include/asm/traps.h |  1 +
 arch/x86/mm/fault.c          | 13 +++++++++++++
 2 files changed, 14 insertions(+)
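
To illustrate the commit message, here is a minimal user-space sketch of
how the new bit fits into the #PF error code. The enum mirrors the bit
positions of enum x86_pf_error_code; the names pf_error_code and
pf_is_access_error() are made up for this example. The hunk quoted in this
thread shows only the comment added to access_error(), so the SGX check
below is an assumption based on that comment, not a copy of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions mirror enum x86_pf_error_code in asm/traps.h. */
enum pf_error_code {
	PF_PROT  = 1 << 0,	/* protection fault (page was present) */
	PF_WRITE = 1 << 1,	/* write access */
	PF_USER  = 1 << 2,	/* user-mode access */
	PF_RSVD  = 1 << 3,	/* reserved bit set in a paging entry */
	PF_INSTR = 1 << 4,	/* instruction fetch */
	PF_PK    = 1 << 5,	/* protection keys block access */
	PF_SGX   = 1 << 15,	/* EPCM conflict (SGX) */
};

static_assert(PF_SGX == 0x8000, "SGX is bit 15 of the #PF error code");

/*
 * Simplified model of the two early checks in access_error(): a fault
 * flagged by protection keys or by the EPCM is always treated as an
 * access error, whatever the VMA permissions say. The real function
 * goes on to compare the access type against the VMA; that part is
 * omitted here.
 */
static bool pf_is_access_error(unsigned long error_code)
{
	if (error_code & PF_PK)
		return true;
	if (error_code & PF_SGX)
		return true;
	return false;
}
```

With this model, an error code of 0x8004 (PF_SGX | PF_USER) is reported as
an access error, while a plain user write fault (0x6) falls through to the
checks that are not modelled here.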

Borislav Petkov June 25, 2020, 8:59 a.m. UTC | #1

On Thu, Jun 18, 2020 at 01:08:25AM +0300, Jarkko Sakkinen wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Include SGX bit to the PF error codes and throw SIGSEGV with PF_SGX when
> a #PF with SGX set happens.
> 
> CPU throws a #PF with the SGX bit in the event of Enclave Page Cache Map
				   ^
				   set

> (EPCM) conflict. The EPCM is a CPU-internal table, which describes the
> properties for a enclave page. Enclaves are measured and signed software
> entities, which SGX hosts. [1]
> 
> Although the primary purpose of the EPCM conflict checks  is to prevent
> malicious accesses to an enclave, an illegit access can happen also for
> legit reasons.
> 
> All SGX reserved memory, including EPCM is encrypted with a transient
> key that does not survive from the power transition. Throwing a SIGSEGV
> allows user space software react when this happens (e.g. rec-create the
			    ^
			    to				   recreate

> enclave, which was invalidated).
> 
> [1] Intel SDM: 36.5.1 Enclave Page Cache Map (EPCM)
> 
> Acked-by: Jethro Beekman <jethro@fortanix.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> ---
>  arch/x86/include/asm/traps.h |  1 +
>  arch/x86/mm/fault.c          | 13 +++++++++++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
> index 714b1a30e7b0..ee3617b67bf4 100644
> --- a/arch/x86/include/asm/traps.h
> +++ b/arch/x86/include/asm/traps.h
> @@ -58,5 +58,6 @@ enum x86_pf_error_code {
>  	X86_PF_RSVD	=		1 << 3,
>  	X86_PF_INSTR	=		1 << 4,
>  	X86_PF_PK	=		1 << 5,
> +	X86_PF_SGX	=		1 << 15,

Needs to be added to the doc above it.

>  #endif /* _ASM_X86_TRAPS_H */
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 66be9bd60307..25d48aae36c1 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1055,6 +1055,19 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
>  	if (error_code & X86_PF_PK)
>  		return 1;
>  
> +	/*
> +	 * Access is blocked by the Enclave Page Cache Map (EPCM), i.e. the
> +	 * access is allowed by the PTE but not the EPCM. This usually happens
> +	 * when the EPCM is yanked out from under us, e.g. by hardware after a
> +	 * suspend/resume cycle. In any case, software, i.e. the kernel, can't
> +	 * fix the source of the fault as the EPCM can't be directly modified by
> +	 * software. Handle the fault as an access error in order to signal
> +	 * userspace so that userspace can rebuild their enclave(s), even though
> +	 * userspace may not have actually violated access permissions.
> +	 */

Lemme check whether I understand this correctly: userspace must check
whether the SIGSEGV is generated on an access to an enclave page?

Also, do I see it correctly that when this happens, dmesg will have

        printk("%s%s[%d]: segfault at %lx ip %px sp %px error %lx",

due to:

       if (likely(show_unhandled_signals))
               show_signal_msg(regs, error_code, address, tsk);

which does:

        if (!unhandled_signal(tsk, SIGSEGV))
                return;

or is the task expected to register a SIGSEGV handler so that the
segfault doesn't land in dmesg?

If so, are we documenting this?

If not, then we should not issue any "segfault" messages to dmesg
because that would be wrong.

Or maybe I'm not seeing it right but I don't have the hardware to test
this out...

Thx.

Sean Christopherson June 25, 2020, 3:34 p.m. UTC | #2

On Thu, Jun 25, 2020 at 10:59:31AM +0200, Borislav Petkov wrote:
> On Thu, Jun 18, 2020 at 01:08:25AM +0300, Jarkko Sakkinen wrote:
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index 66be9bd60307..25d48aae36c1 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -1055,6 +1055,19 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
> >  	if (error_code & X86_PF_PK)
> >  		return 1;
> >  
> > +	/*
> > +	 * Access is blocked by the Enclave Page Cache Map (EPCM), i.e. the
> > +	 * access is allowed by the PTE but not the EPCM. This usually happens
> > +	 * when the EPCM is yanked out from under us, e.g. by hardware after a
> > +	 * suspend/resume cycle. In any case, software, i.e. the kernel, can't
> > +	 * fix the source of the fault as the EPCM can't be directly modified by
> > +	 * software. Handle the fault as an access error in order to signal
> > +	 * userspace so that userspace can rebuild their enclave(s), even though
> > +	 * userspace may not have actually violated access permissions.
> > +	 */
> 
> Lemme check whether I understand this correctly: userspace must check
> whether the SIGSEGV is generated on an access to an enclave page?

Sort of.  Technically that's an accurate statement, but practically
speaking userspace can only access enclave pages when it is executing in
the enclave, and exceptions in enclaves have unique behavior.  Exceptions
in enclaves essentially bounce through a userspace-software-defined
location prior to being delivered to the kernel.  The trampoline is done
by the CPU so that the CPU can scrub the GPRs, XSAVE state, etc... and
hide the true RIP of the exception.  The pre-exception enclave state is
saved into protected memory and restored when userspace resumes the enclave.

Entering or resuming an enclave can only be done through dedicated ENCLU
instructions, so really it ends up being that the SIGSEGV handler needs to
check the IP that "caused" the fault, which is actually the IP of the
trampoline.

But, that's only the first half of the story...

> Also, do I see it correctly that when this happens, dmesg will have
> 
>         printk("%s%s[%d]: segfault at %lx ip %px sp %px error %lx",
> 
> due to:
> 
>        if (likely(show_unhandled_signals))
>                show_signal_msg(regs, error_code, address, tsk);
> 
> which does:
> 
>         if (!unhandled_signal(tsk, SIGSEGV))
>                 return;
> 
> or is the task expected to register a SIGSEGV handler so that the
> segfault doesn't land in dmesg?

Yes, without extra help, any task running an enclave is expected to register
a SIGSEGV handler so that the task can restart the enclave if the EPC is
"lost".
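
A minimal sketch of such a handler, assuming x86-64 Linux with glibc,
where the #PF error code and the faulting IP are exposed to the handler as
gregs[REG_ERR] and gregs[REG_RIP] in the ucontext. The trampoline range
(enclave_aep), the recovery buffer (recover_env) and all function names are
hypothetical; a real runtime would fill them in around its ENCLU call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <ucontext.h>

#define PF_SGX (1UL << 15)
static_assert(PF_SGX == 0x8000, "SGX is bit 15 of the #PF error code");

/* Hypothetical: address range of the runtime's ENCLU trampoline. */
struct aep_range {
	unsigned long start, end;
};

static struct aep_range enclave_aep;	/* set by the runtime */
static sigjmp_buf recover_env;		/* set via sigsetjmp() before entry */

/* True if the fault has the SGX bit set and "happened" at the trampoline. */
static bool is_enclave_epcm_fault(unsigned long err, unsigned long ip,
				  const struct aep_range *aep)
{
	return (err & PF_SGX) && ip >= aep->start && ip < aep->end;
}

static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)info;
#if defined(__x86_64__) && defined(__linux__)
	const ucontext_t *uc = ctx;
	unsigned long err = (unsigned long)uc->uc_mcontext.gregs[REG_ERR];
	unsigned long ip = (unsigned long)uc->uc_mcontext.gregs[REG_RIP];

	/* EPC lost: unwind to the code that rebuilds the enclave. */
	if (is_enclave_epcm_fault(err, ip, &enclave_aep))
		siglongjmp(recover_env, 1);
#else
	(void)ctx;
#endif
	/* Not an enclave fault: restore the default action and return, so
	 * the faulting instruction re-executes and the default applies. */
	signal(sig, SIG_DFL);
}

static int install_segv_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = segv_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGSEGV, &sa, NULL);
}
```

The caller would wrap enclave entry in "if (sigsetjmp(recover_env, 1))
rebuild the enclave; else enter it", so that a lost EPC lands in the
rebuild path instead of killing the task.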

However, building and running enclaves is complex, and the vast majority of
SGX enabled applications are expected to leverage a library of one kind or
another to handle the bulk of the gory details.  But, signal handling in
libraries is a mess, e.g. requires filtering/forwarding, resignaling, etc...

To that end, in v14 of this patch[1], Andy Lutomirski came up with the idea
of adding a vDSO function to provide the low level enclave EENTER/ERESUME and
trampoline, and then teaching the kernel to do exception fixup on the
relevant instructions in the vDSO.  The vDSO's exception fixup then returns
to normal userspace, with a (technically optional) struct holding the details
of the exception.  That allows for synchronous delivery of exceptions in
enclaves, obviates the need for userspace to register a SIGSEGV handler, and
also means the SIGSEGV will never show up in dmesg so long as userspace is
using the vDSO.  The kernel still supports direct EENTER/ERESUME, but AFAIK
everyone is moving (or has moved) to the vDSO interface.
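
The synchronous model can be sketched as follows. This is not the vDSO
interface from patches 15-18: struct enclave_exception, enter_enclave_fn
and the fake_enter_*() stubs are hypothetical stand-ins that only model the
idea of an entry call returning to the caller with the exception details
filled in.

```c
#include <assert.h>
#include <stdint.h>

#define X86_TRAP_GP 13		/* #GP vector */
#define X86_TRAP_PF 14		/* #PF vector */
#define PF_SGX (1u << 15)
static_assert(PF_SGX == 0x8000, "SGX is bit 15 of the #PF error code");

/* Hypothetical: exception details reported back by the entry call. */
struct enclave_exception {
	uint16_t trapnr;
	uint16_t error_code;
	uint64_t address;
};

/* Hypothetical entry call: 0 on normal exit, nonzero on an exception. */
typedef int (*enter_enclave_fn)(struct enclave_exception *exc);

enum run_result { RUN_DONE, RUN_REBUILD, RUN_FATAL };

/* Decide what to do after one enclave run, with no signal handler. */
static enum run_result run_enclave(enter_enclave_fn enter)
{
	struct enclave_exception exc = { 0 };

	if (enter(&exc) == 0)
		return RUN_DONE;
	/* #PF with the SGX bit set: the EPC was lost, rebuild the enclave. */
	if (exc.trapnr == X86_TRAP_PF && (exc.error_code & PF_SGX))
		return RUN_REBUILD;
	return RUN_FATAL;
}

/* Stubs that simulate the three outcomes. */
static int fake_enter_ok(struct enclave_exception *exc)
{
	(void)exc;
	return 0;
}

static int fake_enter_epc_lost(struct enclave_exception *exc)
{
	exc->trapnr = X86_TRAP_PF;
	exc->error_code = PF_SGX | 0x4;	/* 0x8004: SGX bit + user access */
	exc->address = 0x7f0000001000ULL;
	return -1;
}

static int fake_enter_gp(struct enclave_exception *exc)
{
	exc->trapnr = X86_TRAP_GP;
	return -1;
}
```

Because the exception comes back as a return value, a library can handle
it at the call site and never has to install, filter or forward a
process-wide SIGSEGV handler.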

The vDSO stuff is in patches 15-18 of this series.

There's a gigantic thread on all the alternatives that were considered[2].

[1] https://lkml.kernel.org/r/CALCETrXByb2UVuZ6AXUeOd8y90NAikbZuvdN3wf_TjHZ+CxNhA@mail.gmail.com
[2] https://lkml.kernel.org/r/CALCETrWdpoDkbZjkucKL91GWpDPG9p=VqYrULade2pFDR7S=GQ@mail.gmail.com

> 
> If so, are we documenting this?
> 
> If not, then we should not issue any "segfault" messages to dmesg
> because that would be wrong.
> 
> Or maybe I'm not seeing it right but I don't have the hardware to test
> this out...
> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

Borislav Petkov June 25, 2020, 4:49 p.m. UTC | #3

On Thu, Jun 25, 2020 at 08:34:31AM -0700, Sean Christopherson wrote:
> However, building and running enclaves is complex, and the vast majority of
> SGX enabled applications are expected to leverage a library of one kind or
> another to hand the bulk of the gory details.

I gotta say this rings a bell: dhansen alluded on IRC to the jumping
through hoops one needs to do in order to run SGX enclaves.

...

> The vDSO stuff is in patches 15-18 of this series.
> 
> There's a gigantic thread on all the alternatives that were considered[2].
> 
> [1] https://lkml.kernel.org/r/CALCETrXByb2UVuZ6AXUeOd8y90NAikbZuvdN3wf_TjHZ+CxNhA@mail.gmail.com
> [2] https://lkml.kernel.org/r/CALCETrWdpoDkbZjkucKL91GWpDPG9p=VqYrULade2pFDR7S=GQ@mail.gmail.com

Yeah, that makes it very clear. Thanks a lot for taking the time and
writing it down. I've snipped it for brevity but it is very useful!

Thx!

Jarkko Sakkinen June 25, 2020, 8:52 p.m. UTC | #4

On Thu, Jun 25, 2020 at 10:59:31AM +0200, Borislav Petkov wrote:
> On Thu, Jun 18, 2020 at 01:08:25AM +0300, Jarkko Sakkinen wrote:
> > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > 
> > Include SGX bit to the PF error codes and throw SIGSEGV with PF_SGX when
> > a #PF with SGX set happens.
> > 
> > CPU throws a #PF with the SGX bit in the event of Enclave Page Cache Map
> 				   ^
> 				   set
> 
> > (EPCM) conflict. The EPCM is a CPU-internal table, which describes the
> > properties for a enclave page. Enclaves are measured and signed software
> > entities, which SGX hosts. [1]
> > 
> > Although the primary purpose of the EPCM conflict checks  is to prevent
> > malicious accesses to an enclave, an illegit access can happen also for
> > legit reasons.
> > 
> > All SGX reserved memory, including EPCM is encrypted with a transient
> > key that does not survive from the power transition. Throwing a SIGSEGV
> > allows user space software react when this happens (e.g. rec-create the
> 			    ^
> 			    to				   recreate
> 
> > enclave, which was invalidated).
> > 
> > [1] Intel SDM: 36.5.1 Enclave Page Cache Map (EPCM)
> > 
> > Acked-by: Jethro Beekman <jethro@fortanix.com>
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> > ---
> >  arch/x86/include/asm/traps.h |  1 +
> >  arch/x86/mm/fault.c          | 13 +++++++++++++
> >  2 files changed, 14 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
> > index 714b1a30e7b0..ee3617b67bf4 100644
> > --- a/arch/x86/include/asm/traps.h
> > +++ b/arch/x86/include/asm/traps.h
> > @@ -58,5 +58,6 @@ enum x86_pf_error_code {
> >  	X86_PF_RSVD	=		1 << 3,
> >  	X86_PF_INSTR	=		1 << 4,
> >  	X86_PF_PK	=		1 << 5,
> > +	X86_PF_SGX	=		1 << 15,
> 
> Needs to be added to the doc above it.

I ended up with:

 *   bit 5 ==				1: protection keys block access
 *   bit 6 ==				1: inside SGX enclave
 */

/Jarkko

Borislav Petkov June 25, 2020, 9:11 p.m. UTC | #5

On Thu, Jun 25, 2020 at 11:52:11PM +0300, Jarkko Sakkinen wrote:
> I ended up with:
> 
>  *   bit 5 ==				1: protection keys block access
>  *   bit 6 ==				1: inside SGX enclave

You mean bit 15.

Jarkko Sakkinen June 26, 2020, 1:34 p.m. UTC | #6

On Thu, Jun 25, 2020 at 11:11:03PM +0200, Borislav Petkov wrote:
> On Thu, Jun 25, 2020 at 11:52:11PM +0300, Jarkko Sakkinen wrote:
> > I ended up with:
> > 
> >  *   bit 5 ==				1: protection keys block access
> >  *   bit 6 ==				1: inside SGX enclave
> 
> You mean bit 15.

Duh, did this last thing before falling into sleep last night :-/

Yes, it should be 15.

I'll also rephrase the text to "inside an SGX enclave".

/Jarkko
