
[0/4] x86/sgx: fine grained SGX MCA behavior

Message ID: 20220510031646.3181306-1-zhiquan1.li@intel.com

Message

Zhiquan Li May 10, 2022, 3:16 a.m. UTC
Hi everyone,

This series contains a few patches to fine-grain the SGX MCA behavior.

When a VM guest accesses an SGX EPC page that has a memory failure, the
current behavior kills the whole guest, while the expectation is to kill
only the SGX application inside it.

To fix this, we send SIGBUS with code BUS_MCEERR_AR and some extra
information so that the hypervisor can inject the #MC into the guest,
which is helpful in the SGX virtualization case.
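
For illustration, a minimal userspace sketch of how a hypervisor process
could consume such a SIGBUS (the handler below is only an idea, not code
from this series):

#define _GNU_SOURCE          /* for BUS_MCEERR_AR */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative only: pick up the poisoned address delivered with SIGBUS. */
static void sigbus_handler(int sig, siginfo_t *info, void *uc)
{
        if (info->si_code == BUS_MCEERR_AR)
                fprintf(stderr, "EPC poison at %p, lsb %d\n",
                        info->si_addr, info->si_addr_lsb);
        /* A real hypervisor would translate si_addr to a GPA and inject #MC. */
        exit(1);
}

int main(void)
{
        struct sigaction act = { 0 };

        act.sa_sigaction = sigbus_handler;
        act.sa_flags = SA_SIGINFO;
        sigaction(SIGBUS, &act, NULL);
        /* ... run the vCPU loop here ... */
        return 0;
}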

However, the current SGX data structures are insufficient to track the
EPC pages used by vepc, so we introduce a new struct sgx_vepc_page which
can be the owner of such EPC pages and saves their useful information,
analogous to struct sgx_encl_page.
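
Roughly, it mirrors struct sgx_encl_page; an illustrative sketch only
(the field names below are assumptions, the authoritative definition is
in patch 02):

struct sgx_vepc_page {
        unsigned long vaddr;            /* address the page is mapped at */
        struct sgx_epc_page *epc_page;  /* backing EPC page */
        struct sgx_vepc *vepc;          /* owning virtual EPC instance */
};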

Moreover, the canonical memory-failure path collects victim tasks by
iterating over all tasks one by one and uses reverse mapping to get each
victim task's virtual address. This is not necessary for SGX, as one EPC
page can be mapped into only ONE enclave. This 1:1 mapping enforcement
allows us to find the task's virtual address directly from the physical
address.
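
As a rough sketch of the idea (the helper names below are illustrative
assumptions, not the exact code added by this series):

/* Rough sketch only: helper names are illustrative, not the actual patch. */
static int sgx_epc_poison_sketch(unsigned long pfn)
{
        struct sgx_epc_page *page = sgx_paddr_to_epc_page(pfn << PAGE_SHIFT);
        unsigned long vaddr = sgx_epc_owner_vaddr(page);  /* hypothetical */

        /*
         * One EPC page is backed by exactly one enclave (or vepc) page, so
         * there is no need to walk every task's reverse mappings: signal
         * the owner at its known virtual address directly.
         */
        return force_sig_mceerr(BUS_MCEERR_AR, (void __user *)vaddr, PAGE_SHIFT);
}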

Then we extend the solution to the normal SGX case, so that the task has
an opportunity to make a further decision when an EPC page has a memory
failure.

Tests:
1. MCE injection test for SGX in VM.
   As we expected, the application was killed and VM was alive.
2. MCE injection test for SGX on host.
   As we expected, the application received SIGBUS with extra info.
3. Kernel selftest/sgx: PASS
4. Internal SGX stress test: PASS
5. kmemleak test: No memory leakage detected.

Zhiquan Li (4):
  x86/sgx: Move struct sgx_vepc definition to sgx.h
  x86/sgx: add struct sgx_vepc_page to manage EPC pages for vepc
  x86/sgx: Fine grained SGX MCA behavior for virtualization
  x86/sgx: Fine grained SGX MCA behavior for normal case

 arch/x86/kernel/cpu/sgx/main.c | 24 ++++++++++++++++++++++--
 arch/x86/kernel/cpu/sgx/sgx.h  | 12 ++++++++++++
 arch/x86/kernel/cpu/sgx/virt.c | 29 +++++++++++++++++++----------
 3 files changed, 53 insertions(+), 12 deletions(-)

Comments

Jarkko Sakkinen May 11, 2022, 10:29 a.m. UTC | #1
On Tue, May 10, 2022 at 11:16:46AM +0800, Zhiquan Li wrote:
> Hi everyone,
> 
> This series contains a few patches to fine grained SGX MCA behavior.
> 
> When VM guest access a SGX EPC page with memory failure, current
> behavior will kill the guest, expected only kill the SGX application
> inside it.
> 
> To fix it we send SIGBUS with code BUS_MCEERR_AR and some extra
> information for hypervisor to inject #MC information to guest, which
> is helpful in SGX virtualization case.
> 
> However, current SGX data structures are insufficient to track the
> EPC pages for vepc, so we introduce a new struct sgx_vepc_page which
> can be the owner of EPC pages for vepc and saves the useful info of
> EPC pages for vepc, like struct sgx_encl_page.
> 
> Moreover, canonical memory failure collects victim tasks by iterating
> all the tasks one by one and use reverse mapping to get victim tasks’
> virtual address. This is not necessary for SGX - as one EPC page can
> be mapped to ONE enclave only. So, this 1:1 mapping enforcement
> allows us to find task virtual address with physical address
> directly.

Hmm... An enclave can be shared by multiple processes. The virtual
address is the same, but there can be a variable number of processes
that have it mapped.

> 
> Then we extend the solution for the normal SGX case, so that the task
> has opportunity to make further decision while EPC page has memory
> failure.
> 
> Tests:
> 1. MCE injection test for SGX in VM.
>    As we expected, the application was killed and VM was alive.
> 2. MCE injection test for SGX on host.
>    As we expected, the application received SIGBUS with extra info.
> 3. Kernel selftest/sgx: PASS
> 4. Internal SGX stress test: PASS
> 5. kmemleak test: No memory leakage detected.
> 
> Zhiquan Li (4):
>   x86/sgx: Move struct sgx_vepc definition to sgx.h
>   x86/sgx: add struct sgx_vepc_page to manage EPC pages for vepc
>   x86/sgx: Fine grained SGX MCA behavior for virtualization
>   x86/sgx: Fine grained SGX MCA behavior for normal case
> 
>  arch/x86/kernel/cpu/sgx/main.c | 24 ++++++++++++++++++++++--
>  arch/x86/kernel/cpu/sgx/sgx.h  | 12 ++++++++++++
>  arch/x86/kernel/cpu/sgx/virt.c | 29 +++++++++++++++++++----------
>  3 files changed, 53 insertions(+), 12 deletions(-)
> 
> -- 
> 2.25.1
> 

BR, Jarkko
Zhiquan Li May 12, 2022, 12:03 p.m. UTC | #2
On 2022/5/11 18:29, Jarkko Sakkinen wrote:
> On Tue, May 10, 2022 at 11:16:46AM +0800, Zhiquan Li wrote:
>> Hi everyone,
>>
>> This series contains a few patches to fine grained SGX MCA behavior.
>>
>> When VM guest access a SGX EPC page with memory failure, current
>> behavior will kill the guest, expected only kill the SGX application
>> inside it.
>>
>> To fix it we send SIGBUS with code BUS_MCEERR_AR and some extra
>> information for hypervisor to inject #MC information to guest, which
>> is helpful in SGX virtualization case.
>>
>> However, current SGX data structures are insufficient to track the
>> EPC pages for vepc, so we introduce a new struct sgx_vepc_page which
>> can be the owner of EPC pages for vepc and saves the useful info of
>> EPC pages for vepc, like struct sgx_encl_page.
>>
>> Moreover, canonical memory failure collects victim tasks by iterating
>> all the tasks one by one and use reverse mapping to get victim tasks’
>> virtual address. This is not necessary for SGX - as one EPC page can
>> be mapped to ONE enclave only. So, this 1:1 mapping enforcement
>> allows us to find task virtual address with physical address
>> directly.
> 
> Hmm... An enclave can be shared by multiple processes. The virtual
> address is the same but there can be variable number of processes
> having it mapped.

Thanks for your review, Jarkko.
You're right, an enclave can be shared.

Actually, we had discussed this issue internally. Assume the following
scenario:
An enclave provides multiple ecalls serving several tasks. If one task
invokes an ecall and hits an MCE, but the other tasks never use that
ecall, shall we kill all the sharing tasks immediately? That looks a
little abrupt. Maybe it's better to kill them only when they actually
touch the HW-poisoned page.
Furthermore, once an EPC page has been poisoned it will not be allocated
again, so the error will not propagate.
Therefore, we minimized the changes: we only fine-grained the SIGBUS
behavior and kept the other behavior as before.

Do you think the processes sharing the same enclave need to be killed,
even if they have not touched the EPC page with the hardware error?
Any ideas are welcome.

Best Regards,
Zhiquan

> 
>>
>> Then we extend the solution for the normal SGX case, so that the task
>> has opportunity to make further decision while EPC page has memory
>> failure.
>>
>> Tests:
>> 1. MCE injection test for SGX in VM.
>>    As we expected, the application was killed and VM was alive.
>> 2. MCE injection test for SGX on host.
>>    As we expected, the application received SIGBUS with extra info.
>> 3. Kernel selftest/sgx: PASS
>> 4. Internal SGX stress test: PASS
>> 5. kmemleak test: No memory leakage detected.
>>
>> Zhiquan Li (4):
>>   x86/sgx: Move struct sgx_vepc definition to sgx.h
>>   x86/sgx: add struct sgx_vepc_page to manage EPC pages for vepc
>>   x86/sgx: Fine grained SGX MCA behavior for virtualization
>>   x86/sgx: Fine grained SGX MCA behavior for normal case
>>
>>  arch/x86/kernel/cpu/sgx/main.c | 24 ++++++++++++++++++++++--
>>  arch/x86/kernel/cpu/sgx/sgx.h  | 12 ++++++++++++
>>  arch/x86/kernel/cpu/sgx/virt.c | 29 +++++++++++++++++++----------
>>  3 files changed, 53 insertions(+), 12 deletions(-)
>>
>> -- 
>> 2.25.1
>>
> 
> BR, Jarkko
Jarkko Sakkinen May 13, 2022, 2:38 p.m. UTC | #3
On Thu, May 12, 2022 at 08:03:30PM +0800, Zhiquan Li wrote:
> 
> On 2022/5/11 18:29, Jarkko Sakkinen wrote:
> > On Tue, May 10, 2022 at 11:16:46AM +0800, Zhiquan Li wrote:
> >> Hi everyone,
> >>
> >> This series contains a few patches to fine grained SGX MCA behavior.
> >>
> >> When VM guest access a SGX EPC page with memory failure, current
> >> behavior will kill the guest, expected only kill the SGX application
> >> inside it.
> >>
> >> To fix it we send SIGBUS with code BUS_MCEERR_AR and some extra
> >> information for hypervisor to inject #MC information to guest, which
> >> is helpful in SGX virtualization case.
> >>
> >> However, current SGX data structures are insufficient to track the
> >> EPC pages for vepc, so we introduce a new struct sgx_vepc_page which
> >> can be the owner of EPC pages for vepc and saves the useful info of
> >> EPC pages for vepc, like struct sgx_encl_page.
> >>
> >> Moreover, canonical memory failure collects victim tasks by iterating
> >> all the tasks one by one and use reverse mapping to get victim tasks’
> >> virtual address. This is not necessary for SGX - as one EPC page can
> >> be mapped to ONE enclave only. So, this 1:1 mapping enforcement
> >> allows us to find task virtual address with physical address
> >> directly.
> > 
> > Hmm... An enclave can be shared by multiple processes. The virtual
> > address is the same but there can be variable number of processes
> > having it mapped.
> 
> Thanks for your review, Jarkko.
> You’re right, enclave can be shared.
> 
> Actually, we had discussed this issue internally. Assuming below
> scenario:
> An enclave provides multiple ecalls and services for several tasks. If
> one task invokes an ecall and meets MCE, but the other tasks would not
> use that ecall, shall we kill all the sharing tasks immediately? It looks
> a little abrupt. Maybe it’s better to kill them when they really meet the
> HW poison page.
> Furthermore, once an EPC page has been poisoned, it will not be allocated
> anymore, so it would not be propagated.
> Therefore, we minimized the changes, just fine grained the behavior of
> SIGBUG and kept the other behavior as before.
> 
> Do you think the processes sharing the same enclave need to be killed,
> even they had not touched the EPC page with hardware error?
> Any ideas are welcome.

I do not think the patch set is going in the wrong direction. This
discussion was just missing from the cover letter.

BR, Jarkko
Luck, Tony May 13, 2022, 4:35 p.m. UTC | #4
>> Do you think the processes sharing the same enclave need to be killed,
>> even they had not touched the EPC page with hardware error?
>> Any ideas are welcome.
>
> I do not think the patch set is going to wrong direction. This discussion
> was just missing from the cover letter.

I was under the impression that when an enclave page triggers a machine check
the whole enclave is (somehow) marked bad, so that it couldn't be entered again.

Killing other processes with the same enclave mapped would perhaps be overkill,
but they are going to find that the enclave is "dead" next time they try to use it.

-Tony
Zhiquan Li May 14, 2022, 5:39 a.m. UTC | #5
On 2022/5/14 00:35, Luck, Tony wrote:
>>> Do you think the processes sharing the same enclave need to be killed,
>>> even they had not touched the EPC page with hardware error?
>>> Any ideas are welcome.
>>
>> I do not think the patch set is going to wrong direction. This discussion
>> was just missing from the cover letter.

OK, I will add this point to the v2 cover letter and to patch 03.

> 
> I was under the impression that when an enclave page triggers a machine check
> the whole enclave is (somehow) marked bad, so that it couldn't be entered again.
> 
> Killing other processes with the same enclave mapped would perhaps be overkill,
> but they are going to find that the enclave is "dead" next time they try to use it.

Thanks for your clarification, Tony.
You reminded me to check Intel SDM vol.3, 38.15.1 Interactions with MCA Events:

"All architecturally visible machine check events (#MC and CMCI) that are detected
while inside an enclave cause an asynchronous enclave exit.
Any machine check exception (#MC) that occurs after Intel SGX is first enabled
causes Intel SGX to be disabled (CPUID.SGX_Leaf.0:EAX[SGX1] == 0). It cannot be
enabled until after the next reset."
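
(For reference, "SGX_Leaf" here is CPUID leaf 0x12; a tiny userspace check
of that SGX1 bit, illustrative only and assuming SGX is enumerated at all:)

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
        unsigned int eax, ebx, ecx, edx;

        /* CPUID.(EAX=12H, ECX=0):EAX[0] is the SGX1 capability bit. */
        if (!__get_cpuid_count(0x12, 0, &eax, &ebx, &ecx, &edx)) {
                puts("CPUID leaf 0x12 not available");
                return 1;
        }
        printf("SGX1: %s\n", (eax & 1) ? "present" : "disabled");
        return 0;
}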

So, I suppose the current behavior is gentle enough: other processes with the
same enclave mapped should get rid of it if they really need to use the enclave
again. If we expect those processes to be killed early, it is worth another
patch set to achieve that.

Best Regards,
Zhiquan

> 
> -Tony
Luck, Tony May 15, 2022, 3:35 a.m. UTC | #6
> Any machine check exception (#MC) that occurs after Intel SGX is first enables
> causes Intel SGX to be disabled, (CPUID.SGX_Leaf.0:EAX[SGX1] == 0). It cannot be
> enabled until after the next reset. "

That part is out of date. A machine check used to disable SGX system-wide. It now just
disables the enclave that triggered the machine check.

Is that text still in the latest SDM (version 077, April 2022)?

-Tony
Zhiquan Li May 16, 2022, 12:57 a.m. UTC | #7
On 2022/5/15 11:35, Luck, Tony wrote:
>> Any machine check exception (#MC) that occurs after Intel SGX is first enables
>> causes Intel SGX to be disabled, (CPUID.SGX_Leaf.0:EAX[SGX1] == 0). It cannot be
>> enabled until after the next reset. "
> 
> That part is out of date. A machine check used to disable SGX system-wide. It now just
> disables the enclave that triggered the machine check.
> 
> Is that text still in the latest SDM (version 077, April 2022)?

Yes, I've double-checked it in version 077, Apr. 2022.
Looks like the SDM needs to catch up :-)

Best Regards,
Zhiquan

> 
> -Tony
Kai Huang May 16, 2022, 2:29 a.m. UTC | #8
On Sat, 2022-05-14 at 13:39 +0800, Zhiquan Li wrote:
> On 2022/5/14 00:35, Luck, Tony wrote:
> > > > Do you think the processes sharing the same enclave need to be killed,
> > > > even they had not touched the EPC page with hardware error?
> > > > Any ideas are welcome.
> > > 
> > > I do not think the patch set is going to wrong direction. This discussion
> > > was just missing from the cover letter.
> 
> OK, I will add this point into v2 of cover letter and patch 03.
> 
> 

I don't think you should add it to patch 03.  That the same enclave can be
shared by multiple processes is only true for host enclaves, not for a virtual
EPC instance.  A virtual EPC instance is just a raw EPC resource which is
accessible to the guest, but how enclaves are created on this EPC resource is
completely up to the guest.  Therefore, one virtual EPC instance cannot be
shared by two guests.
Zhiquan Li May 16, 2022, 8:40 a.m. UTC | #9
On 2022/5/16 10:29, Kai Huang wrote:
> On Sat, 2022-05-14 at 13:39 +0800, Zhiquan Li wrote:
>> On 2022/5/14 00:35, Luck, Tony wrote:
>>>>> Do you think the processes sharing the same enclave need to be killed,
>>>>> even they had not touched the EPC page with hardware error?
>>>>> Any ideas are welcome.
>>>> I do not think the patch set is going to wrong direction. This discussion
>>>> was just missing from the cover letter.
>> OK, I will add this point into v2 of cover letter and patch 03.
>>
>>
> I don't think you should add to patch 03.  The same enclave can be shared by
> multiple processes is only true for host enclaves, but not virtual EPC isntance.
> Virtual EPC instance is just a raw EPC resource which is accessible to guest,
> but how enclaves are created on this EPC resource is completely upto guest. 
> Therefore, one virtual EPC cannot be shared by two guests.
> 

Thanks for your clarification, Kai.
Patch 04 is for the host case; we can put it there.

May I add your comments to patch 03 and add you as "Acked-by" in v2?
I suppose it would be a good answer if someone has the same question.

Best Regards,
Zhiquan

> 
> -- Thanks, -Kai
Kai Huang May 17, 2022, 12:43 a.m. UTC | #10
On Mon, 2022-05-16 at 16:40 +0800, Zhiquan Li wrote:
> On 2022/5/16 10:29, Kai Huang wrote:
> > On Sat, 2022-05-14 at 13:39 +0800, Zhiquan Li wrote:
> > > On 2022/5/14 00:35, Luck, Tony wrote:
> > > > > > Do you think the processes sharing the same enclave need to be killed,
> > > > > > even they had not touched the EPC page with hardware error?
> > > > > > Any ideas are welcome.
> > > > > I do not think the patch set is going to wrong direction. This discussion
> > > > > was just missing from the cover letter.
> > > OK, I will add this point into v2 of cover letter and patch 03.
> > > 
> > > 
> > I don't think you should add to patch 03.  The same enclave can be shared by
> > multiple processes is only true for host enclaves, but not virtual EPC isntance.
> > Virtual EPC instance is just a raw EPC resource which is accessible to guest,
> > but how enclaves are created on this EPC resource is completely upto guest. 
> > Therefore, one virtual EPC cannot be shared by two guests.
> > 
> 
> Thanks for your clarification, Kai.
> Patch 04 is for the host case, we can put it there.
> 
> May I add your comments in patch 03 and add you as "Acked-by" in v2?
> I suppose it is a good answer if someone has the same question.
> 

Yes, you can add the comments to the patch.

One problem is that currently we don't explicitly prevent a vepc from being
shared via fork().  For instance, an application can open /dev/sgx_vepc,
mmap(), and fork().  Then, if the child does mmap() against the fd opened by
the parent, the child will share the same vepc with the parent.

We can either explicitly enforce that only one process can mmap() a given vepc
in the vepc mmap() handler (a rough sketch of this option is at the end of this
mail), or we can put in a comment saying something like below:

	Unlike host enclaves, a virtual EPC instance cannot be shared by multiple
	VMs.  This is because how enclaves are created is totally up to the guest.
	Sharing a virtual EPC instance would very likely unexpectedly break
	enclaves in all the VMs.

	The SGX virtual EPC driver doesn't explicitly prevent a virtual EPC
	instance from being shared by multiple VMs via fork().  However, KVM
	doesn't support running a VM across multiple mm structures, and the de
	facto userspace hypervisor (Qemu) doesn't use fork() to create a new VM,
	so in practice this should not happen.

I'd like to hear from others whether simply adding some comment is fine.
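
As an illustration of the first option, a rough sketch (the "mm" field is an
assumption, it does not exist in struct sgx_vepc today; the rest follows the
shape of the existing sgx_vepc_mmap()):

static int sgx_vepc_mmap(struct file *file, struct vm_area_struct *vma)
{
        struct sgx_vepc *vepc = file->private_data;

        if (!(vma->vm_flags & VM_SHARED))
                return -EINVAL;

        /* The first mm to mmap() claims the instance; any other mm is refused. */
        mutex_lock(&vepc->lock);
        if (vepc->mm && vepc->mm != current->mm) {
                mutex_unlock(&vepc->lock);
                return -EBUSY;
        }
        vepc->mm = current->mm;
        mutex_unlock(&vepc->lock);

        /* ... existing VMA setup (vm_ops, flags, private_data) ... */
        return 0;
}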
Zhiquan Li May 18, 2022, 1:02 a.m. UTC | #11
On 2022/5/17 08:43, Kai Huang wrote:
> On Mon, 2022-05-16 at 16:40 +0800, Zhiquan Li wrote:
>> On 2022/5/16 10:29, Kai Huang wrote:
>>> On Sat, 2022-05-14 at 13:39 +0800, Zhiquan Li wrote:
>>>> On 2022/5/14 00:35, Luck, Tony wrote:
>>>>>>> Do you think the processes sharing the same enclave need to be killed,
>>>>>>> even they had not touched the EPC page with hardware error?
>>>>>>> Any ideas are welcome.
>>>>>> I do not think the patch set is going to wrong direction. This discussion
>>>>>> was just missing from the cover letter.
>>>> OK, I will add this point into v2 of cover letter and patch 03.
>>>>
>>>>
>>> I don't think you should add to patch 03.  The same enclave can be shared by
>>> multiple processes is only true for host enclaves, but not virtual EPC isntance.
>>> Virtual EPC instance is just a raw EPC resource which is accessible to guest,
>>> but how enclaves are created on this EPC resource is completely upto guest. 
>>> Therefore, one virtual EPC cannot be shared by two guests.
>>>
>> Thanks for your clarification, Kai.
>> Patch 04 is for the host case, we can put it there.
>>
>> May I add your comments in patch 03 and add you as "Acked-by" in v2?
>> I suppose it is a good answer if someone has the same question.
>>
> Yes you can add the comments to the patch.
> 
> One problem is currently we don't explicitly prevent vepc being shared via
> fork().  For instance, an application can open /dev/sgx_vepc, mmap(), and
> fork().  Then if the child does mmap() against the fd opened by the parent, the
> child will share the same vepc with the parent.
> 
> We can either explicitly enforce that only one process can mmap() to one vepc in
> mmap() of vepc, or we can put into comment saying something like below:
> 
> 	Unlike host encalves, virtual EPC instance cannot be shared by multiple
> 	VMs.  It is because how enclaves are created is totally upto the guest.
> 	Sharing virtual EPC instance will very likely to unexpectedly break
> 	enclaves in all VMs.
> 
> 	SGX virtual EPC driver doesn't explicitly prevent virtual EPC instance
> 	being shared by multiple VMs via fork().  However KVM doesn't support
> 	running a VM across multiple mm structures, and the de facto userspace
> 	hypervisor (Qemu) doesn't use fork() to create a new VM, so in practice
> 	this should not happen.
> 

Excellent explanation! I will put it into patch 03.
Thanks a lot, Kai.

Best Regards,
Zhiquan

> I'd like to hear from others whether simply adding some comment is fine.
> 
> -- Thanks, -Kai