[1/2] x86: sgx_vepc: extract sgx_vepc_remove_page

Message ID 20210920125401.2389105-2-pbonzini@redhat.com (mailing list archive)
State New, archived
Series x86: sgx_vepc: implement ioctl to EREMOVE all pages

Commit Message

Paolo Bonzini Sept. 20, 2021, 12:54 p.m. UTC
For bare-metal SGX on real hardware, the hardware provides guarantees
about SGX state at reboot.  For instance, all pages start out uninitialized.
The vepc driver provides a similar guarantee today for freshly-opened
vepc instances, but guests such as Windows expect all pages to be in
uninitialized state on startup, including after every guest reboot.

One way to do this is to simply close and reopen the /dev/sgx_vepc file
descriptor and re-mmap the virtual EPC.  However, this is problematic
because it prevents sandboxing the userspace (for example forbidding
open() after the guest starts, or running in a mount namespace that
does not have access to /dev; both are doable with pre-opened file
descriptors and/or SCM_RIGHTS file descriptor passing).
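
The SCM_RIGHTS option mentioned above can be sketched in userspace C. This
is only an illustration, not code from this series or from QEMU: the helper
names send_fd(), recv_fd() and demo_fd_roundtrip() are made up here, and a
pipe stands in for a pre-opened /dev/sgx_vepc descriptor so the sketch runs
without SGX hardware. A privileged launcher would open the device and hand
the descriptor to the sandboxed process, which never calls open() itself.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send one open descriptor over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	char dummy = 'F';	/* at least one byte of payload must be sent */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	assert(fd >= 0);
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new fd or -1 on error. */
static int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Pass the write end of a pipe across a socketpair and use the copy. */
int demo_fd_roundtrip(void)
{
	int sv[2], p[2], got = -1, ok;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (pipe(p) < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	ok = send_fd(sv[0], p[1]) == 0 && (got = recv_fd(sv[1])) >= 0;
	if (ok) {
		/* The received fd refers to the same open pipe. */
		ok = write(got, "x", 1) == 1 &&
		     read(p[0], &c, 1) == 1 && c == 'x';
		close(got);
	}

	close(p[0]);
	close(p[1]);
	close(sv[0]);
	close(sv[1]);
	return ok ? 0 : -1;
}
```

Calling demo_fd_roundtrip() from a main() returns 0 when the descriptor
survives the transfer. Because the receiver only ever holds the descriptor
it was given, resetting the vEPC has to work on that descriptor rather than
through a fresh open().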

In order to implement this, we will need an ioctl that performs
EREMOVE on all pages mapped by a /dev/sgx_vepc file descriptor:
other possibilities, such as closing and reopening the device,
are racy.

Start the implementation by pulling the EREMOVE into a separate
function.

Tested-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kernel/cpu/sgx/virt.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

Comments

Jarkko Sakkinen Sept. 21, 2021, 7:44 p.m. UTC | #1
On Mon, 2021-09-20 at 08:54 -0400, Paolo Bonzini wrote:
> For bare-metal SGX on real hardware, the hardware provides guarantees
> about SGX state at reboot.  For instance, all pages start out uninitialized.
> The vepc driver provides a similar guarantee today for freshly-opened
> vepc instances, but guests such as Windows expect all pages to be in
> uninitialized state on startup, including after every guest reboot.

I would consider replacing

"For bare-metal SGX on real hardware, the hardware provides guarantees
about SGX state at reboot.  For instance, all pages start out uninitialized."

with something like

"On bare-metal SGX, start of a power cycle zeros all of its reserved
memory. This happens after every reboot, but in addition to that
happens after waking up from any of the sleep states."

I can speculate and imagine where this might be useful, but no matter how
trivial or complex it is, this patch needs to nail a concrete usage
example. I'd presume you know well the exact changes needed for QEMU, so
from that knowledge it should be easy to write the motivational part.

For instance, point out where it is needed in QEMU and why. I.e. why
you end up in the first place having to re-use vepc buffers (or whatever
they should be called) in QEMU. When that is taken care of, then there is
a red line to eventually ack these patches.

About the motivation.

In Linux we do have a mechanism to take care of this in a guest, for which
motivation was actually first and foremost kexec. It was not done to let the
VMM give a corrupted memory state to a guest.

Even to a Linux guest, since EPC should still be represented in the state that
matches the hardware.  It'd be essentially a corrupted state, even if there were
measures to resist this. Windows guests failing is essentially a side-effect
of an issue, not an issue in the Windows guests.

Since QEMU needs to reinitialize VEPC buffers for guests, it should be as
efficient as we can make it. Just fill the gap of understanding why QEMU
needs to do this for a guest. This is exactly the kind of stuff that you want
to have documented in the commit log for the future :-)

/Jarkko
Jarkko Sakkinen Sept. 21, 2021, 7:46 p.m. UTC | #2
On Tue, 2021-09-21 at 22:44 +0300, Jarkko Sakkinen wrote:
> Even to a Linux guest, since EPC should still be represented in the state that
> matches the hardware.  It'd be essentially a corrupted state, even if there were
> measures to resist this. Windows guests failing is essentially a side-effect
> of an issue, not an issue in the Windows guests.

Ugh, typos, sorry. "Even to a Linux guest it would be illegitimate" is what I
meant to say...

/Jarkko
Paolo Bonzini Sept. 23, 2021, 12:08 p.m. UTC | #3
On 21/09/21 21:44, Jarkko Sakkinen wrote:
> "On bare-metal SGX, start of a power cycle zeros all of its reserved 
> memory. This happens after every reboot, but in addition to that 
> happens after waking up from any of the sleep states."
> 
> I can speculate and imagine where this might be useful, but no matter
> how trivial or complex it is, this patch needs to nail a concrete
> usage example. I'd presume you know well the exact changes needed for
> QEMU, so from that knowledge it should be easy to write the
> motivational part.

Assuming that it's obvious that QEMU knows how to reset a machine (which 
includes writes to the ACPI reset register, or wakeup from sleep 
states), the question of "why does userspace reuse vEPC" should be 
answered by this paragraph:

"One way to do this is to simply close and reopen the /dev/sgx_vepc file
descriptor and re-mmap the virtual EPC.  However, this is problematic
because it prevents sandboxing the userspace (for example forbidding
open() after the guest starts, or running in a mount namespace that
does not have access to /dev; both are doable with pre-opened file
descriptors and/or SCM_RIGHTS file descriptor passing)."

> Even to a Linux guest, since EPC should still be represented in the
> state that matches the hardware.  It'd be essentially a corrupted
> state, even if there were measures to resist this. Windows guests
> failing is essentially a side-effect of an issue, not an issue in the
> Windows guests.

Right, Linux is more liberal than it needs to be and ksgxd does the 
EREMOVE itself at the beginning (__sgx_sanitize_pages).  Windows has 
stronger expectations of what can and cannot happen before it boots, 
which are entirely justified.

Paolo
Jarkko Sakkinen Sept. 23, 2021, 8:33 p.m. UTC | #4
On Thu, 2021-09-23 at 14:08 +0200, Paolo Bonzini wrote:
> On 21/09/21 21:44, Jarkko Sakkinen wrote:
> > "On bare-metal SGX, start of a power cycle zeros all of its reserved 
> > memory. This happens after every reboot, but in addition to that 
> > happens after waking up from any of the sleep states."
> > 
> > I can speculate and imagine where this might be useful, but no matter
> > how trivial or complex it is, this patch needs to nail a concrete
> > usage example. I'd presume you know well the exact changes needed for
> > QEMU, so from that knowledge it should be easy to write the
> > motivational part.
> 
> Assuming that it's obvious that QEMU knows how to reset a machine (which 
> includes writes to the ACPI reset register, or wakeup from sleep 
> states), the question of "why does userspace reuse vEPC" should be 
> answered by this paragraph:
> 
> "One way to do this is to simply close and reopen the /dev/sgx_vepc file
> descriptor and re-mmap the virtual EPC.  However, this is problematic
> because it prevents sandboxing the userspace (for example forbidding
> open() after the guest starts, or running in a mount namespace that
> does not have access to /dev; both are doable with pre-opened file
> descriptors and/or SCM_RIGHTS file descriptor passing)."

Right, this makes sense.

> 
> > Even to a Linux guest, since EPC should still be represented in the
> > state that matches the hardware.  It'd be essentially a corrupted
> > state, even if there were measures to resist this. Windows guests
> > failing is essentially a side-effect of an issue, not an issue in the
> > Windows guests.
> 
> Right, Linux is more liberal than it needs to be and ksgxd does the 
> EREMOVE itself at the beginning (__sgx_sanitize_pages).  Windows has 
> stronger expectations of what can and cannot happen before it boots, 
> which are entirely justified.
> 
> Paolo

Yep. We do it for kexec(). An alternative would be to zero at the time
of kexec(), but this way things are much simpler, e.g. the whole
behaviour is local to the driver...

/Jarkko
Patch

diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 64511c4a5200..59b9c13121cd 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -111,7 +111,7 @@  static int sgx_vepc_mmap(struct file *file, struct vm_area_struct *vma)
 	return 0;
 }
 
-static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
+static int sgx_vepc_remove_page(struct sgx_epc_page *epc_page)
 {
 	int ret;
 
@@ -140,11 +140,17 @@  static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
 		 */
 		WARN_ONCE(ret != SGX_CHILD_PRESENT, EREMOVE_ERROR_MESSAGE,
 			  ret, ret);
-		return ret;
 	}
+	return ret;
+}
 
-	sgx_free_epc_page(epc_page);
+static int sgx_vepc_free_page(struct sgx_epc_page *epc_page)
+{
+	int ret = sgx_vepc_remove_page(epc_page);
+	if (ret)
+		return ret;
 
+	sgx_free_epc_page(epc_page);
 	return 0;
 }
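
The split above can be illustrated with a small userspace mock. Everything
in it is hypothetical: struct mock_page, MOCK_CHILD_PRESENT and the mock_*
functions only imitate the shape of sgx_vepc_remove_page() and
sgx_vepc_free_page(), and no SGX instruction is executed. The two-pass loop
in mock_remove_all() is an assumption about how a later "EREMOVE all pages"
caller might reuse the extracted helper; it is suggested by the WARN_ONCE in
the hunk, which treats SGX_CHILD_PRESENT as an expected failure, and is not
taken from patch 2/2.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for SGX_CHILD_PRESENT; the value is arbitrary in this mock. */
#define MOCK_CHILD_PRESENT 1

struct mock_page {
	int in_use;		/* page currently holds enclave state */
	int children;		/* SECS only: child pages still in use */
	struct mock_page *secs;	/* child only: owning SECS page */
	int freed;		/* page was handed back to the allocator */
};

/* Imitates the extracted helper: reset the page, keep it allocated. */
static int mock_remove_page(struct mock_page *p)
{
	if (!p->in_use)
		return 0;		/* nothing to remove */
	if (p->children > 0)
		return MOCK_CHILD_PRESENT;	/* SECS with live children */
	if (p->secs) {
		p->secs->children--;
		p->secs = NULL;
	}
	p->in_use = 0;
	return 0;
}

/* Imitates the wrapper: remove first, free only if removal succeeded. */
static int mock_free_page(struct mock_page *p)
{
	int ret = mock_remove_page(p);

	if (ret)
		return ret;

	p->freed = 1;
	return 0;
}

/*
 * Hypothetical "remove all" caller: pass 1 clears child pages, pass 2
 * retries SECS pages that failed while children were still present.
 * Returns the number of pages that could not be removed.
 */
static int mock_remove_all(struct mock_page *pages, size_t n)
{
	int failures = 0;

	for (int pass = 0; pass < 2; pass++) {
		failures = 0;
		for (size_t i = 0; i < n; i++)
			if (mock_remove_page(&pages[i]))
				failures++;
		if (!failures)
			break;
	}
	return failures;
}

/* One SECS page with two children; returns 0 if the mock behaves. */
int mock_demo(void)
{
	struct mock_page pages[3] = {
		{ .in_use = 1, .children = 2 },
		{ .in_use = 1, .secs = &pages[0] },
		{ .in_use = 1, .secs = &pages[0] },
	};

	/* Freeing the SECS first fails and leaves it allocated. */
	if (mock_free_page(&pages[0]) != MOCK_CHILD_PRESENT || pages[0].freed)
		return -1;

	/* Removing everything resets all pages without freeing any. */
	if (mock_remove_all(pages, 3) != 0)
		return -1;
	for (size_t i = 0; i < 3; i++)
		if (pages[i].in_use || pages[i].freed)
			return -1;

	/* Freeing an already-removed page now succeeds. */
	return mock_free_page(&pages[0]) == 0 && pages[0].freed ? 0 : -1;
}
```

The point of the split is visible in mock_remove_all(): it can reset pages
that stay mapped, while mock_free_page() keeps its old behaviour for the
release path.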