Message ID: cover.1543903910.git.alison.schofield@intel.com
Series: Multi-Key Total Memory Encryption API (MKTME)
On Mon, Dec 03, 2018 at 11:39:47PM -0800, Alison Schofield wrote:
> (Multi-Key Total Memory Encryption)
I think that MKTME is a horrible name, and doesn't appear to accurately
describe what it does either. Specifically the 'total' seems out of
place, it doesn't require all memory to be encrypted.
On Tue, Dec 04, 2018 at 09:25:50AM +0000, Peter Zijlstra wrote:
> On Mon, Dec 03, 2018 at 11:39:47PM -0800, Alison Schofield wrote:
> > (Multi-Key Total Memory Encryption)
>
> I think that MKTME is a horrible name, and doesn't appear to accurately
> describe what it does either. Specifically the 'total' seems out of
> place, it doesn't require all memory to be encrypted.

MKTME implies TME. TME is enabled by BIOS and it encrypts all memory with
CPU-generated key. MKTME allows to use other keys or disable encryption
for a page.

But, yes, name is not good.
On Mon, Dec 3, 2018 at 11:37 PM Alison Schofield <alison.schofield@intel.com> wrote:
>
> Hi Thomas, David,
>
> Here is an updated RFC on the API's to support MKTME.
> (Multi-Key Total Memory Encryption)
>
> This RFC presents the 2 API additions to support the creation and
> usage of memory encryption keys:
> 1) Kernel Key Service type "mktme"
> 2) System call encrypt_mprotect()
>
> This patchset is built upon Kirill Shutemov's work for the core MKTME
> support.
>
> David: Please let me know if the changes made, based on your review,
> are reasonable. I don't think that the new changes touch key service
> specific areas (much).
>
> Thomas: Please provide feedback on encrypt_mprotect(). If not a
> review, then a direction check would be helpful.

I'm not Thomas, but I think it's the wrong direction. As it stands,
encrypt_mprotect() is an incomplete version of mprotect() (since it's
missing the protection key support), and it's also functionally just
MADV_DONTNEED. In other words, the sole user-visible effect appears to
be that the existing pages are blown away. The fact that it changes
the key in use doesn't seem terribly useful, since it's anonymous
memory, and the most secure choice is to use CPU-managed keying, which
appears to be the default anyway on TME systems. It also has totally
unclear semantics WRT swap, and, off the top of my head, it looks like
it may have serious cache-coherency issues and like swapping the pages
might corrupt them, both because there are no flushes and because the
direct-map alias looks like it will use the default key and therefore
appear to contain the wrong data.

I would propose a very different direction: don't try to support MKTME
at all for anonymous memory, and instead figure out the important use
cases and support them directly. The use cases that I can think of
off the top of my head are:

1. pmem. This should probably use a very different API.

2. Some kind of VM hardening, where a VM's memory can be protected a
little tiny bit from the main kernel. But I don't see why this is any
better than XPO (eXclusive Page-frame Ownership), which brings to
mind:

The main implementation concern I have with this patch set is cache
coherency and handling of the direct map. Unless I missed something,
you're not doing anything about the direct map, which means that you
have RW aliases of the same memory with different keys. For use case
#2, this probably means that you need to either get rid of the direct
map and make get_user_pages() fail, or you need to change the key on
the direct map as well, probably using the pageattr.c code.

As for caching, As far as I can tell from reading the preliminary
docs, Intel's MKTME, much like AMD's SME, is basically invisible to
the hardware cache coherency mechanism. So, if you modify a physical
address with one key (or SME-enable bit), and you read it with
another, you get garbage unless you flush. And, if you modify memory
with one key then remap it with a different key without flushing in
the mean time, you risk corruption. And, what's worse, if I'm reading
between the lines in the docs correctly, if you use PCONFIG to change
a key, you may need to do a bunch of cache flushing to ensure you get
reasonable effects. (If you have dirty cache lines for some (PA, key)
and you PCONFIG to change the underlying key, you get different
results depending on whether the writeback happens before or after the
package doing the writeback notices the PCONFIG.)

Finally, If you're going to teach the kernel how to have some user
pages that aren't in the direct map, you've essentially done XPO,
which is nifty but expensive. And I think that doing this gets you
essentially all the benefit of MKTME for the non-pmem use case. Why
exactly would any software want to use anything other than a
CPU-managed key for anything other than pmem?

--Andy
On Tue, Dec 4, 2018 at 11:19 AM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Mon, Dec 3, 2018 at 11:37 PM Alison Schofield
> <alison.schofield@intel.com> wrote:
>
> Finally, If you're going to teach the kernel how to have some user
> pages that aren't in the direct map, you've essentially done XPO,
> which is nifty but expensive. And I think that doing this gets you
> essentially all the benefit of MKTME for the non-pmem use case. Why
> exactly would any software want to use anything other than a
> CPU-managed key for anything other than pmem?

Let me say this less abstractly. Here's a somewhat concrete actual
proposal. Make a new memfd_create() flag like MEMFD_ISOLATED. The
semantics are that the underlying pages are made not-present in the
direct map when they're allocated (which is hideously slow, but so be
it), and that anything that tries to get_user_pages() the resulting
pages fails. And then make sure we have all the required APIs so that
QEMU can still map this stuff into a VM. If there is indeed a situation
in which MKTME-ifying the memory adds some value, then we can consider
doing that.

And maybe we get fancy and encrypt this memory when it's swapped, but
maybe we should just encrypt everything when it's swapped.
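To make the proposal above concrete, a minimal userspace sketch.
Assumptions: MEMFD_ISOLATED is only the flag proposed in the mail above,
so the flag value here is invented and the memfd_create() call would
simply fail with EINVAL on any existing kernel:

    /*
     * Sketch only: MEMFD_ISOLATED does not exist; the value is made up.
     */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define MEMFD_ISOLATED 0x0800U          /* hypothetical flag value */

    int main(void)
    {
        size_t size = 2UL << 20;            /* 2 MiB of guest RAM */

        int fd = memfd_create("guest-ram", MEMFD_ISOLATED);
        if (fd < 0 || ftruncate(fd, size) < 0)
            return 1;

        /*
         * The proposal: pages backing this mapping are removed from the
         * kernel direct map on allocation and get_user_pages() on them
         * fails, while QEMU can still map them into a guest.
         */
        char *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                       fd, 0);
        if (p == MAP_FAILED)
            return 1;

        p[0] = 0x42;                        /* ordinary access still works */
        printf("mapped %zu bytes of isolated memory\n", size);
        return 0;
    }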
On 12/4/18 12:00 PM, Andy Lutomirski wrote:
> On Tue, Dec 4, 2018 at 11:19 AM Andy Lutomirski <luto@kernel.org> wrote:
>> On Mon, Dec 3, 2018 at 11:37 PM Alison Schofield <alison.schofield@intel.com> wrote:
>> Finally, If you're going to teach the kernel how to have some user
>> pages that aren't in the direct map, you've essentially done XPO,
>> which is nifty but expensive. And I think that doing this gets you
>> essentially all the benefit of MKTME for the non-pmem use case. Why
>> exactly would any software want to use anything other than a
>> CPU-managed key for anything other than pmem?
>
> Let me say this less abstractly. Here's a somewhat concrete actual
> proposal. Make a new memfd_create() flag like MEMFD_ISOLATED. The
> semantics are that the underlying pages are made not-present in the
> direct map when they're allocated (which is hideously slow, but so be
> it), and that anything that tries to get_user_pages() the resulting
> pages fails. And then make sure we have all the required APIs so that
> QEMU can still map this stuff into a VM.

I think we need get_user_pages(). We want direct I/O to work, *and* we
really want direct device assignment into VMs.

> And maybe we get fancy and encrypt this memory when it's swapped, but
> maybe we should just encrypt everything when it's swapped.

We decided long ago (and this should be in the patches somewhere) that
we wouldn't force memory to be encrypted in swap. We would just
recommend it in the documentation as a best practice, especially when
using MKTME.

We can walk that back, of course, but that's what we're doing at the
moment.
On Mon, 2018-12-03 at 23:39 -0800, Alison Schofield wrote:
> Hi Thomas, David,
>
> Here is an updated RFC on the API's to support MKTME.
> (Multi-Key Total Memory Encryption)
>
> This RFC presents the 2 API additions to support the creation and
> usage of memory encryption keys:
> 1) Kernel Key Service type "mktme"
> 2) System call encrypt_mprotect()
>
> This patchset is built upon Kirill Shutemov's work for the core MKTME
> support.

Please, explain what MKTME is right here. No references, no
explanations... Even with a reference, a short summary would be really
nice to have.

/Jarkko
On Tue, 2018-12-04 at 12:46 +0300, Kirill A. Shutemov wrote:
> On Tue, Dec 04, 2018 at 09:25:50AM +0000, Peter Zijlstra wrote:
> > On Mon, Dec 03, 2018 at 11:39:47PM -0800, Alison Schofield wrote:
> > > (Multi-Key Total Memory Encryption)
> >
> > I think that MKTME is a horrible name, and doesn't appear to accurately
> > describe what it does either. Specifically the 'total' seems out of
> > place, it doesn't require all memory to be encrypted.
>
> MKTME implies TME. TME is enabled by BIOS and it encrypts all memory with
> CPU-generated key. MKTME allows to use other keys or disable encryption
> for a page.

When you say "disable encryption to a page" does the encryption get
actually disabled or does the CPU just decrypt it transparently i.e.
what happens physically?

> But, yes, name is not good.

/Jarkko
On Tue, 2018-12-04 at 11:19 -0800, Andy Lutomirski wrote:
> I'm not Thomas, but I think it's the wrong direction. As it stands,
> encrypt_mprotect() is an incomplete version of mprotect() (since it's
> missing the protection key support), and it's also functionally just
> MADV_DONTNEED. In other words, the sole user-visible effect appears
> to be that the existing pages are blown away. The fact that it
> changes the key in use doesn't seem terribly useful, since it's
> anonymous memory, and the most secure choice is to use CPU-managed
> keying, which appears to be the default anyway on TME systems. It
> also has totally unclear semantics WRT swap, and, off the top of my
> head, it looks like it may have serious cache-coherency issues and
> like swapping the pages might corrupt them, both because there are no
> flushes and because the direct-map alias looks like it will use the
> default key and therefore appear to contain the wrong data.
>
> I would propose a very different direction: don't try to support MKTME
> at all for anonymous memory, and instead figure out the important use
> cases and support them directly. The use cases that I can think of
> off the top of my head are:
>
> 1. pmem. This should probably use a very different API.
>
> 2. Some kind of VM hardening, where a VM's memory can be protected a
> little tiny bit from the main kernel. But I don't see why this is any
> better than XPO (eXclusive Page-frame Ownership), which brings to
> mind:

What is the threat model anyway for AMD and Intel technologies?

For me it looks like that you can read, write and even replay
encrypted pages both in SME and TME.

/Jarkko
On 12/4/18 11:19 AM, Andy Lutomirski wrote:
> I'm not Thomas, but I think it's the wrong direction. As it stands,
> encrypt_mprotect() is an incomplete version of mprotect() (since it's
> missing the protection key support),

I thought about this when I added mprotect_pkey(). We start with:

	mprotect(addr, len, prot);

then

	mprotect_pkey(addr, len, prot);

then

	mprotect_pkey_encrypt(addr, len, prot, key);

That doesn't scale because we eventually have
mprotect_and_a_history_of_mm_features(). :)

What I was hoping to see was them do this (apologies for the horrible
indentation):

	ptr = mmap(..., PROT_NONE);
	mprotect_pkey(   addr, len, PROT_NONE, pkey);
	mprotect_encrypt(addr, len, PROT_NONE, keyid);
	mprotect(        addr, len, real_prot);

The point is that you *can* stack these things and don't have to have
an mprotect_kitchen_sink() if you use PROT_NONE for intermediate
permissions during setup.

> and it's also functionally just MADV_DONTNEED. In other words, the
> sole user-visible effect appears to be that the existing pages are
> blown away. The fact that it changes the key in use doesn't seem
> terribly useful, since it's anonymous memory,

It's functionally MADV_DONTNEED, plus a future promise that your writes
will never show up as plaintext on the DIMM.

We also haven't settled on the file-backed properties. For file-backed,
my hope was that you could do:

	ptr = mmap(fd, size, prot);
	printf("ciphertext: %x\n", *ptr);
	mprotect_encrypt(ptr, len, prot, keyid);
	printf("plaintext: %x\n", *ptr);

> and the most secure choice is to use CPU-managed keying, which
> appears to be the default anyway on TME systems. It also has totally
> unclear semantics WRT swap, and, off the top of my head, it looks
> like it may have serious cache-coherency issues and like swapping the
> pages might corrupt them, both because there are no flushes and
> because the direct-map alias looks like it will use the default key
> and therefore appear to contain the wrong data.

I think we fleshed this out on IRC a bit, but the other part of the
implementation is described here: https://lwn.net/Articles/758313/,
and contains a direct map per keyid.

When you do phys_to_virt() and friends, you get the correct, decrypted
direct-map view appropriate for the physical page. And, yes, this has
very consequential security implications.

> I would propose a very different direction: don't try to support MKTME
> at all for anonymous memory, and instead figure out the important use
> cases and support them directly. The use cases that I can think of
> off the top of my head are:
>
> 1. pmem. This should probably use a very different API.
>
> 2. Some kind of VM hardening, where a VM's memory can be protected a
> little tiny bit from the main kernel. But I don't see why this is any
> better than XPO (eXclusive Page-frame Ownership), which brings to
> mind:

The XPO approach is "fun", and would certainly be a way to keep the
direct map from being exploited to get access to plain-text mappings of
ciphertext. But, it also has massive performance implications and we
didn't quite want to go there quite yet.

> The main implementation concern I have with this patch set is cache
> coherency and handling of the direct map. Unless I missed something,
> you're not doing anything about the direct map, which means that you
> have RW aliases of the same memory with different keys. For use case
> #2, this probably means that you need to either get rid of the direct
> map and make get_user_pages() fail, or you need to change the key on
> the direct map as well, probably using the pageattr.c code.

The current, public hardware spec has a description of what's required
to maintain cache coherency. Basically, you can keep as many mappings
of a physical page as you want, but only write to one mapping at a
time, and clflush the old one when you want to write to a new one.

> As for caching, As far as I can tell from reading the preliminary
> docs, Intel's MKTME, much like AMD's SME, is basically invisible to
> the hardware cache coherency mechanism. So, if you modify a physical
> address with one key (or SME-enable bit), and you read it with
> another, you get garbage unless you flush. And, if you modify memory
> with one key then remap it with a different key without flushing in
> the mean time, you risk corruption.

Yes, all true (at least with respect to Intel's implementation).

> And, what's worse, if I'm reading
> between the lines in the docs correctly, if you use PCONFIG to change
> a key, you may need to do a bunch of cache flushing to ensure you get
> reasonable effects. (If you have dirty cache lines for some (PA, key)
> and you PCONFIG to change the underlying key, you get different
> results depending on whether the writeback happens before or after the
> package doing the writeback notices the PCONFIG.)

We're not going to allow a key to be PCONFIG'd while there are any
physical pages still associated with it. There are per-VMA refcounts
tied back to the keyid slots, IIRC. So, before PCONFIG can happen, we
just need to make sure that all the VMAs are gone, all the pages are
freed, and all dirty cachelines have been clflushed. This is where
get_user_pages() is our mortal enemy, though.

I hope we got that right. Kirill/Alison, we should chat about this
one. :)

> Finally, If you're going to teach the kernel how to have some user
> pages that aren't in the direct map, you've essentially done XPO,
> which is nifty but expensive. And I think that doing this gets you
> essentially all the benefit of MKTME for the non-pmem use case. Why
> exactly would any software want to use anything other than a
> CPU-managed key for anything other than pmem?

It is handy, for one, to let you "cluster" key usage. If you have 5
Pepsi VMs and 5 Coke VMs, each Pepsi one using the same key and each
Coke one using the same key, you can boil it down to only 2 hardware
keyid slots that get used, and do this transparently.

But, I think what you're implying is that the security properties of
user-supplied keys can only be *worse* than using CPU-generated keys
(assuming the CPU does a good job generating it). So, why bother
allowing user-specified keys in the first place?

It's a good question and I don't have a solid answer for why folks
want this. I'll find out.
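For reference, the stacked setup described above might look roughly like
this from userspace. pkey_alloc() and pkey_mprotect() are the existing
protection-key syscalls; encrypt_mprotect() exists only in this RFC, so
the syscall number and the bare integer keyid below are placeholders
rather than a real interface (in the RFC the key actually comes from the
proposed "mktme" key service):

    /*
     * Sketch of the "stacked" setup; error handling omitted.
     * encrypt_mprotect() is the *proposed* syscall, so its number and the
     * keyid handle used here are assumptions, not an existing interface.
     */
    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #define __NR_encrypt_mprotect 499       /* placeholder, not a real number */

    int main(void)
    {
        size_t len = 4096;
        int real_prot = PROT_READ | PROT_WRITE;

        void *addr = mmap(NULL, len, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        int pkey = pkey_alloc(0, 0);        /* real pkey syscall */
        int keyid = 1;                      /* stand-in for an mktme key */

        /* Stack the attributes while the range is still PROT_NONE... */
        pkey_mprotect(addr, len, PROT_NONE, pkey);
        syscall(__NR_encrypt_mprotect, addr, len, PROT_NONE, keyid);

        /* ...and only then apply the real permissions. */
        mprotect(addr, len, real_prot);
        return 0;
    }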
On Wed, Dec 5, 2018 at 3:49 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 12/4/18 11:19 AM, Andy Lutomirski wrote:
> > I'm not Thomas, but I think it's the wrong direction. As it stands,
> > encrypt_mprotect() is an incomplete version of mprotect() (since it's
> > missing the protection key support),
>
> I thought about this when I added mprotect_pkey(). We start with:
>
> 	mprotect(addr, len, prot);
>
> then
>
> 	mprotect_pkey(addr, len, prot);
>
> then
>
> 	mprotect_pkey_encrypt(addr, len, prot, key);
>
> That doesn't scale because we eventually have
> mprotect_and_a_history_of_mm_features(). :)
>
> What I was hoping to see was them do this (apologies for the horrible
> indentation):
>
> 	ptr = mmap(..., PROT_NONE);
> 	mprotect_pkey(   addr, len, PROT_NONE, pkey);
> 	mprotect_encrypt(addr, len, PROT_NONE, keyid);
> 	mprotect(        addr, len, real_prot);
>
> The point is that you *can* stack these things and don't have to have
> an mprotect_kitchen_sink() if you use PROT_NONE for intermediate
> permissions during setup.

Sure, but then why call it mprotect at all? How about:

	mmap(..., PROT_NONE);
	mencrypt(..., keyid);
	mprotect_pkey(...);

But wouldn't this be much nicer:

	int fd = memfd_create(...);
	memfd_set_tme_key(fd, keyid);  /* fails if len != 0 */
	mmap(fd, ...);

> > and it's also functionally just MADV_DONTNEED. In other words, the
> > sole user-visible effect appears to be that the existing pages are
> > blown away. The fact that it changes the key in use doesn't seem
> > terribly useful, since it's anonymous memory,
>
> It's functionally MADV_DONTNEED, plus a future promise that your writes
> will never show up as plaintext on the DIMM.

But that's mostly vacuous. If I read the docs right, MKTME systems
also support TME, so you *already* have that promise, unless the
firmware totally blew it. If we want a boot option to have the kernel
use MKTME to forcibly encrypt everything regardless of what the TME
MSRs say, I'd be entirely on board. Heck, the implementation would be
quite simple because we mostly reuse the SME code.

> We also haven't settled on the file-backed properties. For file-backed,
> my hope was that you could do:
>
> 	ptr = mmap(fd, size, prot);
> 	printf("ciphertext: %x\n", *ptr);
> 	mprotect_encrypt(ptr, len, prot, keyid);
> 	printf("plaintext: %x\n", *ptr);

Why would you ever want the plaintext? Also, how does this work on a
normal fs, where relocation of the file would cause the ciphertext to
get lost? It really seems to be that it should look more like
dm-crypt where you encrypt a filesystem. Maybe you'd just configure
the pmem device to be encrypted before you mount it, or you'd get a
new pmem-mktme device node instead. This would also avoid some nasty
multiple-copies-of-the-direct-map issue, since you'd only ever have
one of them mapped.

> > The main implementation concern I have with this patch set is cache
> > coherency and handling of the direct map. Unless I missed something,
> > you're not doing anything about the direct map, which means that you
> > have RW aliases of the same memory with different keys. For use case
> > #2, this probably means that you need to either get rid of the direct
> > map and make get_user_pages() fail, or you need to change the key on
> > the direct map as well, probably using the pageattr.c code.
>
> The current, public hardware spec has a description of what's required
> to maintain cache coherency. Basically, you can keep as many mappings
> of a physical page as you want, but only write to one mapping at a
> time, and clflush the old one when you want to write to a new one.

Surely you at least have to clflush the old mapping and then the new
mapping, since the new mapping could have been speculatively read.

> > Finally, If you're going to teach the kernel how to have some user
> > pages that aren't in the direct map, you've essentially done XPO,
> > which is nifty but expensive. And I think that doing this gets you
> > essentially all the benefit of MKTME for the non-pmem use case. Why
> > exactly would any software want to use anything other than a
> > CPU-managed key for anything other than pmem?
>
> It is handy, for one, to let you "cluster" key usage. If you have 5
> Pepsi VMs and 5 Coke VMs, each Pepsi one using the same key and each
> Coke one using the same key, you can boil it down to only 2 hardware
> keyid slots that get used, and do this transparently.

I understand this from a marketing perspective but not a security
perspective. Say I'm Coke and you've sold me some VMs that are
"encrypted with a Coke-specific key and no other VMs get to use that
key." I can't think of *any* not-exceedingly-contrived attack in
which this makes the slightest difference. If Pepsi tries to attack
Coke without MKTME, then they'll either need to get the hypervisor to
leak Coke's data through the direct map or they'll have to find some
way to corrupt a page table or use something like L1TF to read from a
physical address Coke owns. With MKTME, if they can read through the
host direct map, then they'll get Coke's cleartext, and if they can
corrupt a page table or use L1TF to read from your memory, they'll get
Coke's cleartext.

TME itself provides a ton of protection -- you can't just barge into
the datacenter, refrigerate the DIMMs, walk away with them, and read
off everyone's data.

Am I missing something?

> But, I think what you're implying is that the security properties of
> user-supplied keys can only be *worse* than using CPU-generated keys
> (assuming the CPU does a good job generating it). So, why bother
> allowing user-specified keys in the first place?

That too :)
[ only responding to the pmem side of things... ]

On Wed, Dec 5, 2018 at 5:09 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Wed, Dec 5, 2018 at 3:49 PM Dave Hansen <dave.hansen@intel.com> wrote:
[..]
> > We also haven't settled on the file-backed properties. For file-backed,
> > my hope was that you could do:
> >
> > 	ptr = mmap(fd, size, prot);
> > 	printf("ciphertext: %x\n", *ptr);
> > 	mprotect_encrypt(ptr, len, prot, keyid);
> > 	printf("plaintext: %x\n", *ptr);
>
> Why would you ever want the plaintext? Also, how does this work on a
> normal fs, where relocation of the file would cause the ciphertext to
> get lost? It really seems to be that it should look more like
> dm-crypt where you encrypt a filesystem. Maybe you'd just configure
> the pmem device to be encrypted before you mount it, or you'd get a
> new pmem-mktme device node instead. This would also avoid some nasty
> multiple-copies-of-the-direct-map issue, since you'd only ever have
> one of them mapped.

Yes, this is really the only way it can work. Otherwise you need to
teach the filesystem that "these blocks can't move without the key
because encryption", and have an fs-feature flag to say "you can't
mount this legacy / encryption-unaware filesystem from an older kernel
because we're not sure you'll move something and break the encryption".

So pmem namespaces (volumes) would be encrypted, providing something
similar to dm-crypt, although we're looking at following the lead of
the fscrypt key management scheme.
On Wed, Dec 05, 2018 at 08:32:52PM +0000, Sakkinen, Jarkko wrote:
> On Tue, 2018-12-04 at 12:46 +0300, Kirill A. Shutemov wrote:
> > On Tue, Dec 04, 2018 at 09:25:50AM +0000, Peter Zijlstra wrote:
> > > On Mon, Dec 03, 2018 at 11:39:47PM -0800, Alison Schofield wrote:
> > > > (Multi-Key Total Memory Encryption)
> > >
> > > I think that MKTME is a horrible name, and doesn't appear to accurately
> > > describe what it does either. Specifically the 'total' seems out of
> > > place, it doesn't require all memory to be encrypted.
> >
> > MKTME implies TME. TME is enabled by BIOS and it encrypts all memory with
> > CPU-generated key. MKTME allows to use other keys or disable encryption
> > for a page.
>
> When you say "disable encryption to a page" does the encryption get
> actually disabled or does the CPU just decrypt it transparently i.e.
> what happens physically?

Yes, it gets disabled. Physically. It overrides TME encryption.
On 12/6/18 3:22 AM, Kirill A. Shutemov wrote:
>> When you say "disable encryption to a page" does the encryption get
>> actually disabled or does the CPU just decrypt it transparently i.e.
>> what happens physically?
> Yes, it gets disabled. Physically. It overrides TME encryption.

I know MKTME itself has a runtime overhead and we expect it to have a
performance impact in the low single digits. Does TME have that
overhead? Presumably MKTME plus no-encryption is not expected to have
the overhead.

We should probably mention that in the changelogs too.
On 12/5/18 5:09 PM, Andy Lutomirski wrote:
> On Wed, Dec 5, 2018 at 3:49 PM Dave Hansen <dave.hansen@intel.com> wrote:
>> What I was hoping to see was them do this (apologies for the horrible
>> indentation):
>>
>> 	ptr = mmap(..., PROT_NONE);
>> 	mprotect_pkey(   addr, len, PROT_NONE, pkey);
>> 	mprotect_encrypt(addr, len, PROT_NONE, keyid);
>> 	mprotect(        addr, len, real_prot);
>>
>> The point is that you *can* stack these things and don't have to have
>> an mprotect_kitchen_sink() if you use PROT_NONE for intermediate
>> permissions during setup.
>
> Sure, but then why call it mprotect at all? How about:
>
> 	mmap(..., PROT_NONE);
> 	mencrypt(..., keyid);
> 	mprotect_pkey(...);

That would totally work too. I just like the idea of a family of
mprotect() syscalls that do mprotect() plus one other thing. What
you're proposing is totally equivalent where we have mprotect_pkey()
always being the *last* thing that gets called, plus a family of things
that we expect to get called on something that's probably PROT_NONE.

> But wouldn't this be much nicer:
>
> 	int fd = memfd_create(...);
> 	memfd_set_tme_key(fd, keyid);  /* fails if len != 0 */
> 	mmap(fd, ...);

No. :)

One really big advantage with protection keys, or this implementation,
is that you don't have to implement an allocator. You can use it with
any old malloc() as long as you own a whole page. The pages also
fundamentally *stay* anonymous in the VM and get all the goodness that
comes with that, like THP.

>>> and it's also functionally just MADV_DONTNEED. In other words, the
>>> sole user-visible effect appears to be that the existing pages are
>>> blown away. The fact that it changes the key in use doesn't seem
>>> terribly useful, since it's anonymous memory,
>>
>> It's functionally MADV_DONTNEED, plus a future promise that your writes
>> will never show up as plaintext on the DIMM.
>
> But that's mostly vacuous. If I read the docs right, MKTME systems
> also support TME, so you *already* have that promise, unless the
> firmware totally blew it. If we want a boot option to have the kernel
> use MKTME to forcibly encrypt everything regardless of what the TME
> MSRs say, I'd be entirely on board. Heck, the implementation would be
> quite simple because we mostly reuse the SME code.

Yeah, that's true. I seem to always forget about the TME case! :)

"It's functionally MADV_DONTNEED, plus a future promise that your
writes will never be written to the DIMM with the TME key."

But, this gets us back to your very good question about what good this
does in the end. What value does _that_ scheme provide over TME? We're
admittedly weak on specific examples there, but I'm working on it.

>>> the direct map as well, probably using the pageattr.c code.
>>
>> The current, public hardware spec has a description of what's required
>> to maintain cache coherency. Basically, you can keep as many mappings
>> of a physical page as you want, but only write to one mapping at a
>> time, and clflush the old one when you want to write to a new one.
>
> Surely you at least have to clflush the old mapping and then the new
> mapping, since the new mapping could have been speculatively read.

Nope. The coherency is "fine" unless you have writeback of an older
cacheline that blows away newer data. CPUs that support MKTME are
guaranteed to never do writeback of the lines that could be established
speculatively or from prefetching.

>>> Finally, If you're going to teach the kernel how to have some user
>>> pages that aren't in the direct map, you've essentially done XPO,
>>> which is nifty but expensive. And I think that doing this gets you
>>> essentially all the benefit of MKTME for the non-pmem use case. Why
>>> exactly would any software want to use anything other than a
>>> CPU-managed key for anything other than pmem?
>>
>> It is handy, for one, to let you "cluster" key usage. If you have 5
>> Pepsi VMs and 5 Coke VMs, each Pepsi one using the same key and each
>> Coke one using the same key, you can boil it down to only 2 hardware
>> keyid slots that get used, and do this transparently.
>
> I understand this from a marketing perspective but not a security
> perspective. Say I'm Coke and you've sold me some VMs that are
> "encrypted with a Coke-specific key and no other VMs get to use that
> key." I can't think of *any* not-exceedingly-contrived attack in
> which this makes the slightest difference. If Pepsi tries to attack
> Coke without MKTME, then they'll either need to get the hypervisor to
> leak Coke's data through the direct map or they'll have to find some
> way to corrupt a page table or use something like L1TF to read from a
> physical address Coke owns. With MKTME, if they can read through the
> host direct map, then they'll get Coke's cleartext, and if they can
> corrupt a page table or use L1TF to read from your memory, they'll get
> Coke's cleartext.

The design definitely has the hypervisor in the trust boundary. If the
hypervisor is evil, or if someone evil compromises the hypervisor,
MKTME obviously provides less protection.

I guess the question ends up being if this makes its protections weak
enough that we should not bother merging it in its current form.

I still have the homework assignment to go figure out why folks want
the protections as they stand.
> On Dec 6, 2018, at 7:39 AM, Dave Hansen <dave.hansen@intel.com> wrote:
>
>>>> the direct map as well, probably using the pageattr.c code.
>>>
>>> The current, public hardware spec has a description of what's required
>>> to maintain cache coherency. Basically, you can keep as many mappings
>>> of a physical page as you want, but only write to one mapping at a
>>> time, and clflush the old one when you want to write to a new one.
>>
>> Surely you at least have to clflush the old mapping and then the new
>> mapping, since the new mapping could have been speculatively read.
>
> Nope. The coherency is "fine" unless you have writeback of an older
> cacheline that blows away newer data. CPUs that support MKTME are
> guaranteed to never do writeback of the lines that could be established
> speculatively or from prefetching.

How is that sufficient? Suppose I have some physical page mapped with
keys 1 and 2. #1 is logically live and I write to it. Then I prefetch
or otherwise populate mapping 2 into the cache (in the S state,
presumably). Now I clflush mapping 1 and read 2. It contains garbage
in the cache, but the garbage in the cache is inconsistent with the
garbage in memory. This can't be a good thing, even if no writeback
occurs.

I suppose the right fix is to clflush the old mapping and then to zero
the new mapping.

>>>> Finally, If you're going to teach the kernel how to have some user
>>>> pages that aren't in the direct map, you've essentially done XPO,
>>>> which is nifty but expensive. And I think that doing this gets you
>>>> essentially all the benefit of MKTME for the non-pmem use case. Why
>>>> exactly would any software want to use anything other than a
>>>> CPU-managed key for anything other than pmem?
>>>
>>> It is handy, for one, to let you "cluster" key usage. If you have 5
>>> Pepsi VMs and 5 Coke VMs, each Pepsi one using the same key and each
>>> Coke one using the same key, you can boil it down to only 2 hardware
>>> keyid slots that get used, and do this transparently.
>>
>> I understand this from a marketing perspective but not a security
>> perspective. Say I'm Coke and you've sold me some VMs that are
>> "encrypted with a Coke-specific key and no other VMs get to use that
>> key." I can't think of *any* not-exceedingly-contrived attack in
>> which this makes the slightest difference. If Pepsi tries to attack
>> Coke without MKTME, then they'll either need to get the hypervisor to
>> leak Coke's data through the direct map or they'll have to find some
>> way to corrupt a page table or use something like L1TF to read from a
>> physical address Coke owns. With MKTME, if they can read through the
>> host direct map, then they'll get Coke's cleartext, and if they can
>> corrupt a page table or use L1TF to read from your memory, they'll get
>> Coke's cleartext.
>
> The design definitely has the hypervisor in the trust boundary. If the
> hypervisor is evil, or if someone evil compromises the hypervisor,
> MKTME obviously provides less protection.
>
> I guess the question ends up being if this makes its protections weak
> enough that we should not bother merging it in its current form.

Indeed, but I'd ask another question too: I expect that MKTME is weak
enough that it will be improved, and without seeing the improvement, it
seems quite plausible that using the improvement will require radically
reworking the kernel implementation.

As a straw man, suppose we get a way to say "this key may only be
accessed through such-and-such VPID or by using a special new
restricted facility for the hypervisor to request access". Now we have
some degree of serious protection, but it doesn't work, by design, for
anonymous memory. Similarly, something that looks more like AMD's SEV
would be very very awkward to support with anything like the current
API proposal.

> I still have the homework assignment to go figure out why folks want
> the protections as they stand.
On 12/6/18 11:10 AM, Andy Lutomirski wrote:
>> On Dec 6, 2018, at 7:39 AM, Dave Hansen <dave.hansen@intel.com> wrote:
>> The coherency is "fine" unless you have writeback of an older
>> cacheline that blows away newer data. CPUs that support MKTME are
>> guaranteed to never do writeback of the lines that could be established
>> speculatively or from prefetching.
>
> How is that sufficient? Suppose I have some physical page mapped with
> keys 1 and 2. #1 is logically live and I write to it. Then I prefetch
> or otherwise populate mapping 2 into the cache (in the S state,
> presumably). Now I clflush mapping 1 and read 2. It contains garbage
> in the cache, but the garbage in the cache is inconsistent with the
> garbage in memory. This can't be a good thing, even if no writeback
> occurs.
>
> I suppose the right fix is to clflush the old mapping and then to zero
> the new mapping.

Yep. Practically, you need to write to the new mapping to give it any
meaning. Those writes effectively blow away any previously cached,
garbage contents.

I think you're right, though, that the cached data might not be
_consistent_ with what is in memory. It feels really dirty, but I can't
think of any problems that it actually causes.
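The sequence the two of them converge on above, sketched as a
kernel-context fragment (not compilable on its own). clflush_cache_range()
is a real x86 helper, but mktme_page_address() is a hypothetical stand-in
for whatever per-keyid direct-map accessor the actual series provides:

    /*
     * Sketch only: mktme_page_address() is HYPOTHETICAL; the real series'
     * per-keyid direct map may expose this differently.
     */
    static void mktme_rekey_page(struct page *page, int old_keyid,
                                 int new_keyid)
    {
        void *old = mktme_page_address(page, old_keyid);   /* hypothetical */
        void *new = mktme_page_address(page, new_keyid);   /* hypothetical */

        /* Make sure no (PA, old keyid) line can ever be written back. */
        clflush_cache_range(old, PAGE_SIZE);

        /*
         * Writes through the new alias overwrite whatever garbage may
         * have been cached for (PA, new keyid) by prefetch or speculation.
         */
        memset(new, 0, PAGE_SIZE);
    }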
On Thu, 2018-12-06 at 14:22 +0300, Kirill A. Shutemov wrote:
> > When you say "disable encryption to a page" does the encryption get
> > actually disabled or does the CPU just decrypt it transparently i.e.
> > what happens physically?
>
> Yes, it gets disabled. Physically. It overrides TME encryption.

OK, thanks for confirmation. BTW, how much is the penalty to keep it
always enabled? Is it something that would not make sense for some
other reasons?

/Jarkko
>
> TME itself provides a ton of protection -- you can't just barge into
> the datacenter, refrigerate the DIMMs, walk away with them, and read
> off everyone's data.
>
> Am I missing something?

I think we can make such assumption in most cases, but I think it's
better that we don't make any assumption at all. For example, the admin
of data center (or anyone) who has physical access to servers may do
something malicious. I am not expert but there should be other physical
attack methods besides coldboot attack, if the malicious employee can
get physical access to server w/o being detected.

> > But, I think what you're implying is that the security properties of
> > user-supplied keys can only be *worse* than using CPU-generated keys
> > (assuming the CPU does a good job generating it). So, why bother
> > allowing user-specified keys in the first place?
>
> That too :)

I think one usage of user-specified key is for NVDIMM, since CPU key
will be gone after machine reboot, therefore if NVDIMM is encrypted by
CPU key we are not able to retrieve it once shutdown/reboot, etc.

There are some other use cases that already require tenant to send key
to CSP. For example, the VM image can be provided by tenant and
encrypted by tenant's own key, and tenant needs to send key to CSP when
asking CSP to run that encrypted image.

But the tenant will need to trust the CSP in such a case, which brings
us back to why the tenant wants to use his own image in the first place
(I have to say I myself am not convinced of the value of such a use
case). I think there are two levels of trust involved here: 1) the
tenant needs to trust the CSP anyway; 2) but the CSP needs to convince
the tenant that the CSP can be trusted, i.e., by proving it can prevent
potential attacks from a malicious employee (e.g., by raising the bar
by using MKTME), etc.

Thanks,
-Kai
On Wed, 2018-12-05 at 22:19 +0000, Sakkinen, Jarkko wrote:
> On Tue, 2018-12-04 at 11:19 -0800, Andy Lutomirski wrote:
> > I'm not Thomas, but I think it's the wrong direction. As it stands,
> > encrypt_mprotect() is an incomplete version of mprotect() (since it's
> > missing the protection key support), and it's also functionally just
> > MADV_DONTNEED. In other words, the sole user-visible effect appears
> > to be that the existing pages are blown away. The fact that it
> > changes the key in use doesn't seem terribly useful, since it's
> > anonymous memory, and the most secure choice is to use CPU-managed
> > keying, which appears to be the default anyway on TME systems. It
> > also has totally unclear semantics WRT swap, and, off the top of my
> > head, it looks like it may have serious cache-coherency issues and
> > like swapping the pages might corrupt them, both because there are no
> > flushes and because the direct-map alias looks like it will use the
> > default key and therefore appear to contain the wrong data.
> >
> > I would propose a very different direction: don't try to support MKTME
> > at all for anonymous memory, and instead figure out the important use
> > cases and support them directly. The use cases that I can think of
> > off the top of my head are:
> >
> > 1. pmem. This should probably use a very different API.
> >
> > 2. Some kind of VM hardening, where a VM's memory can be protected a
> > little tiny bit from the main kernel. But I don't see why this is any
> > better than XPO (eXclusive Page-frame Ownership), which brings to
> > mind:
>
> What is the threat model anyway for AMD and Intel technologies?
>
> For me it looks like that you can read, write and even replay
> encrypted pages both in SME and TME.

Right. Neither of them (including MKTME) prevents replay attack. But
in my understanding SEV doesn't prevent replay attack either since it
doesn't have integrity protection.

Thanks,
-Kai
On 12/6/18 5:55 PM, Huang, Kai wrote:
> I think one usage of user-specified key is for NVDIMM, since CPU key
> will be gone after machine reboot, therefore if NVDIMM is encrypted
> by CPU key we are not able to retrieve it once shutdown/reboot, etc.

I think we all agree that the NVDIMM uses are really useful. But, these
patches don't implement that. So, if NVDIMMs are the only reasonable
use case, we shouldn't merge these patches until we add NVDIMM support.
On Thu, Dec 06, 2018 at 06:05:50PM -0800, Huang, Kai wrote:
> On Wed, 2018-12-05 at 22:19 +0000, Sakkinen, Jarkko wrote:
> > On Tue, 2018-12-04 at 11:19 -0800, Andy Lutomirski wrote:
> > > I'm not Thomas, but I think it's the wrong direction. As it stands,
> > > encrypt_mprotect() is an incomplete version of mprotect() (since it's
> > > missing the protection key support), and it's also functionally just
> > > MADV_DONTNEED. In other words, the sole user-visible effect appears
> > > to be that the existing pages are blown away. The fact that it
> > > changes the key in use doesn't seem terribly useful, since it's
> > > anonymous memory, and the most secure choice is to use CPU-managed
> > > keying, which appears to be the default anyway on TME systems. It
> > > also has totally unclear semantics WRT swap, and, off the top of my
> > > head, it looks like it may have serious cache-coherency issues and
> > > like swapping the pages might corrupt them, both because there are no
> > > flushes and because the direct-map alias looks like it will use the
> > > default key and therefore appear to contain the wrong data.
> > >
> > > I would propose a very different direction: don't try to support MKTME
> > > at all for anonymous memory, and instead figure out the important use
> > > cases and support them directly. The use cases that I can think of
> > > off the top of my head are:
> > >
> > > 1. pmem. This should probably use a very different API.
> > >
> > > 2. Some kind of VM hardening, where a VM's memory can be protected a
> > > little tiny bit from the main kernel. But I don't see why this is any
> > > better than XPO (eXclusive Page-frame Ownership), which brings to
> > > mind:
> >
> > What is the threat model anyway for AMD and Intel technologies?
> >
> > For me it looks like that you can read, write and even replay
> > encrypted pages both in SME and TME.
>
> Right. Neither of them (including MKTME) prevents replay attack. But
> in my understanding SEV doesn't prevent replay attack either since it
> doesn't have integrity protection.

Yep, it doesn't :-) That's why I've been wondering, after seeing
presentations concerning SME and SEV, what they are good for. Cold
boot attacks are definitely at least something where these techs can
help...

/Jarkko
On Thu, 2018-12-06 at 06:59 -0800, Dave Hansen wrote:
> On 12/6/18 3:22 AM, Kirill A. Shutemov wrote:
> > > When you say "disable encryption to a page" does the encryption get
> > > actually disabled or does the CPU just decrypt it transparently i.e.
> > > what happens physically?
> >
> > Yes, it gets disabled. Physically. It overrides TME encryption.
>
> I know MKTME itself has a runtime overhead and we expect it to have a
> performance impact in the low single digits. Does TME have that
> overhead? Presumably MKTME plus no-encryption is not expected to have
> the overhead.
>
> We should probably mention that in the changelogs too.

I believe that in terms of hardware crypto overhead MKTME and TME
should be the same (except the MKTME no-encrypt case?), but MKTME might
have additional overhead from the software implementation in the
kernel.

Thanks,
-Kai
On Thu, Dec 06, 2018 at 09:23:20PM +0000, Sakkinen, Jarkko wrote:
> On Thu, 2018-12-06 at 14:22 +0300, Kirill A. Shutemov wrote:
> > > When you say "disable encryption to a page" does the encryption get
> > > actually disabled or does the CPU just decrypt it transparently i.e.
> > > what happens physically?
> >
> > Yes, it gets disabled. Physically. It overrides TME encryption.
>
> OK, thanks for confirmation. BTW, how much is the penalty to keep it
> always enabled? Is it something that would not make sense for some
> other reasons?

We don't have any numbers to share at this point.
On Wed, Dec 05, 2018 at 10:19:20PM +0000, Sakkinen, Jarkko wrote:
> On Tue, 2018-12-04 at 11:19 -0800, Andy Lutomirski wrote:
> > I'm not Thomas, but I think it's the wrong direction. As it stands,
> > encrypt_mprotect() is an incomplete version of mprotect() (since it's
> > missing the protection key support), and it's also functionally just
> > MADV_DONTNEED. In other words, the sole user-visible effect appears
> > to be that the existing pages are blown away. The fact that it
> > changes the key in use doesn't seem terribly useful, since it's
> > anonymous memory, and the most secure choice is to use CPU-managed
> > keying, which appears to be the default anyway on TME systems. It
> > also has totally unclear semantics WRT swap, and, off the top of my
> > head, it looks like it may have serious cache-coherency issues and
> > like swapping the pages might corrupt them, both because there are no
> > flushes and because the direct-map alias looks like it will use the
> > default key and therefore appear to contain the wrong data.
> >
> > I would propose a very different direction: don't try to support MKTME
> > at all for anonymous memory, and instead figure out the important use
> > cases and support them directly. The use cases that I can think of
> > off the top of my head are:
> >
> > 1. pmem. This should probably use a very different API.
> >
> > 2. Some kind of VM hardening, where a VM's memory can be protected a
> > little tiny bit from the main kernel. But I don't see why this is any
> > better than XPO (eXclusive Page-frame Ownership), which brings to
> > mind:
>
> What is the threat model anyway for AMD and Intel technologies?
>
> For me it looks like that you can read, write and even replay
> encrypted pages both in SME and TME.

What replay attack are you talking about? MKTME uses AES-XTS with physical
address tweak. So the data is tied to the place in physical address space
and replacing one encrypted page with another encrypted page from
different address will produce garbage on decryption.
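As a rough userspace illustration of the XTS-tweak property Kirill
describes -- not the hardware construction (as Dave notes later in the
thread, the real tweak is programmed by firmware into the memory
controller and is not visible to software), just the principle that the
same ciphertext decrypted under a different address-derived tweak comes
out as garbage:

    /*
     * Illustration only; build with: gcc xts-demo.c -lcrypto
     * The tweak derivation from a physical address is an assumption made
     * for the demo, not what the hardware does.
     */
    #include <openssl/evp.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static void xts(int enc, const unsigned char *key /* 32 bytes */,
                    uint64_t addr, const unsigned char *in,
                    unsigned char *out, int len)
    {
        unsigned char tweak[16] = { 0 };
        int outl = 0, fin = 0;

        memcpy(tweak, &addr, sizeof(addr));          /* tweak <- address */

        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        EVP_CipherInit_ex(ctx, EVP_aes_128_xts(), NULL, key, tweak, enc);
        EVP_CipherUpdate(ctx, out, &outl, in, len);
        EVP_CipherFinal_ex(ctx, out + outl, &fin);
        EVP_CIPHER_CTX_free(ctx);
    }

    int main(void)
    {
        unsigned char key[32] = "0123456789abcdefFEDCBA9876543210";
        unsigned char clear[16] = "secret page data";
        unsigned char ct[16], moved[16], same[16];

        xts(1, key, 0x1000, clear, ct, sizeof(clear)); /* encrypt @0x1000  */
        xts(0, key, 0x2000, ct, moved, sizeof(ct));    /* "replay" @0x2000 */
        xts(0, key, 0x1000, ct, same, sizeof(ct));     /* decrypt in place */

        printf("same address : %.16s\n", (const char *)same);
        printf("other address: garbage (%02x%02x...)\n", moved[0], moved[1]);
        return 0;
    }

Only the decryption whose tweak matches the address used at encryption
time recovers the plaintext; moving the ciphertext elsewhere decrypts to
noise.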
On Fri, 2018-12-07 at 14:57 +0300, Kirill A. Shutemov wrote:
> > What is the threat model anyway for AMD and Intel technologies?
> >
> > For me it looks like that you can read, write and even replay
> > encrypted pages both in SME and TME.
>
> What replay attack are you talking about? MKTME uses AES-XTS with physical
> address tweak. So the data is tied to the place in physical address space
> and replacing one encrypted page with another encrypted page from
> different address will produce garbage on decryption.

Just trying to understand how this works.

So you use physical address like a nonce/version for the page and
thus prevent replay? Was not aware of this.

/Jarkko
On Fri, Dec 7, 2018 at 3:57 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > What is the threat model anyway for AMD and Intel technologies?
> >
> > For me it looks like that you can read, write and even replay
> > encrypted pages both in SME and TME.
>
> What replay attack are you talking about? MKTME uses AES-XTS with physical
> address tweak. So the data is tied to the place in physical address space
> and replacing one encrypted page with another encrypted page from
> different address will produce garbage on decryption.

What if you have some control over the physical addresses you write the
stolen encrypted page to? For instance, VM_Eve might manage to use
physical address space previously used by VM_Alice by getting the
hypervisor to move memory around (memory pressure, forcing other VMs
out via some type of DoS attack, etc.).

Say:

	C is VM_Alice's clear text at hwaddr
	E = mktme_encrypt(VM_Alice_key, hwaddr, C)

Eve somehow stole the encrypted bits E. Eve would need to write the
page E without further encryption, to make sure that the DRAM contains
the original stolen bits E; otherwise mktme_encrypt(VM_Eve_key, hwaddr,
E) would be present in the DRAM, which is not helpful. But with MKTME
under the current proposal VM_Eve can disable encryption for a given
mapping, right? (See also Note 1.)

Eve gets the HV to move VM_Alice back over the same physical address,
Eve "somehow" gets VM_Alice to read that page and use its content
(which would likely be a use of uninitialized memory bug, from
VM_Alice's perspective) and you have a replay attack?

For TME, this doesn't work, as you cannot partially disable encryption:
if Eve tries to write the stolen encrypted bits E, even in the "right
place", they get encrypted again to tme_encrypt(hwaddr, E). Upon
decryption, VM_Alice will get E, not C.

Note 1: Actually, even if with MKTME you cannot disable encryption,
*if* Eve knows her own key, Eve can always write a preimage P that the
CPU encrypts to E for VM_Alice to read back and decrypt:

	P = mktme_decrypt(VM_Eve_key, hwaddr, E)

This is not possible with TME, as Eve doesn't know the key used by the
CPU and cannot compute P.
On Fri, 2018-12-07 at 13:59 -0800, Jarkko Sakkinen wrote:
> On Fri, 2018-12-07 at 14:57 +0300, Kirill A. Shutemov wrote:
> > > What is the threat model anyway for AMD and Intel technologies?
> > >
> > > For me it looks like that you can read, write and even replay
> > > encrypted pages both in SME and TME.
> >
> > What replay attack are you talking about? MKTME uses AES-XTS with physical
> > address tweak. So the data is tied to the place in physical address space
> > and replacing one encrypted page with another encrypted page from
> > different address will produce garbage on decryption.
>
> Just trying to understand how this works.
>
> So you use physical address like a nonce/version for the page and
> thus prevent replay? Was not aware of this.

The brutal fact is that a physical address is an astronomical stretch
from a random value or increasing counter. Thus, it is fair to say that
MKTME provides only naive measures against replay attacks...

/Jarkko
On Fri, Dec 7, 2018 at 3:45 PM Sakkinen, Jarkko <jarkko.sakkinen@intel.com> wrote:
>
> On Fri, 2018-12-07 at 13:59 -0800, Jarkko Sakkinen wrote:
> > On Fri, 2018-12-07 at 14:57 +0300, Kirill A. Shutemov wrote:
> > > > What is the threat model anyway for AMD and Intel technologies?
> > > >
> > > > For me it looks like that you can read, write and even replay
> > > > encrypted pages both in SME and TME.
> > >
> > > What replay attack are you talking about? MKTME uses AES-XTS with physical
> > > address tweak. So the data is tied to the place in physical address space
> > > and replacing one encrypted page with another encrypted page from
> > > different address will produce garbage on decryption.
> >
> > Just trying to understand how this works.
> >
> > So you use physical address like a nonce/version for the page and
> > thus prevent replay? Was not aware of this.
>
> The brutal fact is that a physical address is an astronomical stretch
> from a random value or increasing counter. Thus, it is fair to say that
> MKTME provides only naive measures against replay attacks...

And this is potentially a big deal, since there are much simpler replay
attacks that can compromise the system. For example, if I can replay
the contents of a page table, I can write to freed memory.

--Andy
> On Dec 6, 2018, at 5:55 PM, Huang, Kai <kai.huang@intel.com> wrote:
>
>> TME itself provides a ton of protection -- you can't just barge into
>> the datacenter, refrigerate the DIMMs, walk away with them, and read
>> off everyone's data.
>>
>> Am I missing something?
>
> I think we can make such assumption in most cases, but I think it's
> better that we don't make any assumption at all. For example, the admin
> of data center (or anyone) who has physical access to servers may do
> something malicious. I am not expert but there should be other physical
> attack methods besides coldboot attack, if the malicious employee can
> get physical access to server w/o being detected.
>
>>> But, I think what you're implying is that the security properties of
>>> user-supplied keys can only be *worse* than using CPU-generated keys
>>> (assuming the CPU does a good job generating it). So, why bother
>>> allowing user-specified keys in the first place?
>>
>> That too :)
>
> I think one usage of user-specified key is for NVDIMM, since CPU key
> will be gone after machine reboot, therefore if NVDIMM is encrypted by
> CPU key we are not able to retrieve it once shutdown/reboot, etc.
>
> There are some other use cases that already require tenant to send key
> to CSP. For example, the VM image can be provided by tenant and
> encrypted by tenant's own key, and tenant needs to send key to CSP when
> asking CSP to run that encrypted image.

I can imagine a few reasons why one would want to encrypt one's image.
For example, the CSP could issue a public key and state, or even
attest, that the key is wrapped and locked to particular PCRs of their
TPM or otherwise protected by an enclave that verifies that the key is
only used to decrypt the image for the benefit of a hypervisor.

I don't see what MKTME has to do with this. The only remotely plausible
way I can see to use MKTME for this is to have the hypervisor load a
TPM (or other enclave) protected key into an MKTME user key slot and to
load customer-provided ciphertext into the corresponding physical
memory (using an MKTME no-encrypt slot). But this has three major
problems. First, it's effectively just a fancy way to avoid one AES
pass over the data. Second, a sensible scheme for this type of VM image
protection would use *authenticated* encryption or at least verify a
signature, which MKTME can't do. The third problem is the real
show-stopper, though: this scheme requires that the ciphertext go into
predetermined physical addresses, which would be a giant mess.
On 12/7/18 3:53 PM, Andy Lutomirski wrote:
> The third problem is the real show-stopper, though: this scheme
> requires that the ciphertext go into predetermined physical
> addresses, which would be a giant mess.

There's a more fundamental problem than that. The tweak fed into the
actual AES-XTS operation is determined by the firmware, programmed into
the memory controller, and is not visible to software.

So, not only would you need to put stuff at a fixed physical address,
the tweaks can change from boot to boot, so whatever you did would only
be good for one boot.
On Fri, 2018-12-07 at 23:45 +0000, Sakkinen, Jarkko wrote:
> On Fri, 2018-12-07 at 13:59 -0800, Jarkko Sakkinen wrote:
> > On Fri, 2018-12-07 at 14:57 +0300, Kirill A. Shutemov wrote:
> > > > What is the threat model anyway for AMD and Intel technologies?
> > > >
> > > > For me it looks like that you can read, write and even replay
> > > > encrypted pages both in SME and TME.
> > >
> > > What replay attack are you talking about? MKTME uses AES-XTS with physical
> > > address tweak. So the data is tied to the place in physical address space
> > > and replacing one encrypted page with another encrypted page from
> > > different address will produce garbage on decryption.
> >
> > Just trying to understand how this works.
> >
> > So you use physical address like a nonce/version for the page and
> > thus prevent replay? Was not aware of this.
>
> The brutal fact is that a physical address is an astronomical stretch
> from a random value or increasing counter. Thus, it is fair to say that
> MKTME provides only naive measures against replay attacks...
>
> /Jarkko

Currently there is no nonce to protect a cache line, so TME/MKTME is
not able to prevent the replay attack you mentioned. Currently MKTME
only involves AES-XTS-128 encryption but nothing else. But like I said,
if I understand correctly even SEV doesn't have integrity protection,
so it is not able to prevent replay attacks either.

Thanks,
-Kai
> > There are some other use cases that already require tenant to send key
> > to CSP. For example, the VM image can be provided by tenant and
> > encrypted by tenant's own key, and tenant needs to send key to CSP when
> > asking CSP to run that encrypted image.
>
> I can imagine a few reasons why one would want to encrypt one's image.
> For example, the CSP could issue a public key and state, or even
> attest, that the key is wrapped and locked to particular PCRs of their
> TPM or otherwise protected by an enclave that verifies that the key is
> only used to decrypt the image for the benefit of a hypervisor.

Right. I think before the tenant releases a key to the CSP it should
always use an attestation authority to verify the trustworthiness of
the compute node. I can understand that the key can be wrapped by the
TPM before being sent to the CSP, but I need to catch up on the enclave
part.

The thing is, the compute node being trusted doesn't mean it cannot be
attacked, or even that a malicious admin cannot get the tenant's key by
using a legitimate way. There are many SW components involved here.

Anyway, this is not related to MKTME itself, as you mentioned below.
The point is, since we already see that MKTME itself provides very weak
security protection, we need to see whether MKTME has value from the
whole use case's point of view (including all the things you mentioned
above) -- we define the whole use case, we clearly state who/what
should be in the trust boundary, and what we can prevent, etc.

> I don't see what MKTME has to do with this. The only remotely plausible
> way I can see to use MKTME for this is to have the hypervisor load a
> TPM (or other enclave) protected key into an MKTME user key slot and to
> load customer-provided ciphertext into the corresponding physical
> memory (using an MKTME no-encrypt slot). But this has three major
> problems. First, it's effectively just a fancy way to avoid one AES
> pass over the data. Second, a sensible scheme for this type of VM image
> protection would use *authenticated* encryption or at least verify a
> signature, which MKTME can't do. The third problem is the real
> show-stopper, though: this scheme requires that the ciphertext go into
> predetermined physical addresses, which would be a giant mess.

My intention was to say that if we are already sending a key to the
CSP, then we may prefer to use the key for MKTME VM runtime protection
as well, but like you said we may not have a real security gain here
compared to TME, so I agree we need to find one specific case to prove
that.

Thanks,
-Kai
On Sat, 2018-12-08 at 09:33 +0800, Huang, Kai wrote:
> Currently there is no nonce protecting a cache line, so TME/MKTME
> cannot prevent the replay attack you mentioned. MKTME only involves
> AES-XTS-128 encryption and nothing else. But, as I said, if I
> understand correctly even SEV has no integrity protection, so it
> cannot prevent replay attacks either.

You're absolutely correct. There's also a good paper on SEV subversion:

https://arxiv.org/pdf/1805.09604.pdf

I don't think this makes MKTME or SEV useless, but yeah, it is a
constraint that needs to be taken into consideration when finding the
best way to use these technologies in Linux.

/Jarkko
On Fri, 2018-12-07 at 15:45 -0800, Jarkko Sakkinen wrote:
> The brutal fact is that a physical address is an astronomical stretch
> from a random value or an increasing counter. Thus, it is fair to say
> that MKTME provides only naive measures against replay attacks...

I'll try to summarize how I understand the high-level security model
of MKTME (it would be a good idea to document it).

Assumptions:

1. The hypervisor has not been infiltrated.
2. The hypervisor does not leak secrets.

When (1) and (2) hold [1], we harden VMs in two different ways:

A. VMs cannot leak data to each other -- or can they, with L1TF when
   HT is enabled?
B. Protects against cold boot attacks.

Isn't this roughly what this is about, in a nutshell?

[1] XPFO could potentially be an opt-in feature that reduces the
    damage when either of these assumptions has been broken.

/Jarkko
On Wed, Dec 12, 2018 at 7:31 AM Sakkinen, Jarkko
<jarkko.sakkinen@intel.com> wrote:
>
> On Fri, 2018-12-07 at 15:45 -0800, Jarkko Sakkinen wrote:
> > The brutal fact is that a physical address is an astronomical stretch
> > from a random value or an increasing counter. Thus, it is fair to say
> > that MKTME provides only naive measures against replay attacks...
>
> I'll try to summarize how I understand the high-level security model
> of MKTME (it would be a good idea to document it).
>
> Assumptions:
>
> 1. The hypervisor has not been infiltrated.
> 2. The hypervisor does not leak secrets.
>
> When (1) and (2) hold [1], we harden VMs in two different ways:
>
> A. VMs cannot leak data to each other -- or can they, with L1TF when
>    HT is enabled?

I strongly suspect that, on L1TF-vulnerable CPUs, MKTME provides no
protection whatsoever. It sounds like MKTME is implemented in the
memory controller -- as far as the rest of the CPU and the cache
hierarchy are concerned, the MKTME key selection bits are just part of
the physical address. So an attack like L1TF that leaks a cache line
selected by physical address will leak the cleartext if the key
selection bits are set correctly.

(I suppose that, if the attacker needs to brute-force the physical
address, then MKTME makes it a bit harder, because the effective
physical address space is larger.)

> B. Protects against cold boot attacks.

TME does this, AFAIK. MKTME does, too, unless the "user" mode is used,
in which case the protection is weaker.

> Isn't this roughly what this is about, in a nutshell?
>
> [1] XPFO could potentially be an opt-in feature that reduces the
>     damage when either of these assumptions has been broken.
>
> /Jarkko
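The "key selection bits are just part of the physical address" point can
be shown with a small sketch. The bit positions and widths below are
illustrative assumptions (on real hardware they are enumerated by the
CPU and sit at the top of the supported physical address width); the
sketch only demonstrates that the same page frame under two KeyIDs looks
like two distinct physical addresses to the cache and TLB.

/*
 * Hedged sketch, not taken from the patchset: fold a KeyID into the
 * upper physical-address bits of a PTE-like value.  PHYS_ADDR_BITS and
 * KEYID_BITS are assumed, not enumerated.
 */
#include <stdint.h>
#include <stdio.h>

#define PHYS_ADDR_BITS	46	/* assumed platform physical address width */
#define KEYID_BITS	6	/* assumed number of KeyID bits */
#define KEYID_SHIFT	(PHYS_ADDR_BITS - KEYID_BITS)

static uint64_t encode_keyid(uint64_t phys_addr, unsigned int keyid)
{
	return phys_addr | ((uint64_t)keyid << KEYID_SHIFT);
}

int main(void)
{
	uint64_t pa = 0x123456000ULL;	/* page-aligned physical address */

	/*
	 * Same page frame, two KeyIDs: the cache hierarchy treats these as
	 * two distinct physical addresses, which is why an L1TF-style leak
	 * keyed by physical address is unaffected by the encryption key.
	 */
	printf("keyid 0: %#llx\n", (unsigned long long)encode_keyid(pa, 0));
	printf("keyid 3: %#llx\n", (unsigned long long)encode_keyid(pa, 3));
	return 0;
}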
On Wed, 2018-12-12 at 08:29 -0800, Andy Lutomirski wrote:
> On Wed, Dec 12, 2018 at 7:31 AM Sakkinen, Jarkko
> <jarkko.sakkinen@intel.com> wrote:
> > On Fri, 2018-12-07 at 15:45 -0800, Jarkko Sakkinen wrote:
> > > The brutal fact is that a physical address is an astronomical stretch
> > > from a random value or an increasing counter. Thus, it is fair to say
> > > that MKTME provides only naive measures against replay attacks...
> >
> > I'll try to summarize how I understand the high-level security model
> > of MKTME (it would be a good idea to document it).
> >
> > Assumptions:
> >
> > 1. The hypervisor has not been infiltrated.
> > 2. The hypervisor does not leak secrets.
> >
> > When (1) and (2) hold [1], we harden VMs in two different ways:
> >
> > A. VMs cannot leak data to each other -- or can they, with L1TF when
> >    HT is enabled?
>
> I strongly suspect that, on L1TF-vulnerable CPUs, MKTME provides no
> protection whatsoever. It sounds like MKTME is implemented in the
> memory controller -- as far as the rest of the CPU and the cache
> hierarchy are concerned, the MKTME key selection bits are just part of
> the physical address. So an attack like L1TF that leaks a cache line
> selected by physical address will leak the cleartext if the key
> selection bits are set correctly.
>
> (I suppose that, if the attacker needs to brute-force the physical
> address, then MKTME makes it a bit harder, because the effective
> physical address space is larger.)
>
> > B. Protects against cold boot attacks.
>
> TME does this, AFAIK. MKTME does, too, unless the "user" mode is used,
> in which case the protection is weaker.
>
> > Isn't this roughly what this is about, in a nutshell?
> >
> > [1] XPFO could potentially be an opt-in feature that reduces the
> >     damage when either of these assumptions has been broken.

This all should be summarized in the documentation (the high-level
model and the corner cases).

/Jarkko
> I strongly suspect that, on L1TF-vulnerable CPUs, MKTME provides no
> protection whatsoever. It sounds like MKTME is implemented in the
> memory controller -- as far as the rest of the CPU and the cache
> hierarchy are concerned, the MKTME key selection bits are just part of
> the physical address. So an attack like L1TF that leaks a cache line
> selected by physical address will leak the cleartext if the key
> selection bits are set correctly.

Right. MKTME doesn't prevent cache-based attacks. Data in the cache is
in the clear.

Thanks,
-Kai
> This all should be summarized in the documentation (the high-level
> model and the corner cases).

I am not sure whether it is necessary to document L1TF explicitly, since
it is quite obvious that MKTME doesn't prevent it. IMHO, if needed, we
only need to mention that MKTME doesn't prevent any sort of cache-based
attack, since data in the cache is in the clear.

In fact, SGX doesn't prevent this either...

Thanks,
-Kai
On Thu, 2018-12-13 at 07:27 +0800, Huang, Kai wrote:
> > This all should be summarized in the documentation (the high-level
> > model and the corner cases).
>
> I am not sure whether it is necessary to document L1TF explicitly,
> since it is quite obvious that MKTME doesn't prevent it. IMHO, if
> needed, we only need to mention that MKTME doesn't prevent any sort of
> cache-based attack, since data in the cache is in the clear.
>
> In fact, SGX doesn't prevent this either...

Sorry, I was a bit unclear. I meant the assumptions and goals.

/Jarkko
On Thu, 2018-12-13 at 07:49 +0200, Jarkko Sakkinen wrote:
> On Thu, 2018-12-13 at 07:27 +0800, Huang, Kai wrote:
> > > This all should be summarized in the documentation (the high-level
> > > model and the corner cases).
> >
> > I am not sure whether it is necessary to document L1TF explicitly,
> > since it is quite obvious that MKTME doesn't prevent it. IMHO, if
> > needed, we only need to mention that MKTME doesn't prevent any sort
> > of cache-based attack, since data in the cache is in the clear.
> >
> > In fact, SGX doesn't prevent this either...
>
> Sorry, I was a bit unclear. I meant the assumptions and goals.

I.e., what I put in my earlier response: what belongs to the TCB and
what types of adversaries we aim to protect against.

/Jarkko