| Message ID | 20190903131504.18935-4-thomas_os@shipmail.org (mailing list archive) |
| --- | --- |
| State | New, archived |
| Series | Have TTM support SEV encryption with coherent memory |
This whole thing looks like a fascinating collection of hacks. :)

ttm is taking a stack-allocated "VMA" and handing it to vmf_insert_*(), which obviously are expecting "real" VMAs that are linked into the mm. It's extracting some pgprot_t information from the real VMA, making a pseudo-temporary VMA, then passing the temporary one back into the insertion functions:

> static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
> {
...
>         struct vm_area_struct cvma;
...
>                 if (vma->vm_flags & VM_MIXEDMAP)
>                         ret = vmf_insert_mixed(&cvma, address,
>                                         __pfn_to_pfn_t(pfn, PFN_DEV));
>                 else
>                         ret = vmf_insert_pfn(&cvma, address, pfn);

I can totally see why this needs new exports. But, man, it doesn't seem like something we want to keep *feeding*.

The real problem here is that the encryption bits from the device VMA's "true" vma->vm_page_prot don't match the ones that actually get inserted, probably because the device ptes need the encryption bits cleared but the system memory PTEs need them set *and* they're mixed under one VMA.

The thing we need to stop is having mixed encryption rules under one VMA.
On Tue, Sep 3, 2019 at 9:38 PM Dave Hansen <dave.hansen@intel.com> wrote:
> [...]
>
> The thing we need to stop is having mixed encryption rules under one VMA.

The point here is that we want this. We need to be able to move the buffer between device ptes and system memory ptes, transparently, behind userspace's back, without races. And the fast path (which is "no pte exists for this vma") must be really fast, so taking mmap_sem and replacing the vma is a no-go.
-Daniel
On 9/3/19 12:51 PM, Daniel Vetter wrote:
>> The thing we need to stop is having mixed encryption rules under one VMA.
> The point here is that we want this. We need to be able to move the
> buffer between device ptes and system memory ptes, transparently,
> behind userspace's back, without races. And the fast path (which is "no
> pte exists for this vma") must be really fast, so taking mmap_sem and
> replacing the vma is a no-go.

So, when the user asks for encryption and we say, "sure, we'll encrypt that", then we want the device driver to be able to transparently undo that encryption under the covers for device memory? That seems suboptimal.

I'd rather the device driver just say: "Nope, you can't encrypt my VMA". Because that's the truth.
On 9/3/19 9:55 PM, Dave Hansen wrote:
> [...]
> So, when the user asks for encryption and we say, "sure, we'll encrypt
> that", then we want the device driver to be able to transparently undo
> that encryption under the covers for device memory? That seems suboptimal.
>
> I'd rather the device driver just say: "Nope, you can't encrypt my VMA".
> Because that's the truth.

The thing here is that it's the underlying physical memory that defines the correct encryption flags. If it's DMA memory with SEV active, or PCI memory, it's always unencrypted. User-space in a SEV VM should always, from a data protection point of view, *assume* that graphics buffers are unencrypted. (Which will of course limit the use of gpus and display controllers in a SEV VM.) Platform code sets the vma encryption to on by default.

So the question here should really be, can we determine already at mmap time whether backing memory will be unencrypted and adjust the *real* vma->vm_page_prot under the mmap_sem?

Possibly, but that requires populating the buffer with memory at mmap time rather than at first fault time. And it still requires knowledge whether the device DMA is always unencrypted (or if SEV is active).

/Thomas
On 9/3/19 1:36 PM, Thomas Hellström (VMware) wrote:
> So the question here should really be, can we determine already at mmap
> time whether backing memory will be unencrypted and adjust the *real*
> vma->vm_page_prot under the mmap_sem?
>
> Possibly, but that requires populating the buffer with memory at mmap
> time rather than at first fault time.

I'm not connecting the dots.

vma->vm_page_prot is used to create a VMA's PTEs regardless of whether they are created at mmap() or fault time. If we establish a good vma->vm_page_prot, can't we just use it forever for demand faults?

Or, are you concerned that if an attempt is made to demand-fault a page that's incompatible with vma->vm_page_prot, we have to SEGV?

> And it still requires knowledge whether the device DMA is always
> unencrypted (or if SEV is active).

I may be getting mixed up on MKTME (the Intel memory encryption) and SEV. Is SEV supported on all memory types? Page cache, hugetlbfs, anonymous? Or just anonymous?
On 9/3/19 10:51 PM, Dave Hansen wrote:
> [...]
> vma->vm_page_prot is used to create a VMA's PTEs regardless of whether
> they are created at mmap() or fault time. If we establish a good
> vma->vm_page_prot, can't we just use it forever for demand faults?

With SEV I think that we could possibly establish the encryption flags at vma creation time. But thinking of it, it would actually break with SME, where buffer content can be moved between encrypted system memory and unencrypted graphics card PCI memory behind user-space's back. That would imply killing all user-space encrypted PTEs and at fault time setting up new ones pointing to unencrypted PCI memory.

> Or, are you concerned that if an attempt is made to demand-fault a page
> that's incompatible with vma->vm_page_prot, we have to SEGV?
>
>> And it still requires knowledge whether the device DMA is always
>> unencrypted (or if SEV is active).
> I may be getting mixed up on MKTME (the Intel memory encryption) and
> SEV. Is SEV supported on all memory types? Page cache, hugetlbfs,
> anonymous? Or just anonymous?

SEV AFAIK encrypts *all* memory except DMA memory. To do that it uses a SWIOTLB backed by unencrypted memory, and it also flips coherent DMA memory to unencrypted (which is a very slow operation, and patch 4 deals with caching such memory).

/Thomas
On Tue, Sep 3, 2019 at 2:05 PM Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote:
> [...]
> SEV AFAIK encrypts *all* memory except DMA memory. To do that it uses a
> SWIOTLB backed by unencrypted memory, and it also flips coherent DMA
> memory to unencrypted (which is a very slow operation, and patch 4 deals
> with caching such memory).

I'm still lost. You have some fancy VMA where the backing pages change behind the application's back. This isn't particularly novel -- plain old anonymous memory and plain old mapped files do this too. Can't you use the insert_pfn APIs and call it a day? What's so special that you need all this magic? ISTM you should be able to allocate memory that's addressable by the device (dma_alloc_coherent() or whatever) and then map it into user memory just like you'd map any other page.

I feel like I'm missing something here.
On 9/3/19 11:46 PM, Andy Lutomirski wrote:
> [...]
> I'm still lost. You have some fancy VMA where the backing pages
> change behind the application's back. This isn't particularly novel
> -- plain old anonymous memory and plain old mapped files do this too.
> Can't you use the insert_pfn APIs and call it a day? What's so
> special that you need all this magic? ISTM you should be able to
> allocate memory that's addressable by the device (dma_alloc_coherent()
> or whatever) and then map it into user memory just like you'd map any
> other page.
>
> I feel like I'm missing something here.

Yes, so in this case we use dma_alloc_coherent().

With SEV, that gives us unencrypted pages (pages whose linear kernel map is marked unencrypted). With SME that (typically) gives us encrypted pages. In both these cases, vm_get_page_prot() returns an encrypted page protection, which lands in vma->vm_page_prot.

In the SEV case, we therefore need to modify the page protection to unencrypted. Hence we need to know whether we're running under SEV and therefore need to modify the protection. If not, the user-space PTE would incorrectly have the encryption flag set.

/Thomas
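For concreteness, a minimal sketch of the mismatch being described — not code from the series — assuming a device "dev" and a VMA "vma" from the surrounding driver code, and using the existing vm_get_page_prot(), pgprot_decrypted() and force_dma_unencrypted() helpers:

#include <linux/dma-direct.h>
#include <linux/dma-mapping.h>
#include <linux/mm.h>

static pgprot_t sketch_coherent_prot(struct device *dev,
				     struct vm_area_struct *vma)
{
	/*
	 * dma_alloc_coherent() hands back pages whose kernel linear
	 * map is decrypted when force_dma_unencrypted(dev) is true
	 * (the SEV case).  vm_get_page_prot() knows nothing about that
	 * allocation and still includes the encryption bit:
	 */
	pgprot_t prot = vm_get_page_prot(vma->vm_flags);

	/* ...so user-space PTEs built from it must be fixed up: */
	if (force_dma_unencrypted(dev))
		prot = pgprot_decrypted(prot);

	return prot;
}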
On 9/4/19 12:08 AM, Thomas Hellström (VMware) wrote:
> [...]
> In the SEV case, we therefore need to modify the page protection to
> unencrypted. Hence we need to know whether we're running under SEV and
> therefore need to modify the protection. If not, the user-space PTE
> would incorrectly have the encryption flag set.

And, of course, had we not been "fancy", we could have used dma_mmap_coherent(), which in theory should set up the correct user-space page protection. But now we're moving stuff around, so we can't.

/Thomas
Thomas, this series has garnered a nak and a whole pile of thoroughly confused reviewers. Could you take another stab at this along with a more ample changelog explaining the context of the problem? I suspect that's a better place to start than having us all piece together the disparate parts of the thread.
> On Sep 3, 2019, at 3:15 PM, Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote:
> [...]
>> With SEV, that gives us unencrypted pages (pages whose linear kernel map is
>> marked unencrypted). With SME that (typically) gives us encrypted pages. In
>> both these cases, vm_get_page_prot() returns an encrypted page protection,
>> which lands in vma->vm_page_prot.
>>
>> In the SEV case, we therefore need to modify the page protection to
>> unencrypted. Hence we need to know whether we're running under SEV and
>> therefore need to modify the protection. If not, the user-space PTE would
>> incorrectly have the encryption flag set.

I'm still confused. You got unencrypted pages with an unencrypted PFN. Why do you need to fiddle? You have a PFN, and you're inserting it with vmf_insert_pfn(). This should just work, no?

There doesn't seem to be any real funny business in dma_mmap_attrs() or dma_common_mmap().

But, reading this, I have more questions:

Can't you get rid of cvma by using vmf_insert_pfn_prot()?

Would it make sense to add a vmf_insert_dma_page() to directly do exactly what you're trying to do?

And a broader question just because I'm still confused: why isn't the encryption bit in the PFN? The whole SEV/SME system seems like it's trying a bit too hard to be fully invisible to the kernel.
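On the vmf_insert_pfn_prot() question: that helper already exists in core mm (though, as noted above, exporting these entry points to modules is part of what the series needs), and it takes the pgprot explicitly. A rough sketch of the idea — a hypothetical helper, not the TTM code:

#include <linux/mm.h>

/*
 * Hypothetical: insert one pfn with an explicit pgprot (caching and
 * encryption bits already applied by the caller), instead of smuggling
 * the prot in via a stack-allocated copy of the whole VMA.
 */
static vm_fault_t sketch_insert(struct vm_fault *vmf, unsigned long pfn,
				pgprot_t prot)
{
	return vmf_insert_pfn_prot(vmf->vma, vmf->address, pfn, prot);
}

The main caveat, raised below, is the per-pfn PAT lookup cost on x86.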
On 9/4/19 1:15 AM, Andy Lutomirski wrote:
> [...]
> I'm still confused. You got unencrypted pages with an unencrypted PFN. Why
> do you need to fiddle? You have a PFN, and you're inserting it with
> vmf_insert_pfn(). This should just work, no?

OK, now I see what causes the confusion.

With SEV, the encryption state is, while *physically* encoded in an address bit, from what I can tell not *logically* encoded in the pfn, but in the page_prot for cpu mapping purposes. That is, page_to_pfn() returns the same pfn whether the page is encrypted or unencrypted. Hence nobody can tell from the pfn whether the page is unencrypted or encrypted. For device DMA address purposes, the encryption status is encoded in the dma address by the dma layer in phys_to_dma().

> There doesn't seem to be any real funny business in dma_mmap_attrs() or
> dma_common_mmap().

No, from what I can tell the call in these functions to dma_pgprot() generates an incorrect page protection, since it doesn't take unencrypted coherent memory into account. I don't think anybody has used these functions yet with SEV.

> But, reading this, I have more questions:
>
> Can't you get rid of cvma by using vmf_insert_pfn_prot()?

It looks like that, although there are comments in the code about serious performance problems using VM_PFNMAP / vmf_insert_pfn() with write-combining and PAT, so that would require some serious testing with hardware I don't have. But I guess there is definitely room for improvement here. Ideally we'd like to be able to change the vma->vm_page_prot within fault(). But we can't.

> Would it make sense to add a vmf_insert_dma_page() to directly do exactly
> what you're trying to do?

Yes, but as a longer term solution I would prefer a general dma_pgprot() exported, so that we could, in a dma-compliant way, use coherent pages with other APIs, like kmap_atomic_prot() and vmap(). That is, basically split coherent page allocation into two steps: allocation and mapping.

> And a broader question just because I'm still confused: why isn't the
> encryption bit in the PFN? The whole SEV/SME system seems like it's trying
> a bit too hard to be fully invisible to the kernel.

I guess you'd have to ask AMD about that. But my understanding is that encoding it in an address bit does make it trivial to do decryption / encryption on the fly to DMA devices that are not otherwise aware of it, just by handing them a special physical address. For cpu mapping purposes it might become awkward to encode it in the pfn, since pfn_to_page() and friends would need knowledge about this. Personally I think it would have made sense to track it like PAT in track_pfn_insert().

Thanks,
Thomas
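To illustrate "encoded in the dma address by the dma layer": on SME/SEV hardware the encryption bit ("C-bit") lives in the sme_me_mask, and dma-direct ORs it into the bus address handed to the device. Roughly, paraphrasing the dma-direct code of this era and ignoring bus address offsets:

#include <linux/mem_encrypt.h>	/* sme_me_mask, __sme_set(), __sme_clr() */
#include <linux/types.h>

/*
 * Paraphrase: for encrypted memory, the device is given the physical
 * address with the C-bit set, so the memory controller transparently
 * encrypts/decrypts the DMA traffic.
 */
static dma_addr_t sketch_phys_to_dma(phys_addr_t paddr)
{
	return __sme_set(paddr);	/* paddr | sme_me_mask */
}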
On 03.09.19 at 23:05, Thomas Hellström (VMware) wrote:
> [...]
> With SEV I think that we could possibly establish the encryption flags
> at vma creation time. But thinking of it, it would actually break with
> SME, where buffer content can be moved between encrypted system memory
> and unencrypted graphics card PCI memory behind user-space's back.
> That would imply killing all user-space encrypted PTEs and at fault
> time setting up new ones pointing to unencrypted PCI memory.

Well my problem is where do you see encrypted system memory here?

At least for AMD GPUs all memory accessed must be unencrypted, and that counts for both system as well as PCI memory.

So I don't get why we can't assume always unencrypted and keep it like that.

Regards,
Christian.
On Wed, Sep 4, 2019 at 8:49 AM Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote:
> On 9/4/19 1:15 AM, Andy Lutomirski wrote:
> > But, reading this, I have more questions:
> >
> > Can't you get rid of cvma by using vmf_insert_pfn_prot()?
>
> It looks like that, although there are comments in the code about
> serious performance problems using VM_PFNMAP / vmf_insert_pfn() with
> write-combining and PAT, so that would require some serious testing with
> hardware I don't have. But I guess there is definitely room for
> improvement here. Ideally we'd like to be able to change the
> vma->vm_page_prot within fault(). But we can't.

Just a quick comment on this: It's the repeated (per-pfn/pte) lookup of the PAT tables, which is dead slow. If you have a struct io_mapping then that can be done once, and then just blindly inserted. See remap_io_mapping in i915.
-Daniel
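A sketch of the pattern being described (io_mapping_create_wc() is the real core API; remap_io_mapping() is an i915-internal helper, so the call shown in the comment is illustrative):

#include <linux/io-mapping.h>

/*
 * Done once at init time: this reserves the WC memtype for the whole
 * aperture, so PTE insertion at fault time doesn't have to repeat the
 * slow per-pfn PAT table lookup.
 */
static struct io_mapping *sketch_map_aperture(resource_size_t base,
					      unsigned long size)
{
	return io_mapping_create_wc(base, size);
}

/*
 * At fault time, i915 then inserts PTEs through its own helper,
 * roughly:
 *
 *	remap_io_mapping(vma, vma->vm_start, pfn, size, iomap);
 *
 * which trusts the memtype reserved above instead of looking one up
 * per pfn (i915 helper, not a core-mm export).
 */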
Hi, Christian,

On 9/4/19 9:33 AM, Koenig, Christian wrote:
> [...]
> Well my problem is where do you see encrypted system memory here?
>
> At least for AMD GPUs all memory accessed must be unencrypted, and that
> counts for both system as well as PCI memory.

We're talking SME now, right?

The current SME setup is that if a device's DMA mask says it's capable of addressing the encryption bit, coherent memory will be encrypted. The memory controllers will decrypt for the device on the fly. Otherwise coherent memory will be decrypted.

> So I don't get why we can't assume always unencrypted and keep it
> like that.

I see two reasons. First, it would break with a real device that signals it's capable of addressing the encryption bit.

Second, I can imagine unaccelerated setups (something like vkms using prime feeding a VNC connection) where we actually want the TTM buffers encrypted to protect data.

But at least the latter reason is way far out in the future.

So for me I'm OK with that, if that works for you?

/Thomas
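The DMA-mask rule summarized here is what dma-direct's force_dma_unencrypted() implements. Paraphrased from the kernel of this era (details of the exact masks and field names may differ):

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/mem_encrypt.h>

static bool sketch_force_dma_unencrypted(struct device *dev)
{
	if (sev_active())	/* SEV: DMA memory is always unencrypted */
		return true;

	if (sme_active()) {
		/*
		 * SME: only force unencrypted buffers for devices whose
		 * DMA mask cannot address the C-bit; capable devices get
		 * encrypted memory and on-the-fly decryption.
		 */
		u64 enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
		u64 dev_mask = min_not_zero(dev->coherent_dma_mask,
					    dev->bus_dma_mask);

		if (dev_mask <= enc_mask)
			return true;
	}
	return false;
}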
Hi, Dave,

On 9/4/19 1:10 AM, Dave Hansen wrote:
> Thomas, this series has garnered a nak and a whole pile of thoroughly
> confused reviewers.
>
> Could you take another stab at this along with a more ample changelog
> explaining the context of the problem? I suspect that's a better place
> to start than having us all piece together the disparate parts of the
> thread.

Sure. I was just trying to follow up on the emails to get a better understanding of what got people confused in the first place.

Thanks,
Thomas
On 9/4/19 10:19 AM, Thomas Hellström (VMware) wrote:
> [...]
> I see two reasons. First, it would break with a real device that
> signals it's capable of addressing the encryption bit.
>
> Second, I can imagine unaccelerated setups (something like vkms using
> prime feeding a VNC connection) where we actually want the TTM buffers
> encrypted to protect data.
>
> But at least the latter reason is way far out in the future.
>
> So for me I'm OK with that, if that works for you?

Hmm, BTW, are you sure the AMD GPUs use unencrypted system memory rather than relying on the memory controllers to decrypt? In that case it seems strange that they get away with encrypted TTM PTEs, whereas vmwgfx doesn't...

/Thomas
On 9/4/19 9:53 AM, Daniel Vetter wrote:
> [...]
> Just a quick comment on this: It's the repeated (per-pfn/pte) lookup
> of the PAT tables, which is dead slow. If you have a struct
> io_mapping then that can be done once, and then just blindly inserted.
> See remap_io_mapping in i915.
> -Daniel

Thanks, Daniel.

Indeed looks a lot like remap_pfn_range(), but usable at fault time?

/Thomas
On 04.09.19 at 10:19, Thomas Hellström (VMware) wrote:
> [...]
>> So I don't get why we can't assume always unencrypted and keep it
>> like that.
>
> I see two reasons. First, it would break with a real device that
> signals it's capable of addressing the encryption bit.

Why? Because we don't use dma_mmap_coherent()?

I've already talked with Christoph that we probably want to switch TTM over to using that instead, to also get rid of the ttm_io_prot() hack.

Regards,
Christian.
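dma_mmap_coherent() is the existing DMA-API entry point being referred to; a minimal usage sketch (hypothetical wrapper, with the buffer previously obtained from dma_alloc_coherent()):

#include <linux/dma-mapping.h>

/*
 * Let the DMA layer build the user mapping, so that it - not the
 * driver - picks the pgprot, in theory including the encryption bit.
 */
static int sketch_mmap_coherent(struct device *dev,
				struct vm_area_struct *vma, void *cpu_addr,
				dma_addr_t dma_addr, size_t size)
{
	return dma_mmap_coherent(dev, vma, cpu_addr, dma_addr, size);
}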
On Wed, Sep 4, 2019 at 12:38 PM Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote:
> [...]
> Thanks, Daniel.
>
> Indeed looks a lot like remap_pfn_range(), but usable at fault time?

Yeah, we call it from our fault handler. It's essentially vm_insert_pfn, except the PAT tracking isn't there; it instead relies on the PAT tracking io_mapping has done already.
-Daniel
On 9/4/19 1:10 PM, Koenig, Christian wrote:
> [...]
>> I see two reasons. First, it would break with a real device that
>> signals it's capable of addressing the encryption bit.
> Why? Because we don't use dma_mmap_coherent()?

Well, assuming always unencrypted would obviously break on a real device with encrypted coherent memory? dma_mmap_coherent() would work from the encryption point of view (although I think it's currently buggy, and will send out an RFC for what I believe is a fix for that).

> I've already talked with Christoph that we probably want to switch TTM
> over to using that instead, to also get rid of the ttm_io_prot() hack.

OK, would that mean us ditching other memory modes completely? And on-the-fly caching transitions? Or is it just for the special case of cached coherent memory? Do we need to cache the coherent kernel mappings in TTM as well, for ttm_bo_kmap()?

/Thomas
On 9/4/19 2:35 PM, Thomas Hellström (VMware) wrote:
>> I've already talked with Christoph that we probably want to switch TTM
>> over to using that instead, to also get rid of the ttm_io_prot() hack.
>
> OK, would that mean us ditching other memory modes completely? And
> on-the-fly caching transitions? Or is it just for the special case of
> cached coherent memory? Do we need to cache the coherent kernel
> mappings in TTM as well, for ttm_bo_kmap()?

Reading this again, I wanted to point out that I'm not against this. Just curious.

/Thomas
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index fe81c565e7ef..d5ad8f03b63f 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -419,11 +419,13 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo,
 		page = i * dir + add;
 		if (old_iomap == NULL) {
 			pgprot_t prot = ttm_io_prot(old_mem->placement,
+						    ttm->page_flags,
 						    PAGE_KERNEL);
 			ret = ttm_copy_ttm_io_page(ttm, new_iomap, page,
 						   prot);
 		} else if (new_iomap == NULL) {
 			pgprot_t prot = ttm_io_prot(new_mem->placement,
+						    ttm->page_flags,
 						    PAGE_KERNEL);
 			ret = ttm_copy_io_ttm_page(ttm, old_iomap, page,
 						   prot);
@@ -526,11 +528,11 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 	return 0;
 }
 
-pgprot_t ttm_io_prot(uint32_t caching_flags, pgprot_t tmp)
+pgprot_t ttm_io_prot(u32 caching_flags, u32 tt_page_flags, pgprot_t tmp)
 {
 	/* Cached mappings need no adjustment */
 	if (caching_flags & TTM_PL_FLAG_CACHED)
-		return tmp;
+		goto check_encryption;
 
 #if defined(__i386__) || defined(__x86_64__)
 	if (caching_flags & TTM_PL_FLAG_WC)
@@ -548,6 +550,11 @@ pgprot_t ttm_io_prot(uint32_t caching_flags, pgprot_t tmp)
 #if defined(__sparc__)
 	tmp = pgprot_noncached(tmp);
 #endif
+
+check_encryption:
+	if (tt_page_flags & TTM_PAGE_FLAG_DECRYPTED)
+		tmp = pgprot_decrypted(tmp);
+
 	return tmp;
 }
 EXPORT_SYMBOL(ttm_io_prot);
@@ -594,7 +601,8 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo,
 	if (ret)
 		return ret;
 
-	if (num_pages == 1 && (mem->placement & TTM_PL_FLAG_CACHED)) {
+	if (num_pages == 1 && (mem->placement & TTM_PL_FLAG_CACHED) &&
+	    !(ttm->page_flags & TTM_PAGE_FLAG_DECRYPTED)) {
 		/*
 		 * We're mapping a single page, and the desired
 		 * page protection is consistent with the bo.
@@ -608,7 +616,8 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo,
 		 * We need to use vmap to get the desired page protection
 		 * or to make the buffer object look contiguous.
 		 */
-		prot = ttm_io_prot(mem->placement, PAGE_KERNEL);
+		prot = ttm_io_prot(mem->placement, ttm->page_flags,
+				   PAGE_KERNEL);
 		map->bo_kmap_type = ttm_bo_map_vmap;
 		map->virtual = vmap(ttm->pages + start_page, num_pages,
 				    0, prot);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 76eedb963693..194d8d618d23 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -226,12 +226,7 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 	 * by mmap_sem in write mode.
 	 */
 	cvma = *vma;
-	cvma.vm_page_prot = vm_get_page_prot(cvma.vm_flags);
-
-	if (bo->mem.bus.is_iomem) {
-		cvma.vm_page_prot = ttm_io_prot(bo->mem.placement,
-						cvma.vm_page_prot);
-	} else {
+	if (!bo->mem.bus.is_iomem) {
 		struct ttm_operation_ctx ctx = {
 			.interruptible = false,
 			.no_wait_gpu = false,
@@ -240,14 +235,18 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 		};
 
 		ttm = bo->ttm;
-		cvma.vm_page_prot = ttm_io_prot(bo->mem.placement,
-						cvma.vm_page_prot);
-
-		/* Allocate all page at once, most common usage */
-		if (ttm_tt_populate(ttm, &ctx)) {
+		if (ttm_tt_populate(bo->ttm, &ctx)) {
 			ret = VM_FAULT_OOM;
 			goto out_io_unlock;
 		}
+		cvma.vm_page_prot = ttm_io_prot(bo->mem.placement,
+						ttm->page_flags,
+						cvma.vm_page_prot);
+	} else {
+		/* Iomem should not be marked encrypted */
+		cvma.vm_page_prot = ttm_io_prot(bo->mem.placement,
+						TTM_PAGE_FLAG_DECRYPTED,
+						cvma.vm_page_prot);
 	}
 
 	/*
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
index 7d78e6deac89..9b15df8ecd49 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -48,6 +48,7 @@
 #include <linux/atomic.h>
 #include <linux/device.h>
 #include <linux/kthread.h>
+#include <linux/dma-direct.h>
 #include <drm/ttm/ttm_bo_driver.h>
 #include <drm/ttm/ttm_page_alloc.h>
 #include <drm/ttm/ttm_set_memory.h>
@@ -984,6 +985,9 @@ int ttm_dma_populate(struct ttm_dma_tt *ttm_dma, struct device *dev,
 	}
 
 	ttm->state = tt_unbound;
+	if (force_dma_unencrypted(dev))
+		ttm->page_flags |= TTM_PAGE_FLAG_DECRYPTED;
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(ttm_dma_populate);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
index bb46ca0c458f..d3ced89a37e9 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
@@ -483,8 +483,10 @@ int vmw_bo_cpu_blit(struct ttm_buffer_object *dst,
 	d.src_pages = src->ttm->pages;
 	d.dst_num_pages = dst->num_pages;
 	d.src_num_pages = src->num_pages;
-	d.dst_prot = ttm_io_prot(dst->mem.placement, PAGE_KERNEL);
-	d.src_prot = ttm_io_prot(src->mem.placement, PAGE_KERNEL);
+	d.dst_prot = ttm_io_prot(dst->mem.placement, dst->ttm->page_flags,
+				 PAGE_KERNEL);
+	d.src_prot = ttm_io_prot(src->mem.placement, src->ttm->page_flags,
+				 PAGE_KERNEL);
 	d.diff = diff;
 
 	for (j = 0; j < h; ++j) {
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 6f536caea368..68ead1bd3042 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -893,13 +893,15 @@ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo);
 
 /**
  * ttm_io_prot
  *
- * @c_state: Caching state.
+ * @caching_flags: The caching flags of the map.
+ * @tt_page_flags: The tt_page_flags of the map, TTM_PAGE_FLAG_*
  * @tmp: Page protection flag for a normal, cached mapping.
  *
  * Utility function that returns the pgprot_t that should be used for
- * setting up a PTE with the caching model indicated by @c_state.
+ * setting up a PTE with the caching model indicated by @caching_flags,
+ * and encryption state indicated by @tt_page_flags,
  */
-pgprot_t ttm_io_prot(uint32_t caching_flags, pgprot_t tmp);
+pgprot_t ttm_io_prot(u32 caching_flags, u32 tt_page_flags, pgprot_t tmp);
 
 extern const struct ttm_mem_type_manager_func ttm_bo_manager_func;
 
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index c0e928abf592..45cc26355513 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -41,6 +41,7 @@ struct ttm_operation_ctx;
 #define TTM_PAGE_FLAG_DMA32	(1 << 7)
 #define TTM_PAGE_FLAG_SG	(1 << 8)
 #define TTM_PAGE_FLAG_NO_RETRY	(1 << 9)
+#define TTM_PAGE_FLAG_DECRYPTED	(1 << 10)
 
 enum ttm_caching_state {
 	tt_uncached,