diff mbox series

RFC: dma-buf: Require VM_SPECIAL vma for mmap

Message ID 20210203211948.2529297-1-daniel.vetter@ffwll.ch (mailing list archive)
State New, archived
Headers show
Series RFC: dma-buf: Require VM_SPECIAL vma for mmap | expand

Commit Message

Daniel Vetter Feb. 3, 2021, 9:19 p.m. UTC
tldr; DMA buffers aren't normal memory, expecting that you can use
them like that (like calling get_user_pages works, or that they're
accounting like any other normal memory) cannot be guaranteed.

Since some userspace only runs on integrated devices, where all
buffers are actually all resident system memory, there's a huge
temptation to assume that a struct page is always present and useable
like for any more pagecache backed mmap. This has the potential to
result in a uapi nightmare.

To stop this gap require that DMA buffer mmaps are VM_SPECIAL, which
blocks get_user_pages and all the other struct page based
infrastructure for everyone. In spirit this is the uapi counterpart to
the kernel-internal CONFIG_DMABUF_DEBUG.

Motivated by a recent patch which wanted to swich the system dma-buf
heap to vm_insert_page instead of vm_insert_pfn.

References: https://lore.kernel.org/lkml/CAKMK7uHi+mG0z0HUmNt13QCCvutuRVjpcR0NjRL12k-WbWzkRg@mail.gmail.com/
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
---
 drivers/dma-buf/dma-buf.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

Comments

Jason Gunthorpe Feb. 4, 2021, 4:13 p.m. UTC | #1
On Wed, Feb 03, 2021 at 10:19:48PM +0100, Daniel Vetter wrote:
> tldr; DMA buffers aren't normal memory, expecting that you can use
> them like that (like calling get_user_pages works, or that they're
> accounting like any other normal memory) cannot be guaranteed.
> 
> Since some userspace only runs on integrated devices, where all
> buffers are actually all resident system memory, there's a huge
> temptation to assume that a struct page is always present and useable
> like for any more pagecache backed mmap. This has the potential to
> result in a uapi nightmare.
> 
> To stop this gap require that DMA buffer mmaps are VM_SPECIAL, which
> blocks get_user_pages and all the other struct page based
> infrastructure for everyone. In spirit this is the uapi counterpart to
> the kernel-internal CONFIG_DMABUF_DEBUG.

Fast gup needs the special flag set on the PTE as well.. Feels weird
to have a special VMA without also having special PTEs?

Jason
Daniel Vetter Feb. 4, 2021, 5:16 p.m. UTC | #2
On Thu, Feb 4, 2021 at 5:13 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Wed, Feb 03, 2021 at 10:19:48PM +0100, Daniel Vetter wrote:
> > tldr; DMA buffers aren't normal memory, expecting that you can use
> > them like that (like calling get_user_pages works, or that they're
> > accounting like any other normal memory) cannot be guaranteed.
> >
> > Since some userspace only runs on integrated devices, where all
> > buffers are actually all resident system memory, there's a huge
> > temptation to assume that a struct page is always present and useable
> > like for any more pagecache backed mmap. This has the potential to
> > result in a uapi nightmare.
> >
> > To stop this gap require that DMA buffer mmaps are VM_SPECIAL, which
> > blocks get_user_pages and all the other struct page based
> > infrastructure for everyone. In spirit this is the uapi counterpart to
> > the kernel-internal CONFIG_DMABUF_DEBUG.
>
> Fast gup needs the special flag set on the PTE as well.. Feels weird
> to have a special VMA without also having special PTEs?

There's kinda no convenient & cheap way to check for the pte_special
flag. This here should at least catch accidental misuse, people
building their own ptes we can't stop. Maybe we should exclude
VM_MIXEDMAP to catch vm_insert_page in one of these.

Hm looking at code I think we need to require VM_PFNMAP here to stop
vm_insert_page. And looking at the various functions, that seems to be
required (and I guess VM_IO is more for really funky architectures
where io-space is somewhere else?). I guess I should check for
VM_PFNMAP instead of VM_SPECIAL?
-Daniel
Jason Gunthorpe Feb. 4, 2021, 6:38 p.m. UTC | #3
On Thu, Feb 04, 2021 at 06:16:27PM +0100, Daniel Vetter wrote:
> On Thu, Feb 4, 2021 at 5:13 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Wed, Feb 03, 2021 at 10:19:48PM +0100, Daniel Vetter wrote:
> > > tldr; DMA buffers aren't normal memory, expecting that you can use
> > > them like that (like calling get_user_pages works, or that they're
> > > accounting like any other normal memory) cannot be guaranteed.
> > >
> > > Since some userspace only runs on integrated devices, where all
> > > buffers are actually all resident system memory, there's a huge
> > > temptation to assume that a struct page is always present and useable
> > > like for any more pagecache backed mmap. This has the potential to
> > > result in a uapi nightmare.
> > >
> > > To stop this gap require that DMA buffer mmaps are VM_SPECIAL, which
> > > blocks get_user_pages and all the other struct page based
> > > infrastructure for everyone. In spirit this is the uapi counterpart to
> > > the kernel-internal CONFIG_DMABUF_DEBUG.
> >
> > Fast gup needs the special flag set on the PTE as well.. Feels weird
> > to have a special VMA without also having special PTEs?
> 
> There's kinda no convenient & cheap way to check for the pte_special
> flag. This here should at least catch accidental misuse, people
> building their own ptes we can't stop. Maybe we should exclude
> VM_MIXEDMAP to catch vm_insert_page in one of these.
> 
> Hm looking at code I think we need to require VM_PFNMAP here to stop
> vm_insert_page. And looking at the various functions, that seems to be
> required (and I guess VM_IO is more for really funky architectures
> where io-space is somewhere else?). I guess I should check for
> VM_PFNMAP instead of VM_SPECIAL?

Well, you said the goal was to block GUP usage, that won't happen
without the PTE special flag, at least on x86

So, really, what you are saying is all dmabuf users should always use
vmf_insert_pfn_prot() or something similar - and never insert_page/etc?

It might make sense to check the vma flags in all the insert paths, eg
vm_insert_page() can't work with VMAs that should not have struct
pages in them (eg VM_SPECIAl, VM_PFNMAP, !VM_MIXEMAP if I understand
it right)

At least as some VM debug option

Jason
Daniel Vetter Feb. 4, 2021, 7:59 p.m. UTC | #4
On Thu, Feb 4, 2021 at 7:38 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Feb 04, 2021 at 06:16:27PM +0100, Daniel Vetter wrote:
> > On Thu, Feb 4, 2021 at 5:13 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > On Wed, Feb 03, 2021 at 10:19:48PM +0100, Daniel Vetter wrote:
> > > > tldr; DMA buffers aren't normal memory, expecting that you can use
> > > > them like that (like calling get_user_pages works, or that they're
> > > > accounting like any other normal memory) cannot be guaranteed.
> > > >
> > > > Since some userspace only runs on integrated devices, where all
> > > > buffers are actually all resident system memory, there's a huge
> > > > temptation to assume that a struct page is always present and useable
> > > > like for any more pagecache backed mmap. This has the potential to
> > > > result in a uapi nightmare.
> > > >
> > > > To stop this gap require that DMA buffer mmaps are VM_SPECIAL, which
> > > > blocks get_user_pages and all the other struct page based
> > > > infrastructure for everyone. In spirit this is the uapi counterpart to
> > > > the kernel-internal CONFIG_DMABUF_DEBUG.
> > >
> > > Fast gup needs the special flag set on the PTE as well.. Feels weird
> > > to have a special VMA without also having special PTEs?
> >
> > There's kinda no convenient & cheap way to check for the pte_special
> > flag. This here should at least catch accidental misuse, people
> > building their own ptes we can't stop. Maybe we should exclude
> > VM_MIXEDMAP to catch vm_insert_page in one of these.
> >
> > Hm looking at code I think we need to require VM_PFNMAP here to stop
> > vm_insert_page. And looking at the various functions, that seems to be
> > required (and I guess VM_IO is more for really funky architectures
> > where io-space is somewhere else?). I guess I should check for
> > VM_PFNMAP instead of VM_SPECIAL?
>
> Well, you said the goal was to block GUP usage, that won't happen
> without the PTE special flag, at least on x86
>
> So, really, what you are saying is all dmabuf users should always use
> vmf_insert_pfn_prot() or something similar - and never insert_page/etc?
>
> It might make sense to check the vma flags in all the insert paths, eg
> vm_insert_page() can't work with VMAs that should not have struct
> pages in them (eg VM_SPECIAl, VM_PFNMAP, !VM_MIXEMAP if I understand
> it right)

Well that's what I've done, and it /looks/ like all the checks are
there already, as long as we use VM_PFNMAP. vm_insert_page tries to
auto-add VM_MIXEDMAP, but bails out with a BUG_ON if VM_PFNMAP is set.
And all the vm_insert_pfn_prot/remap_pfn_range functions require (or
set) VM_PFNMAP.

So I think just checking for VM_PFNMAP after the vma is set up should
be enough to guarantee we'll only have pte_special ptes in there,
ever. But I'm not sure, this stuff all isn't really documented much
and the code is sometimes a maze (to me at least).

> At least as some VM debug option

Seems to be there already unconditionally.
-Daniel
Jason Gunthorpe Feb. 4, 2021, 8:09 p.m. UTC | #5
On Thu, Feb 04, 2021 at 08:59:59PM +0100, Daniel Vetter wrote:

> So I think just checking for VM_PFNMAP after the vma is set up should
> be enough to guarantee we'll only have pte_special ptes in there,
> ever. But I'm not sure, this stuff all isn't really documented much
> and the code is sometimes a maze (to me at least).

Yes, that makes sense. VM_PFNMAP and !VM_MIXEDMAP seems like the right
check after the VMA is populated

But how do you stuff special pfns into a VMA outside the fault
handler?

Jason
Daniel Vetter Feb. 4, 2021, 8:19 p.m. UTC | #6
On Thu, Feb 4, 2021 at 9:09 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Feb 04, 2021 at 08:59:59PM +0100, Daniel Vetter wrote:
>
> > So I think just checking for VM_PFNMAP after the vma is set up should
> > be enough to guarantee we'll only have pte_special ptes in there,
> > ever. But I'm not sure, this stuff all isn't really documented much
> > and the code is sometimes a maze (to me at least).
>
> Yes, that makes sense. VM_PFNMAP and !VM_MIXEDMAP seems like the right
> check after the VMA is populated
>
> But how do you stuff special pfns into a VMA outside the fault
> handler?

Many drivers we have don't have dynamic buffer management (kinda
overkill for a few framebuffers on a display-only IP block), so the
just remap_pfn_range on ->mmap, and don't have a fault handler at all.

Or I'm not understanding what you're asking?
-Daniel
Jason Gunthorpe Feb. 4, 2021, 8:59 p.m. UTC | #7
On Thu, Feb 04, 2021 at 09:19:57PM +0100, Daniel Vetter wrote:
> On Thu, Feb 4, 2021 at 9:09 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Thu, Feb 04, 2021 at 08:59:59PM +0100, Daniel Vetter wrote:
> >
> > > So I think just checking for VM_PFNMAP after the vma is set up should
> > > be enough to guarantee we'll only have pte_special ptes in there,
> > > ever. But I'm not sure, this stuff all isn't really documented much
> > > and the code is sometimes a maze (to me at least).
> >
> > Yes, that makes sense. VM_PFNMAP and !VM_MIXEDMAP seems like the right
> > check after the VMA is populated
> >
> > But how do you stuff special pfns into a VMA outside the fault
> > handler?
> 
> Many drivers we have don't have dynamic buffer management (kinda
> overkill for a few framebuffers on a display-only IP block), so the
> just remap_pfn_range on ->mmap, and don't have a fault handler at all.

remap_pfn_range() makes sense, do you expect drivers using struct page
backed memory to call that as well?

Jason
Christian König Feb. 5, 2021, 8:05 a.m. UTC | #8
Am 04.02.21 um 19:38 schrieb Jason Gunthorpe:
> On Thu, Feb 04, 2021 at 06:16:27PM +0100, Daniel Vetter wrote:
>> On Thu, Feb 4, 2021 at 5:13 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>> On Wed, Feb 03, 2021 at 10:19:48PM +0100, Daniel Vetter wrote:
>>>> tldr; DMA buffers aren't normal memory, expecting that you can use
>>>> them like that (like calling get_user_pages works, or that they're
>>>> accounting like any other normal memory) cannot be guaranteed.
>>>>
>>>> Since some userspace only runs on integrated devices, where all
>>>> buffers are actually all resident system memory, there's a huge
>>>> temptation to assume that a struct page is always present and useable
>>>> like for any more pagecache backed mmap. This has the potential to
>>>> result in a uapi nightmare.
>>>>
>>>> To stop this gap require that DMA buffer mmaps are VM_SPECIAL, which
>>>> blocks get_user_pages and all the other struct page based
>>>> infrastructure for everyone. In spirit this is the uapi counterpart to
>>>> the kernel-internal CONFIG_DMABUF_DEBUG.
>>> Fast gup needs the special flag set on the PTE as well.. Feels weird
>>> to have a special VMA without also having special PTEs?
>> There's kinda no convenient & cheap way to check for the pte_special
>> flag. This here should at least catch accidental misuse, people
>> building their own ptes we can't stop. Maybe we should exclude
>> VM_MIXEDMAP to catch vm_insert_page in one of these.
>>
>> Hm looking at code I think we need to require VM_PFNMAP here to stop
>> vm_insert_page. And looking at the various functions, that seems to be
>> required (and I guess VM_IO is more for really funky architectures
>> where io-space is somewhere else?). I guess I should check for
>> VM_PFNMAP instead of VM_SPECIAL?
> Well, you said the goal was to block GUP usage, that won't happen
> without the PTE special flag, at least on x86

When is that special flag being set?

> So, really, what you are saying is all dmabuf users should always use
> vmf_insert_pfn_prot() or something similar - and never insert_page/etc?

Exactly, yes.

Christian.

> It might make sense to check the vma flags in all the insert paths, eg
> vm_insert_page() can't work with VMAs that should not have struct
> pages in them (eg VM_SPECIAl, VM_PFNMAP, !VM_MIXEMAP if I understand
> it right)
>
> At least as some VM debug option
>
> Jason
Daniel Vetter Feb. 5, 2021, 9:14 a.m. UTC | #9
On Thu, Feb 4, 2021 at 9:59 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Feb 04, 2021 at 09:19:57PM +0100, Daniel Vetter wrote:
> > On Thu, Feb 4, 2021 at 9:09 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Thu, Feb 04, 2021 at 08:59:59PM +0100, Daniel Vetter wrote:
> > >
> > > > So I think just checking for VM_PFNMAP after the vma is set up should
> > > > be enough to guarantee we'll only have pte_special ptes in there,
> > > > ever. But I'm not sure, this stuff all isn't really documented much
> > > > and the code is sometimes a maze (to me at least).
> > >
> > > Yes, that makes sense. VM_PFNMAP and !VM_MIXEDMAP seems like the right
> > > check after the VMA is populated
> > >
> > > But how do you stuff special pfns into a VMA outside the fault
> > > handler?
> >
> > Many drivers we have don't have dynamic buffer management (kinda
> > overkill for a few framebuffers on a display-only IP block), so the
> > just remap_pfn_range on ->mmap, and don't have a fault handler at all.
>
> remap_pfn_range() makes sense, do you expect drivers using struct page
> backed memory to call that as well?

All the ones using CMA through dma_alloc_coherent end up in there with
the dma_mmap_wc function. So yeah we have tons already.

The drivers with dynamic memory management all use vm_insert_pfn, even
when the buffer is in system memory and struct page backed. I think
those are the two cases. There's another mmap in drm/i915, but that
should never leave intel-specific userspace, and I think we're also
phasing it out somewhat. Either way, should never show up in a shared
buffer usecase, ever, so I think we can ignore it.
-Daniel
Christian König Feb. 5, 2021, 1:42 p.m. UTC | #10
Am 05.02.21 um 14:41 schrieb Daniel Vetter:
> tldr; DMA buffers aren't normal memory, expecting that you can use
> them like that (like calling get_user_pages works, or that they're
> accounting like any other normal memory) cannot be guaranteed.
>
> Since some userspace only runs on integrated devices, where all
> buffers are actually all resident system memory, there's a huge
> temptation to assume that a struct page is always present and useable
> like for any more pagecache backed mmap. This has the potential to
> result in a uapi nightmare.
>
> To stop this gap require that DMA buffer mmaps are VM_PFNMAP, which
> blocks get_user_pages and all the other struct page based
> infrastructure for everyone. In spirit this is the uapi counterpart to
> the kernel-internal CONFIG_DMABUF_DEBUG.
>
> Motivated by a recent patch which wanted to swich the system dma-buf
> heap to vm_insert_page instead of vm_insert_pfn.
>
> v2:
>
> Jason brought up that we also want to guarantee that all ptes have the
> pte_special flag set, to catch fast get_user_pages (on architectures
> that support this). Allowing VM_MIXEDMAP (like VM_SPECIAL does) would
> still allow vm_insert_page, but limiting to VM_PFNMAP will catch that.
>
>  From auditing the various functions to insert pfn pte entires
> (vm_insert_pfn_prot, remap_pfn_range and all it's callers like
> dma_mmap_wc) it looks like VM_PFNMAP is already required anyway, so
> this should be the correct flag to check for.
>
> References: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2FCAKMK7uHi%2BmG0z0HUmNt13QCCvutuRVjpcR0NjRL12k-WbWzkRg%40mail.gmail.com%2F&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Ca8634bb0be8d4b0de8c108d8c9dbb81c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637481292814972959%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=8rylgsXhWvQT5zwt1Sc2Nb2IQR%2Fkd16ErIHfZ4PErZI%3D&amp;reserved=0
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: John Stultz <john.stultz@linaro.org>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: linux-media@vger.kernel.org
> Cc: linaro-mm-sig@lists.linaro.org

Acked-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/dma-buf/dma-buf.c | 15 +++++++++++++--
>   1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index f264b70c383e..06cb1d2e9fdc 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -127,6 +127,7 @@ static struct file_system_type dma_buf_fs_type = {
>   static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
>   {
>   	struct dma_buf *dmabuf;
> +	int ret;
>   
>   	if (!is_dma_buf_file(file))
>   		return -EINVAL;
> @@ -142,7 +143,11 @@ static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
>   	    dmabuf->size >> PAGE_SHIFT)
>   		return -EINVAL;
>   
> -	return dmabuf->ops->mmap(dmabuf, vma);
> +	ret = dmabuf->ops->mmap(dmabuf, vma);
> +
> +	WARN_ON(!(vma->vm_flags & VM_PFNMAP));
> +
> +	return ret;
>   }
>   
>   static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence)
> @@ -1244,6 +1249,8 @@ EXPORT_SYMBOL_GPL(dma_buf_end_cpu_access);
>   int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
>   		 unsigned long pgoff)
>   {
> +	int ret;
> +
>   	if (WARN_ON(!dmabuf || !vma))
>   		return -EINVAL;
>   
> @@ -1264,7 +1271,11 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
>   	vma_set_file(vma, dmabuf->file);
>   	vma->vm_pgoff = pgoff;
>   
> -	return dmabuf->ops->mmap(dmabuf, vma);
> +	ret = dmabuf->ops->mmap(dmabuf, vma);
> +
> +	WARN_ON(!(vma->vm_flags & VM_PFNMAP));
> +
> +	return ret;
>   }
>   EXPORT_SYMBOL_GPL(dma_buf_mmap);
>
diff mbox series

Patch

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index f264b70c383e..d3081fc07056 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -127,6 +127,7 @@  static struct file_system_type dma_buf_fs_type = {
 static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
 {
 	struct dma_buf *dmabuf;
+	int ret;
 
 	if (!is_dma_buf_file(file))
 		return -EINVAL;
@@ -142,7 +143,11 @@  static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
 	    dmabuf->size >> PAGE_SHIFT)
 		return -EINVAL;
 
-	return dmabuf->ops->mmap(dmabuf, vma);
+	ret = dmabuf->ops->mmap(dmabuf, vma);
+
+	WARN_ON(!(vma->vm_flags & VM_SPECIAL));
+
+	return ret;
 }
 
 static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence)
@@ -1244,6 +1249,8 @@  EXPORT_SYMBOL_GPL(dma_buf_end_cpu_access);
 int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
 		 unsigned long pgoff)
 {
+	int ret;
+
 	if (WARN_ON(!dmabuf || !vma))
 		return -EINVAL;
 
@@ -1264,7 +1271,11 @@  int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
 	vma_set_file(vma, dmabuf->file);
 	vma->vm_pgoff = pgoff;
 
-	return dmabuf->ops->mmap(dmabuf, vma);
+	ret = dmabuf->ops->mmap(dmabuf, vma);
+
+	WARN_ON(!(vma->vm_flags & VM_SPECIAL));
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(dma_buf_mmap);