diff mbox series

[v2,09/17] mm: Add unsafe_follow_pfn

Message ID 20201009075934.3509076-10-daniel.vetter@ffwll.ch
State New, archived
Headers show
Series follow_pfn and other iomap races | expand

Commit Message

Daniel Vetter Oct. 9, 2020, 7:59 a.m. UTC
Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea.

Unfortunately there's some users where this is not fixable (like v4l
userptr of iomem mappings) or involves a pile of work (vfio type1
iommu). For now annotate these as unsafe and splat appropriately.

This patch adds an unsafe_follow_pfn, which later patches will then
roll out to all appropriate places.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: kvm@vger.kernel.org
---
 include/linux/mm.h |  2 ++
 mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
 mm/nommu.c         | 17 +++++++++++++++++
 security/Kconfig   | 13 +++++++++++++
 4 files changed, 63 insertions(+), 1 deletion(-)

Comments

Mauro Carvalho Chehab Oct. 9, 2020, 10:34 a.m. UTC | #1
Hi,

Em Fri,  9 Oct 2020 09:59:26 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> Way back it was a reasonable assumptions that iomem mappings never
> change the pfn range they point at. But this has changed:
> 
> - gpu drivers dynamically manage their memory nowadays, invalidating
> ptes with unmap_mapping_range when buffers get moved
> 
> - contiguous dma allocations have moved from dedicated carvetouts to
> cma regions. This means if we miss the unmap the pfn might contain
> pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> 
> - even /dev/mem now invalidates mappings when the kernel requests that
> iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> ("/dev/mem: Revoke mappings when a driver claims the region")
> 
> Accessing pfns obtained from ptes without holding all the locks is
> therefore no longer a good idea.
> 
> Unfortunately there's some users where this is not fixable (like v4l
> userptr of iomem mappings) or involves a pile of work (vfio type1
> iommu). For now annotate these as unsafe and splat appropriately.
> 
> This patch adds an unsafe_follow_pfn, which later patches will then
> roll out to all appropriate places.

NACK, as this breaks an existing userspace API on media.

While I agree that using the userptr on media is something that
new drivers may not support, as DMABUF is a better way of
handling it, changing this for existing ones is a big no, 
as it may break usersapace.

The right approach here would be to be able to fine-tune
support for it on a per-driver basis, e. g. disabling such
feature only for drivers that would use a movable page.

The media subsystem has already a way to disable USERPTR
support from VB2. the right approach would be to ensure
that newer drivers will only set this if they won't use
movable pages.

Regards,
Mauro

> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: kvm@vger.kernel.org
> ---
>  include/linux/mm.h |  2 ++
>  mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
>  mm/nommu.c         | 17 +++++++++++++++++
>  security/Kconfig   | 13 +++++++++++++
>  4 files changed, 63 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2a16631c1fda..ec8c90928fc9 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1653,6 +1653,8 @@ int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
>  		   pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);
>  int follow_pfn(struct vm_area_struct *vma, unsigned long address,
>  	unsigned long *pfn);
> +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> +		      unsigned long *pfn);
>  int follow_phys(struct vm_area_struct *vma, unsigned long address,
>  		unsigned int flags, unsigned long *prot, resource_size_t *phys);
>  int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
> diff --git a/mm/memory.c b/mm/memory.c
> index f7cbc4dde0ef..7c7b234ffb24 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4821,7 +4821,12 @@ EXPORT_SYMBOL(follow_pte_pmd);
>   * @address: user virtual address
>   * @pfn: location to store found PFN
>   *
> - * Only IO mappings and raw PFN mappings are allowed.
> + * Only IO mappings and raw PFN mappings are allowed. Note that callers must
> + * ensure coherency with pte updates by using a &mmu_notifier to follow updates.
> + * If this is not feasible, or the access to the @pfn is only very short term,
> + * use follow_pte_pmd() instead and hold the pagetable lock for the duration of
> + * the access instead. Any caller not following these requirements must use
> + * unsafe_follow_pfn() instead.
>   *
>   * Return: zero and the pfn at @pfn on success, -ve otherwise.
>   */
> @@ -4844,6 +4849,31 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
>  }
>  EXPORT_SYMBOL(follow_pfn);
>  
> +/**
> + * unsafe_follow_pfn - look up PFN at a user virtual address
> + * @vma: memory mapping
> + * @address: user virtual address
> + * @pfn: location to store found PFN
> + *
> + * Only IO mappings and raw PFN mappings are allowed.
> + *
> + * Returns zero and the pfn at @pfn on success, -ve otherwise.
> + */
> +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> +	unsigned long *pfn)
> +{
> +#ifdef CONFIG_STRICT_FOLLOW_PFN
> +	pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
> +	return -EINVAL;
> +#else
> +	WARN_ONCE(1, "unsafe follow_pfn usage\n");
> +	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> +
> +	return follow_pfn(vma, address, pfn);
> +#endif
> +}
> +EXPORT_SYMBOL(unsafe_follow_pfn);
> +
>  #ifdef CONFIG_HAVE_IOREMAP_PROT
>  int follow_phys(struct vm_area_struct *vma,
>  		unsigned long address, unsigned int flags,
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 75a327149af1..3db2910f0d64 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -132,6 +132,23 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
>  }
>  EXPORT_SYMBOL(follow_pfn);
>  
> +/**
> + * unsafe_follow_pfn - look up PFN at a user virtual address
> + * @vma: memory mapping
> + * @address: user virtual address
> + * @pfn: location to store found PFN
> + *
> + * Only IO mappings and raw PFN mappings are allowed.
> + *
> + * Returns zero and the pfn at @pfn on success, -ve otherwise.
> + */
> +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> +	unsigned long *pfn)
> +{
> +	return follow_pfn(vma, address, pfn);
> +}
> +EXPORT_SYMBOL(unsafe_follow_pfn);
> +
>  LIST_HEAD(vmap_area_list);
>  
>  void vfree(const void *addr)
> diff --git a/security/Kconfig b/security/Kconfig
> index 7561f6f99f1d..48945402e103 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH
>  	  If you wish for all usermode helper programs to be disabled,
>  	  specify an empty string here (i.e. "").
>  
> +config STRICT_FOLLOW_PFN
> +	bool "Disable unsafe use of follow_pfn"
> +	depends on MMU
> +	help
> +	  Some functionality in the kernel follows userspace mappings to iomem
> +	  ranges in an unsafe matter. Examples include v4l userptr for zero-copy
> +	  buffers sharing.
> +
> +	  If this option is switched on, such access is rejected. Only enable
> +	  this option when you must run userspace which requires this.
> +
> +	  If in doubt, say Y.
> +
>  source "security/selinux/Kconfig"
>  source "security/smack/Kconfig"
>  source "security/tomoyo/Kconfig"



Thanks,
Mauro
Jason Gunthorpe Oct. 9, 2020, 12:21 p.m. UTC | #2
On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:
> Hi,
> 
> Em Fri,  9 Oct 2020 09:59:26 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> 
> > Way back it was a reasonable assumptions that iomem mappings never
> > change the pfn range they point at. But this has changed:
> > 
> > - gpu drivers dynamically manage their memory nowadays, invalidating
> > ptes with unmap_mapping_range when buffers get moved
> > 
> > - contiguous dma allocations have moved from dedicated carvetouts to
> > cma regions. This means if we miss the unmap the pfn might contain
> > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > 
> > - even /dev/mem now invalidates mappings when the kernel requests that
> > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > ("/dev/mem: Revoke mappings when a driver claims the region")
> > 
> > Accessing pfns obtained from ptes without holding all the locks is
> > therefore no longer a good idea.
> > 
> > Unfortunately there's some users where this is not fixable (like v4l
> > userptr of iomem mappings) or involves a pile of work (vfio type1
> > iommu). For now annotate these as unsafe and splat appropriately.
> > 
> > This patch adds an unsafe_follow_pfn, which later patches will then
> > roll out to all appropriate places.
> 
> NACK, as this breaks an existing userspace API on media.

It doesn't break it. You get a big warning the thing is broken and it
keeps working.

We can't leave such a huge security hole open - it impacts other
subsystems, distros need to be able to run in a secure mode.

> While I agree that using the userptr on media is something that
> new drivers may not support, as DMABUF is a better way of
> handling it, changing this for existing ones is a big no, 
> as it may break usersapace.

media community needs to work to fix this, not pretend it is OK to
keep going as-is.

Dealing with security issues is the one case where an uABI break might
be acceptable.

If you want to NAK it then you need to come up with the work to do
something here correctly that will support the old drivers without the
kernel taint.

Unfortunately making things uncomfortable for the subsystem is the big
hammer the core kernel needs to use to actually get this security work
done by those responsible.

Jason
Mauro Carvalho Chehab Oct. 9, 2020, 12:37 p.m. UTC | #3
Em Fri, 9 Oct 2020 09:21:11 -0300
Jason Gunthorpe <jgg@ziepe.ca> escreveu:

> On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:
> > Hi,
> > 
> > Em Fri,  9 Oct 2020 09:59:26 +0200
> > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> >   
> > > Way back it was a reasonable assumptions that iomem mappings never
> > > change the pfn range they point at. But this has changed:
> > > 
> > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > ptes with unmap_mapping_range when buffers get moved
> > > 
> > > - contiguous dma allocations have moved from dedicated carvetouts to
> > > cma regions. This means if we miss the unmap the pfn might contain
> > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > > 
> > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > > 
> > > Accessing pfns obtained from ptes without holding all the locks is
> > > therefore no longer a good idea.
> > > 
> > > Unfortunately there's some users where this is not fixable (like v4l
> > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > iommu). For now annotate these as unsafe and splat appropriately.
> > > 
> > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > roll out to all appropriate places.  
> > 
> > NACK, as this breaks an existing userspace API on media.  
> 
> It doesn't break it. You get a big warning the thing is broken and it
> keeps working.
> 
> We can't leave such a huge security hole open - it impacts other
> subsystems, distros need to be able to run in a secure mode.

Well, if distros disable it, then apps will break.

> > While I agree that using the userptr on media is something that
> > new drivers may not support, as DMABUF is a better way of
> > handling it, changing this for existing ones is a big no, 
> > as it may break usersapace.  
> 
> media community needs to work to fix this, not pretend it is OK to
> keep going as-is.

> Dealing with security issues is the one case where an uABI break might
> be acceptable.
> 
> If you want to NAK it then you need to come up with the work to do
> something here correctly that will support the old drivers without the
> kernel taint.
> 
> Unfortunately making things uncomfortable for the subsystem is the big
> hammer the core kernel needs to use to actually get this security work
> done by those responsible.


I'm not pretending that this is ok. Just pointing that the approach
taken is NOT OK.

I'm not a mm/ expert, but, from what I understood from Daniel's patch
description is that this is unsafe *only if*  __GFP_MOVABLE is used.

Well, no drivers inside the media subsystem uses such flag, although
they may rely on some infrastructure that could be using it behind
the bars.

If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE 
flag that it would be denying the core mm code to set __GFP_MOVABLE.

Please let address the issue on this way, instead of broken an
userspace API that it is there since 1991.

Thanks,
Mauro
Mauro Carvalho Chehab Oct. 9, 2020, 12:39 p.m. UTC | #4
Em Fri, 9 Oct 2020 14:37:23 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:

> Em Fri, 9 Oct 2020 09:21:11 -0300
> Jason Gunthorpe <jgg@ziepe.ca> escreveu:
> 
> > On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:  
> > > Hi,
> > > 
> > > Em Fri,  9 Oct 2020 09:59:26 +0200
> > > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> > >     
> > > > Way back it was a reasonable assumptions that iomem mappings never
> > > > change the pfn range they point at. But this has changed:
> > > > 
> > > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > > ptes with unmap_mapping_range when buffers get moved
> > > > 
> > > > - contiguous dma allocations have moved from dedicated carvetouts to
> > > > cma regions. This means if we miss the unmap the pfn might contain
> > > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > > > 
> > > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > > > 
> > > > Accessing pfns obtained from ptes without holding all the locks is
> > > > therefore no longer a good idea.
> > > > 
> > > > Unfortunately there's some users where this is not fixable (like v4l
> > > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > > iommu). For now annotate these as unsafe and splat appropriately.
> > > > 
> > > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > > roll out to all appropriate places.    
> > > 
> > > NACK, as this breaks an existing userspace API on media.    
> > 
> > It doesn't break it. You get a big warning the thing is broken and it
> > keeps working.
> > 
> > We can't leave such a huge security hole open - it impacts other
> > subsystems, distros need to be able to run in a secure mode.  
> 
> Well, if distros disable it, then apps will break.
> 
> > > While I agree that using the userptr on media is something that
> > > new drivers may not support, as DMABUF is a better way of
> > > handling it, changing this for existing ones is a big no, 
> > > as it may break usersapace.    
> > 
> > media community needs to work to fix this, not pretend it is OK to
> > keep going as-is.  
> 
> > Dealing with security issues is the one case where an uABI break might
> > be acceptable.
> > 
> > If you want to NAK it then you need to come up with the work to do
> > something here correctly that will support the old drivers without the
> > kernel taint.
> > 
> > Unfortunately making things uncomfortable for the subsystem is the big
> > hammer the core kernel needs to use to actually get this security work
> > done by those responsible.  
> 
> 
> I'm not pretending that this is ok. Just pointing that the approach
> taken is NOT OK.
> 
> I'm not a mm/ expert, but, from what I understood from Daniel's patch
> description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> 
> Well, no drivers inside the media subsystem uses such flag, although
> they may rely on some infrastructure that could be using it behind
> the bars.
> 
> If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE 
> flag that it would be denying the core mm code to set __GFP_MOVABLE.
> 
> Please let address the issue on this way, instead of broken an
> userspace API that it is there since 1991.

In time: I meant to say 1998.

Thanks,
Mauro
Jason Gunthorpe Oct. 9, 2020, 12:48 p.m. UTC | #5
On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:

> I'm not a mm/ expert, but, from what I understood from Daniel's patch
> description is that this is unsafe *only if*  __GFP_MOVABLE is used.

No, it is unconditionally unsafe. The CMA movable mappings are
specific VMAs that will have bad issues here, but there are other
types too.

The only way to do something at a VMA level is to have a list of OK
VMAs, eg because they were creatd via a special mmap helper from the
media subsystem.

> Well, no drivers inside the media subsystem uses such flag, although
> they may rely on some infrastructure that could be using it behind
> the bars.

It doesn't matter, nothing prevents the user from calling media APIs
on mmaps it gets from other subsystems.

> If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> flag that it would be denying the core mm code to set __GFP_MOVABLE.

We can't tell from the VMA these kinds of details..

It has to go the other direction, evey mmap that might be used as a
userptr here has to be found and the VMA specially created to allow
its use. At least that is a kernel only change, but will need people
with the HW to do this work.

> Please let address the issue on this way, instead of broken an
> userspace API that it is there since 1991.

It has happened before :( It took 4 years for RDMA to undo the uAPI
breakage caused by a security fix for something that was a 15 years
old bug. 

Jason
Daniel Vetter Oct. 9, 2020, 5:52 p.m. UTC | #6
On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
>
> > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
>
> No, it is unconditionally unsafe. The CMA movable mappings are
> specific VMAs that will have bad issues here, but there are other
> types too.
>
> The only way to do something at a VMA level is to have a list of OK
> VMAs, eg because they were creatd via a special mmap helper from the
> media subsystem.
>
> > Well, no drivers inside the media subsystem uses such flag, although
> > they may rely on some infrastructure that could be using it behind
> > the bars.
>
> It doesn't matter, nothing prevents the user from calling media APIs
> on mmaps it gets from other subsystems.

I think a good first step would be to disable userptr of non struct
page backed storage going forward for any new hw support. Even on
existing drivers. dma-buf sharing has been around for long enough now
that this shouldn't be a problem. Unfortunately right now this doesn't
seem to exist, so the entire problem keeps getting perpetuated.

> > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > flag that it would be denying the core mm code to set __GFP_MOVABLE.
>
> We can't tell from the VMA these kinds of details..
>
> It has to go the other direction, evey mmap that might be used as a
> userptr here has to be found and the VMA specially created to allow
> its use. At least that is a kernel only change, but will need people
> with the HW to do this work.

I think the only reasonable way to keep this working is:
- add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
- add dma-buf export support to fbdev and v4l
- roll this out everywhere we still need it.

Realistically this just isn't going to happen. And anything else just
reimplements half of dma-buf, which is kinda pointless (you need
minimally refcounting and some way to get at a promise of a permanent
sg list for dma. Plus probably the vmap for kernel cpu access.

> > Please let address the issue on this way, instead of broken an
> > userspace API that it is there since 1991.
>
> It has happened before :( It took 4 years for RDMA to undo the uAPI
> breakage caused by a security fix for something that was a 15 years
> old bug.

Yeah we have a bunch of these on the drm side too. Some of them are
really just "you have to upgrade userspace", and there's no real fix
for the security nightmare without that.
-Daniel
Jason Gunthorpe Oct. 9, 2020, 6:01 p.m. UTC | #7
On Fri, Oct 09, 2020 at 07:52:05PM +0200, Daniel Vetter wrote:

> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.
> 
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> - add dma-buf export support to fbdev and v4l
> - roll this out everywhere we still need it.

It seems to me there is a technical way forward to restore user
compat, so it is really no different than RDMA/DRM pain we both
suffered before.

Thus no justification to NAK it. If media wants things to keep working
they have to do the technical path like you outline above.

> Realistically this just isn't going to happen. 

If your series goes ahead it will get solved. Someone will take on the
huge project to either add DMA buf to the drivers people still care
about, or do the work above to transparently handle in kernel.

If we allow things to keep working without consequence then nobody
will do it.

The only reason we did the 4 years of work in RDMA was because Linus
went in and broke the uABI for a security fix. It was hundreds of
patches to fix it, so I don't have much sympathy for "it is too hard"
here.

Jason
Daniel Vetter Oct. 9, 2020, 7:31 p.m. UTC | #8
On Fri, Oct 9, 2020 at 8:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 07:52:05PM +0200, Daniel Vetter wrote:
>
> > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > >
> > > We can't tell from the VMA these kinds of details..
> > >
> > > It has to go the other direction, evey mmap that might be used as a
> > > userptr here has to be found and the VMA specially created to allow
> > > its use. At least that is a kernel only change, but will need people
> > > with the HW to do this work.
> >
> > I think the only reasonable way to keep this working is:
> > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> > - add dma-buf export support to fbdev and v4l
> > - roll this out everywhere we still need it.
>
> It seems to me there is a technical way forward to restore user
> compat, so it is really no different than RDMA/DRM pain we both
> suffered before.
>
> Thus no justification to NAK it. If media wants things to keep working
> they have to do the technical path like you outline above.
>
> > Realistically this just isn't going to happen.
>
> If your series goes ahead it will get solved. Someone will take on the
> huge project to either add DMA buf to the drivers people still care
> about, or do the work above to transparently handle in kernel.
>
> If we allow things to keep working without consequence then nobody
> will do it.
>
> The only reason we did the 4 years of work in RDMA was because Linus
> went in and broke the uABI for a security fix. It was hundreds of
> patches to fix it, so I don't have much sympathy for "it is too hard"
> here.

Oh fully agreeing with you here, I just wanted to lay out that a)
there is a solid plan to fix it and b) it's way too much work for me
to just type it as a part of a "learn me some core mm semantics"
project :-)

I was hoping that we could get away with a special marker for
problematic vma, and filter those out. But after all the digging I've
noticed that on anything remotely modern, there's just nothing left.
Device memory management has become massively more dynamic in the past
10 years.
-Daniel
Mauro Carvalho Chehab Oct. 10, 2020, 9:21 a.m. UTC | #9
Em Fri, 9 Oct 2020 19:52:05 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >  
> > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.  
> >
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.

I didn't check the mm dirty details, but I strongly suspect that the mm
code has a way to prevent moving a mmapped page while it is still in usage.

If not, then the entire movable pages concept sounds broken to me, and
has to be fixed at mm subsystem.

> >
> > The only way to do something at a VMA level is to have a list of OK
> > VMAs, eg because they were creatd via a special mmap helper from the
> > media subsystem.

I'm not sure if I'm following you on that. The media API can work with
different ways of sending buffers to userspace:

	- read();

	- using the overlay mode. This interface is deprecated in
	  practice, being replaced by DMABUF. Only a few older hardware
	  supports it, and it depends on an special X11 helper driver
	  for it to work.

	- via DMABUF:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html

	- via mmap, using a mmap helper:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/mmap.html

	- via mmap, using userspace-provided pointers:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html

The existing open-source programs usually chose one or more of the above
modes. if the Kernel driver returns -EINVAL when an mmapped streaming I/O
mode is not supported, userspace has to select a different method.

Most userspace open source programs have fallback support: if one
mmap I/O method fails, it selects another one, although this is not
a mandatory requirement. I found (and fixed) a few ones that were
relying exclusively on userptr support, but I didn't make a 
comprehensive check.

Also there are a number of relevant closed-source apps that we have no 
idea about what methods they use, like Skype, and other similar
videoconferencing programs. Breaking support for those, specially at
a time where people are relying on it in order to participate on
conferences and doing internal meetings is a **very bad** idea.

So, whatever solution is taken, it should not be dumping warning
messages at the system and tainting the Kernel, but, instead, checking
if the userspace request is valid or not. If it is invalid, return the
proper error code via the right V4L2 ioctl, in order to let userspace
choose a different method. I the request is valid, refcount the pages 
for them to not be moved while they're still in usage.

-

Let me provide some background about how things work at the media
subsytem. If you want to know more, the userspace-provided memory 
mapped pointers work is described here:

	https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html#userp

Basically, userspace calls either one of those ioctls:

	VIDIOC_CREATE_BUFS:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-create-bufs.html

Which is translated into a videobuf2 call to: vb2_ioctl_create_bufs()

	VIDIOC_REQBUFS
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-reqbufs.html#vidioc-reqbufs

Which is translated into a videobuf2 call to: vb2_ioctl_reqbufs()

Both internally calls vb2_verify_memory_type(), which is responsible
for checking if the provided pointers are OK for the usage and/or do
all necessary page allocations, and taking care of any special 
requirements. This could easily have some additional checks to
verify if the requested VMA address has pages that are movable or
not, ensuring that ensure that the VMA is OK, and locking them in
order to prevent the mm code to move such pages while they are in
usage by the media subsystem.

Now, as I said before, I don't know the dirty details about how
to lock those pages at the mm subsystem in order to avoid it
to move the used pages. Yet, when vb2_create_framevec()
is called, the check if VMA is OK should already be happened
at vb2_verify_memory_type().

-

It should be noticed that the dirty hack added by patch 09/17
and 10/17 affects *all* types of memory allocations at V4L2, 
as this kAPI is called by the 3 different memory models
supported at the media subsystem:

	drivers/media/common/videobuf2/videobuf2-vmalloc.c
	drivers/media/common/videobuf2/videobuf2-dma-contig.c
	drivers/media/common/videobuf2/videobuf2-dma-sg.c

In other words, with this code:

	int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
		unsigned long *pfn)
	{
	#ifdef CONFIG_STRICT_FOLLOW_PFN
		pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
		return -EINVAL;
	#else
		WARN_ONCE(1, "unsafe follow_pfn usage\n");
		add_taint(TAINT_USER, LOCKDEP_STILL_OK);

		return follow_pfn(vma, address, pfn);
	#endif

you're unconditionally breaking the media userspace API support not
only for embedded systems that could be using userptr instead of
DMA_BUF, but also for *all* video devices, including USB cameras.

This is **NOT** an acceptable solution. 

So, I stand my NACK to this approach.

> > > Well, no drivers inside the media subsystem uses such flag, although
> > > they may rely on some infrastructure that could be using it behind
> > > the bars.  
> >
> > It doesn't matter, nothing prevents the user from calling media APIs
> > on mmaps it gets from other subsystems.  
> 
> I think a good first step would be to disable userptr of non struct
> page backed storage going forward for any new hw support. Even on
> existing drivers. dma-buf sharing has been around for long enough now
> that this shouldn't be a problem. Unfortunately right now this doesn't
> seem to exist, so the entire problem keeps getting perpetuated.

Well, the media uAPI does support DMABUF (both import and export):

	https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html
	https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-expbuf.html#vidioc-expbuf

And I fully agree that newer programs should use DMABUF when sharing
buffers with DRM subsystem, but that's not my main concern.

What I do want is to not break userspace support nor to taint the Kernel
due to a valid uAPI call.

A valid uAPI call should check if the parameters passed though it are
valid. If they are, it should handle. Otherwise, it should return
-EINVAL, without tainting the Kernel or printing warning messages.

The approach took by patches 09/17 and 10/17 doesn't do that.
Instead, they just unconditionally breaks the media uAPI.

What should be done, instead, is to drop patch 10/17, and work on
a way for the code inside vb2_create_framevec() to ensure that, if
USERPTR is used, the memory pages will be properly locked while the
driver is using, returning -EINVAL only if there's no way to proceed,
without tainting the Kernel.

> 
> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.  
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.  
> 
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);

Not sure how an userspace buffer could be mapped to be using it,
specially since the buffer may not even be imported/exported
from the DRM subsystem, but it could simply be allocated
via glibc calloc()/malloc().

> - add dma-buf export support to fbdev and v4l

V4L has support for it already.

> - roll this out everywhere we still need it.

That's where things are hard. This is not like DRM, where the APIs
are called via some open source libraries that are also managed
by DRM upstream developers.

In the case of the media subsystem, while we added a libv4l sometime
ago, not all userspace apps use it, as a large part of them used
to exist before the addition of the libraries. Also, we're currently 
trying to deprecate libv4l, at least for embedded systems, in favor
of libcamera.

On media, there are lots of closed source apps that uses the media API
directly. Even talking about open source ones, there are lots of
different apps, including not only media-specific apps, but also
generic ones, like web browsers, which don't use the libraries we
wrote.

An userspace API breakage would take *huge* efforts and will take
lots of time to have it merged everywhere. It will cause lots of
troubles everywhere.

> Realistically this just isn't going to happen. 

Why not? Any app developer could already use DMA-BUF if required,
as the upstream support was added several years ago.

> And anything else just
> reimplements half of dma-buf, 

It is just the opposite: those uAPIs came years before dma-buf.
In other words, it was dma-buf that re-implemented it ;-)

Now, I agree with you that dma-buf is a way cooler than the past
alternatives.

-

It sounds to me that you're considering on only one use case of 
USERPTR: to pass a buffer created from DRM. As far as I'm aware,
only embedded userspace applications actually do that.

Yet, there are a number of other applications that do something like
the userptr_capture() function on this code:

	https://git.linuxtv.org/v4l-utils.git/tree/contrib/test/v4l2grab.c

E. g. using glibc alloc functions like calloc() to allocate memory, 
passing the user-allocated data to the Kernel via something like this:

	struct v4l2_requestbuffers req;
	struct v4l2_buffer buf;
	int n_buffers = 2;

	req.count  = 2;
	req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	req.memory = V4L2_MEMORY_USERPTR;
	if (ioctl(fd, VIDIOC_REQBUFS, &req))
		return errno;

	for (i = 0; i < req.count; ++i) {
		buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
		buf.memory = V4L2_MEMORY_USERPTR;
		buf.index = i;
		buf.m.userptr = (unsigned long)buffers[i].start;
		buf.length = buffers[i].length;
		if (ioctl(fd, VIDIOC_QBUF, &buf))
			return errno;
	}
	if (ioctl(fd, VIDIOC_STREAMON, &req.type))
		return errno;

	/* some capture loop */

	ioctl(fd, VIDIOC_STREAMOFF, &req.type);

I can't possibly see *any* security issues with the above code.

As I said before, VIDIOC_REQBUFS should be checking if the userspace
buffers are OK and ensure that their refcounts will be incremented,
in order to avoid mm to move the pages used there, freeing the 
refconts when VIDIOC_STREAMOFF - or close(fd) - is called.

> which is kinda pointless (you need
> minimally refcounting and some way to get at a promise of a permanent
> sg list for dma. Plus probably the vmap for kernel cpu access.

Yeah, refcounting needs to happen. 

Thanks,
Mauro
Daniel Vetter Oct. 10, 2020, 10:53 a.m. UTC | #10
Hi Mauro,

You might want to read the patches more carefully, because what you're
demanding is what my patches do. Short summary:

- if STRICT_FOLLOW_PFN is set:
a) normal memory is handled as-is (i.e. your example works) through
the addition of FOLL_LONGTERM. This is the "pin the pages correctly"
approach you're demanding
b) for non-page memory (zerocopy sharing before dma-buf was upstreamed
is the only use-case for this) it is correctly rejected with -EINVAL

- if you do have blobby userspace which requires the zero-copy using
userptr to work, and doesn't have any of the fallbacks implemented
that you describe, this would be a regression. That's why
STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue
in this usage, Marek also confirmed that the removal of the vma copy
code a few years ago essentially broke even the weak assumptions that
made the code work 10+ years ago when it was merged.

so tdlr; Everything you described will keep working even with the new
flag set, and everything you demand must be implemented _is_
implemented in this patch series.

Also please keep in mind that we are _not_ talking about the general
userptr support that was merge ~20 years ago. This patch series here
is _only_ about the zerocpy userptr support merged with 50ac952d2263
("[media] videobuf2-dma-sg: Support io userptr operations on io
memory") in 2013.

Why this hack was merged in 2013 when we merged dma-buf almost 2 years
before that I have no idea about. Imo that patch simply should never
have landed, and instead dma-buf support prioritized.

Cheers, Daniel


On Sat, Oct 10, 2020 at 11:21 AM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Fri, 9 Oct 2020 19:52:05 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
>
> > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> > >
> > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> > >
> > > No, it is unconditionally unsafe. The CMA movable mappings are
> > > specific VMAs that will have bad issues here, but there are other
> > > types too.
>
> I didn't check the mm dirty details, but I strongly suspect that the mm
> code has a way to prevent moving a mmapped page while it is still in usage.
>
> If not, then the entire movable pages concept sounds broken to me, and
> has to be fixed at mm subsystem.
>
> > >
> > > The only way to do something at a VMA level is to have a list of OK
> > > VMAs, eg because they were creatd via a special mmap helper from the
> > > media subsystem.
>
> I'm not sure if I'm following you on that. The media API can work with
> different ways of sending buffers to userspace:
>
>         - read();
>
>         - using the overlay mode. This interface is deprecated in
>           practice, being replaced by DMABUF. Only a few older hardware
>           supports it, and it depends on an special X11 helper driver
>           for it to work.
>
>         - via DMABUF:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html
>
>         - via mmap, using a mmap helper:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/mmap.html
>
>         - via mmap, using userspace-provided pointers:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html
>
> The existing open-source programs usually chose one or more of the above
> modes. if the Kernel driver returns -EINVAL when an mmapped streaming I/O
> mode is not supported, userspace has to select a different method.
>
> Most userspace open source programs have fallback support: if one
> mmap I/O method fails, it selects another one, although this is not
> a mandatory requirement. I found (and fixed) a few ones that were
> relying exclusively on userptr support, but I didn't make a
> comprehensive check.
>
> Also there are a number of relevant closed-source apps that we have no
> idea about what methods they use, like Skype, and other similar
> videoconferencing programs. Breaking support for those, specially at
> a time where people are relying on it in order to participate on
> conferences and doing internal meetings is a **very bad** idea.
>
> So, whatever solution is taken, it should not be dumping warning
> messages at the system and tainting the Kernel, but, instead, checking
> if the userspace request is valid or not. If it is invalid, return the
> proper error code via the right V4L2 ioctl, in order to let userspace
> choose a different method. I the request is valid, refcount the pages
> for them to not be moved while they're still in usage.
>
> -
>
> Let me provide some background about how things work at the media
> subsytem. If you want to know more, the userspace-provided memory
> mapped pointers work is described here:
>
>         https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html#userp
>
> Basically, userspace calls either one of those ioctls:
>
>         VIDIOC_CREATE_BUFS:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-create-bufs.html
>
> Which is translated into a videobuf2 call to: vb2_ioctl_create_bufs()
>
>         VIDIOC_REQBUFS
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-reqbufs.html#vidioc-reqbufs
>
> Which is translated into a videobuf2 call to: vb2_ioctl_reqbufs()
>
> Both internally calls vb2_verify_memory_type(), which is responsible
> for checking if the provided pointers are OK for the usage and/or do
> all necessary page allocations, and taking care of any special
> requirements. This could easily have some additional checks to
> verify if the requested VMA address has pages that are movable or
> not, ensuring that ensure that the VMA is OK, and locking them in
> order to prevent the mm code to move such pages while they are in
> usage by the media subsystem.
>
> Now, as I said before, I don't know the dirty details about how
> to lock those pages at the mm subsystem in order to avoid it
> to move the used pages. Yet, when vb2_create_framevec()
> is called, the check if VMA is OK should already be happened
> at vb2_verify_memory_type().
>
> -
>
> It should be noticed that the dirty hack added by patch 09/17
> and 10/17 affects *all* types of memory allocations at V4L2,
> as this kAPI is called by the 3 different memory models
> supported at the media subsystem:
>
>         drivers/media/common/videobuf2/videobuf2-vmalloc.c
>         drivers/media/common/videobuf2/videobuf2-dma-contig.c
>         drivers/media/common/videobuf2/videobuf2-dma-sg.c
>
> In other words, with this code:
>
>         int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
>                 unsigned long *pfn)
>         {
>         #ifdef CONFIG_STRICT_FOLLOW_PFN
>                 pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
>                 return -EINVAL;
>         #else
>                 WARN_ONCE(1, "unsafe follow_pfn usage\n");
>                 add_taint(TAINT_USER, LOCKDEP_STILL_OK);
>
>                 return follow_pfn(vma, address, pfn);
>         #endif
>
> you're unconditionally breaking the media userspace API support not
> only for embedded systems that could be using userptr instead of
> DMA_BUF, but also for *all* video devices, including USB cameras.
>
> This is **NOT** an acceptable solution.
>
> So, I stand my NACK to this approach.
>
> > > > Well, no drivers inside the media subsystem uses such flag, although
> > > > they may rely on some infrastructure that could be using it behind
> > > > the bars.
> > >
> > > It doesn't matter, nothing prevents the user from calling media APIs
> > > on mmaps it gets from other subsystems.
> >
> > I think a good first step would be to disable userptr of non struct
> > page backed storage going forward for any new hw support. Even on
> > existing drivers. dma-buf sharing has been around for long enough now
> > that this shouldn't be a problem. Unfortunately right now this doesn't
> > seem to exist, so the entire problem keeps getting perpetuated.
>
> Well, the media uAPI does support DMABUF (both import and export):
>
>         https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html
>         https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-expbuf.html#vidioc-expbuf
>
> And I fully agree that newer programs should use DMABUF when sharing
> buffers with DRM subsystem, but that's not my main concern.
>
> What I do want is to not break userspace support nor to taint the Kernel
> due to a valid uAPI call.
>
> A valid uAPI call should check if the parameters passed though it are
> valid. If they are, it should handle. Otherwise, it should return
> -EINVAL, without tainting the Kernel or printing warning messages.
>
> The approach took by patches 09/17 and 10/17 doesn't do that.
> Instead, they just unconditionally breaks the media uAPI.
>
> What should be done, instead, is to drop patch 10/17, and work on
> a way for the code inside vb2_create_framevec() to ensure that, if
> USERPTR is used, the memory pages will be properly locked while the
> driver is using, returning -EINVAL only if there's no way to proceed,
> without tainting the Kernel.
>
> >
> > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > >
> > > We can't tell from the VMA these kinds of details..
> > >
> > > It has to go the other direction, evey mmap that might be used as a
> > > userptr here has to be found and the VMA specially created to allow
> > > its use. At least that is a kernel only change, but will need people
> > > with the HW to do this work.
> >
> > I think the only reasonable way to keep this working is:
> > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
>
> Not sure how an userspace buffer could be mapped to be using it,
> specially since the buffer may not even be imported/exported
> from the DRM subsystem, but it could simply be allocated
> via glibc calloc()/malloc().
>
> > - add dma-buf export support to fbdev and v4l
>
> V4L has support for it already.
>
> > - roll this out everywhere we still need it.
>
> That's where things are hard. This is not like DRM, where the APIs
> are called via some open source libraries that are also managed
> by DRM upstream developers.
>
> In the case of the media subsystem, while we added a libv4l sometime
> ago, not all userspace apps use it, as a large part of them used
> to exist before the addition of the libraries. Also, we're currently
> trying to deprecate libv4l, at least for embedded systems, in favor
> of libcamera.
>
> On media, there are lots of closed source apps that uses the media API
> directly. Even talking about open source ones, there are lots of
> different apps, including not only media-specific apps, but also
> generic ones, like web browsers, which don't use the libraries we
> wrote.
>
> An userspace API breakage would take *huge* efforts and will take
> lots of time to have it merged everywhere. It will cause lots of
> troubles everywhere.
>
> > Realistically this just isn't going to happen.
>
> Why not? Any app developer could already use DMA-BUF if required,
> as the upstream support was added several years ago.
>
> > And anything else just
> > reimplements half of dma-buf,
>
> It is just the opposite: those uAPIs came years before dma-buf.
> In other words, it was dma-buf that re-implemented it ;-)
>
> Now, I agree with you that dma-buf is a way cooler than the past
> alternatives.
>
> -
>
> It sounds to me that you're considering on only one use case of
> USERPTR: to pass a buffer created from DRM. As far as I'm aware,
> only embedded userspace applications actually do that.
>
> Yet, there are a number of other applications that do something like
> the userptr_capture() function on this code:
>
>         https://git.linuxtv.org/v4l-utils.git/tree/contrib/test/v4l2grab.c
>
> E. g. using glibc alloc functions like calloc() to allocate memory,
> passing the user-allocated data to the Kernel via something like this:
>
>         struct v4l2_requestbuffers req;
>         struct v4l2_buffer buf;
>         int n_buffers = 2;
>
>         req.count  = 2;
>         req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
>         req.memory = V4L2_MEMORY_USERPTR;
>         if (ioctl(fd, VIDIOC_REQBUFS, &req))
>                 return errno;
>
>         for (i = 0; i < req.count; ++i) {
>                 buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
>                 buf.memory = V4L2_MEMORY_USERPTR;
>                 buf.index = i;
>                 buf.m.userptr = (unsigned long)buffers[i].start;
>                 buf.length = buffers[i].length;
>                 if (ioctl(fd, VIDIOC_QBUF, &buf))
>                         return errno;
>         }
>         if (ioctl(fd, VIDIOC_STREAMON, &req.type))
>                 return errno;
>
>         /* some capture loop */
>
>         ioctl(fd, VIDIOC_STREAMOFF, &req.type);
>
> I can't possibly see *any* security issues with the above code.
>
> As I said before, VIDIOC_REQBUFS should be checking if the userspace
> buffers are OK and ensure that their refcounts will be incremented,
> in order to avoid mm to move the pages used there, freeing the
> refconts when VIDIOC_STREAMOFF - or close(fd) - is called.
>
> > which is kinda pointless (you need
> > minimally refcounting and some way to get at a promise of a permanent
> > sg list for dma. Plus probably the vmap for kernel cpu access.
>
> Yeah, refcounting needs to happen.
>
> Thanks,
> Mauro



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Mauro Carvalho Chehab Oct. 10, 2020, 11:39 a.m. UTC | #11
Em Sat, 10 Oct 2020 12:53:49 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> Hi Mauro,
> 
> You might want to read the patches more carefully, because what you're
> demanding is what my patches do. Short summary:
> 
> - if STRICT_FOLLOW_PFN is set:
> a) normal memory is handled as-is (i.e. your example works) through
> the addition of FOLL_LONGTERM. This is the "pin the pages correctly"
> approach you're demanding
> b) for non-page memory (zerocopy sharing before dma-buf was upstreamed
> is the only use-case for this) it is correctly rejected with -EINVAL
> 
> - if you do have blobby userspace which requires the zero-copy using
> userptr to work, and doesn't have any of the fallbacks implemented
> that you describe, this would be a regression. That's why
> STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue
> in this usage, Marek also confirmed that the removal of the vma copy
> code a few years ago essentially broke even the weak assumptions that
> made the code work 10+ years ago when it was merged.
> 
> so tdlr; Everything you described will keep working even with the new
> flag set, and everything you demand must be implemented _is_
> implemented in this patch series.
> 
> Also please keep in mind that we are _not_ talking about the general
> userptr support that was merge ~20 years ago. This patch series here
> is _only_ about the zerocpy userptr support merged with 50ac952d2263
> ("[media] videobuf2-dma-sg: Support io userptr operations on io
> memory") in 2013.

Ok, now it is making more sense. Please update the comments for
patch 10/17 to describe the above.

We need some time to test this though, in order to check if no
regressions were added (except the ones due to changeset 50ac952d2263).

> 
> Why this hack was merged in 2013 when we merged dma-buf almost 2 years
> before that I have no idea about. Imo that patch simply should never
> have landed, and instead dma-buf support prioritized.

If I recall correctly, we didn't have any DMABUF support
at the media subsystem, back on 2013.

It took some time for the DMA-BUF to arrive at media, as this
was not a top priority. Also, there aren't many developers that
understand the memory model well enough to implement DMA-BUF support
and touch the VB2 code, which is quite complex, as it supports
lots of different ways for I/O, plus works with vmalloc, DMA
contig and DMA scatter/gather. 

Changes there should carefully be tested against different
drivers, in order to avoid regressions on it.

> Cheers, Daniel

Thanks,
Mauro
Daniel Vetter Oct. 10, 2020, 11:56 a.m. UTC | #12
On Sat, Oct 10, 2020 at 1:39 PM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Sat, 10 Oct 2020 12:53:49 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
>
> > Hi Mauro,
> >
> > You might want to read the patches more carefully, because what you're
> > demanding is what my patches do. Short summary:
> >
> > - if STRICT_FOLLOW_PFN is set:
> > a) normal memory is handled as-is (i.e. your example works) through
> > the addition of FOLL_LONGTERM. This is the "pin the pages correctly"
> > approach you're demanding
> > b) for non-page memory (zerocopy sharing before dma-buf was upstreamed
> > is the only use-case for this) it is correctly rejected with -EINVAL
> >
> > - if you do have blobby userspace which requires the zero-copy using
> > userptr to work, and doesn't have any of the fallbacks implemented
> > that you describe, this would be a regression. That's why
> > STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue
> > in this usage, Marek also confirmed that the removal of the vma copy
> > code a few years ago essentially broke even the weak assumptions that
> > made the code work 10+ years ago when it was merged.
> >
> > so tdlr; Everything you described will keep working even with the new
> > flag set, and everything you demand must be implemented _is_
> > implemented in this patch series.
> >
> > Also please keep in mind that we are _not_ talking about the general
> > userptr support that was merge ~20 years ago. This patch series here
> > is _only_ about the zerocpy userptr support merged with 50ac952d2263
> > ("[media] videobuf2-dma-sg: Support io userptr operations on io
> > memory") in 2013.
>
> Ok, now it is making more sense. Please update the comments for
> patch 10/17 to describe the above.

Will do.

> We need some time to test this though, in order to check if no
> regressions were added (except the ones due to changeset 50ac952d2263).

Yeah testing of the previous patches to switch to FOLL_LONGTERM would
be really good. I also need that for habanalabs and ideally exynos
too. All the userptr for normal memory should keep working, and with
FOLL_LONGTERM it should actually work better, since with that it
should now correctly interact with pagecache and fs code, not just
with anon memory from malloc.

Thanks, Daniel

> > Why this hack was merged in 2013 when we merged dma-buf almost 2 years
> > before that I have no idea about. Imo that patch simply should never
> > have landed, and instead dma-buf support prioritized.
>
> If I recall correctly, we didn't have any DMABUF support
> at the media subsystem, back on 2013.
>
> It took some time for the DMA-BUF to arrive at media, as this
> was not a top priority. Also, there aren't many developers that
> understand the memory model well enough to implement DMA-BUF support
> and touch the VB2 code, which is quite complex, as it supports
> lots of different ways for I/O, plus works with vmalloc, DMA
> contig and DMA scatter/gather.
>
> Changes there should carefully be tested against different
> drivers, in order to avoid regressions on it.
>
> > Cheers, Daniel
>
> Thanks,
> Mauro
Tomasz Figa Oct. 10, 2020, 5:22 p.m. UTC | #13
Hi Daniel,

On Fri, Oct 9, 2020 at 7:52 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >
> > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> >
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.
> >
> > The only way to do something at a VMA level is to have a list of OK
> > VMAs, eg because they were creatd via a special mmap helper from the
> > media subsystem.
> >
> > > Well, no drivers inside the media subsystem uses such flag, although
> > > they may rely on some infrastructure that could be using it behind
> > > the bars.
> >
> > It doesn't matter, nothing prevents the user from calling media APIs
> > on mmaps it gets from other subsystems.
>
> I think a good first step would be to disable userptr of non struct
> page backed storage going forward for any new hw support. Even on
> existing drivers. dma-buf sharing has been around for long enough now
> that this shouldn't be a problem. Unfortunately right now this doesn't
> seem to exist, so the entire problem keeps getting perpetuated.
>
> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.
>
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> - add dma-buf export support to fbdev and v4l

I assume you mean V4L2 and not the obsolete V4L that is emulated in
the userspace by libv4l. If so, every video device that uses videobuf2
gets DMA-buf export for free and there is nothing needed to enable it.

We probably still have a few legacy drivers using videobuf (non-2),
but IMHO those should be safe to put behind some disabled-by-default
Kconfig symbol or even completely drop, as the legacy framework has
been deprecated for many years already.

> - roll this out everywhere we still need it.
>
> Realistically this just isn't going to happen. And anything else just
> reimplements half of dma-buf, which is kinda pointless (you need
> minimally refcounting and some way to get at a promise of a permanent
> sg list for dma. Plus probably the vmap for kernel cpu access.
>
> > > Please let address the issue on this way, instead of broken an
> > > userspace API that it is there since 1991.
> >
> > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > breakage caused by a security fix for something that was a 15 years
> > old bug.
>
> Yeah we have a bunch of these on the drm side too. Some of them are
> really just "you have to upgrade userspace", and there's no real fix
> for the security nightmare without that.

I think we need to phase out such userspace indeed. The Kconfig symbol
allows enabling the unsafe functionality for anyone who still needs
it, so I think it's not entirely a breakage.

Best regards,
Tomasz
Tomasz Figa Oct. 10, 2020, 5:30 p.m. UTC | #14
Hi Mauro,

On Fri, Oct 9, 2020 at 2:37 PM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Fri, 9 Oct 2020 09:21:11 -0300
> Jason Gunthorpe <jgg@ziepe.ca> escreveu:
>
> > On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:
> > > Hi,
> > >
> > > Em Fri,  9 Oct 2020 09:59:26 +0200
> > > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> > >
> > > > Way back it was a reasonable assumptions that iomem mappings never
> > > > change the pfn range they point at. But this has changed:
> > > >
> > > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > > ptes with unmap_mapping_range when buffers get moved
> > > >
> > > > - contiguous dma allocations have moved from dedicated carvetouts to
> > > > cma regions. This means if we miss the unmap the pfn might contain
> > > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > > >
> > > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > > >
> > > > Accessing pfns obtained from ptes without holding all the locks is
> > > > therefore no longer a good idea.
> > > >
> > > > Unfortunately there's some users where this is not fixable (like v4l
> > > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > > iommu). For now annotate these as unsafe and splat appropriately.
> > > >
> > > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > > roll out to all appropriate places.
> > >
> > > NACK, as this breaks an existing userspace API on media.
> >
> > It doesn't break it. You get a big warning the thing is broken and it
> > keeps working.
> >
> > We can't leave such a huge security hole open - it impacts other
> > subsystems, distros need to be able to run in a secure mode.
>
> Well, if distros disable it, then apps will break.
>

Do we have any information on userspace that actually needs this functionality?

Note that we're _not_ talking here about the complete USERPTR
functionality, but rather just the very corner case of carveout memory
not backed by struct pages.

Given that the current in-tree ways of reserving carveout memory, such
as shared-dma-pool, actually give memory backed by struct pages, do we
even have a source of such legacy memory in the kernel today?

I think that given that this is a very niche functionality, we could
have it disabled by default for security reasons and if someone
_really_ (i.e. there is no replacement) needs it, they probably need
to use a custom kernel build anyway for their exotic hardware setup
(with PFN-backed carveout memory), so they can enable it.

> > > While I agree that using the userptr on media is something that
> > > new drivers may not support, as DMABUF is a better way of
> > > handling it, changing this for existing ones is a big no,
> > > as it may break usersapace.
> >
> > media community needs to work to fix this, not pretend it is OK to
> > keep going as-is.
>
> > Dealing with security issues is the one case where an uABI break might
> > be acceptable.
> >
> > If you want to NAK it then you need to come up with the work to do
> > something here correctly that will support the old drivers without the
> > kernel taint.
> >
> > Unfortunately making things uncomfortable for the subsystem is the big
> > hammer the core kernel needs to use to actually get this security work
> > done by those responsible.
>
>
> I'm not pretending that this is ok. Just pointing that the approach
> taken is NOT OK.
>
> I'm not a mm/ expert, but, from what I understood from Daniel's patch
> description is that this is unsafe *only if*  __GFP_MOVABLE is used.
>
> Well, no drivers inside the media subsystem uses such flag, although
> they may rely on some infrastructure that could be using it behind
> the bars.
>
> If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> flag that it would be denying the core mm code to set __GFP_MOVABLE.
>
> Please let address the issue on this way, instead of broken an
> userspace API that it is there since 1991.

Note that USERPTR as a whole generally has been considered deprecated
in V4L2 for many years and people have been actively discouraged to
use it. And, still, we're just talking here about the very rare corner
case, not the whole USERPTR API.

Best regards,
Tomasz
Laurent Pinchart Oct. 10, 2020, 9:11 p.m. UTC | #15
Hi Daniel,

On Fri, Oct 09, 2020 at 07:52:05PM +0200, Daniel Vetter wrote:
> On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >
> > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> >
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.
> >
> > The only way to do something at a VMA level is to have a list of OK
> > VMAs, eg because they were creatd via a special mmap helper from the
> > media subsystem.
> >
> > > Well, no drivers inside the media subsystem uses such flag, although
> > > they may rely on some infrastructure that could be using it behind
> > > the bars.
> >
> > It doesn't matter, nothing prevents the user from calling media APIs
> > on mmaps it gets from other subsystems.
> 
> I think a good first step would be to disable userptr of non struct
> page backed storage going forward for any new hw support. Even on
> existing drivers. dma-buf sharing has been around for long enough now
> that this shouldn't be a problem. Unfortunately right now this doesn't
> seem to exist, so the entire problem keeps getting perpetuated.

On the V4L2 side, I think we should disable USERPTR for any new driver,
period. That's what I've been recommended when reviewing patches for
several years already. It's a deprecated API, it should be phased out,
which starts by not allowing any new use case.

> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.
> 
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> - add dma-buf export support to fbdev and v4l
> - roll this out everywhere we still need it.
> 
> Realistically this just isn't going to happen. And anything else just
> reimplements half of dma-buf, which is kinda pointless (you need
> minimally refcounting and some way to get at a promise of a permanent
> sg list for dma. Plus probably the vmap for kernel cpu access.
> 
> > > Please let address the issue on this way, instead of broken an
> > > userspace API that it is there since 1991.
> >
> > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > breakage caused by a security fix for something that was a 15 years
> > old bug.
> 
> Yeah we have a bunch of these on the drm side too. Some of them are
> really just "you have to upgrade userspace", and there's no real fix
> for the security nightmare without that.
Laurent Pinchart Oct. 10, 2020, 9:35 p.m. UTC | #16
Hi Tomasz,

On Sat, Oct 10, 2020 at 07:22:48PM +0200, Tomasz Figa wrote:
> On Fri, Oct 9, 2020 at 7:52 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> > >
> > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> > >
> > > No, it is unconditionally unsafe. The CMA movable mappings are
> > > specific VMAs that will have bad issues here, but there are other
> > > types too.
> > >
> > > The only way to do something at a VMA level is to have a list of OK
> > > VMAs, eg because they were creatd via a special mmap helper from the
> > > media subsystem.
> > >
> > > > Well, no drivers inside the media subsystem uses such flag, although
> > > > they may rely on some infrastructure that could be using it behind
> > > > the bars.
> > >
> > > It doesn't matter, nothing prevents the user from calling media APIs
> > > on mmaps it gets from other subsystems.
> >
> > I think a good first step would be to disable userptr of non struct
> > page backed storage going forward for any new hw support. Even on
> > existing drivers. dma-buf sharing has been around for long enough now
> > that this shouldn't be a problem. Unfortunately right now this doesn't
> > seem to exist, so the entire problem keeps getting perpetuated.
> >
> > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > >
> > > We can't tell from the VMA these kinds of details..
> > >
> > > It has to go the other direction, evey mmap that might be used as a
> > > userptr here has to be found and the VMA specially created to allow
> > > its use. At least that is a kernel only change, but will need people
> > > with the HW to do this work.
> >
> > I think the only reasonable way to keep this working is:
> > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> > - add dma-buf export support to fbdev and v4l
> 
> I assume you mean V4L2 and not the obsolete V4L that is emulated in
> the userspace by libv4l. If so, every video device that uses videobuf2
> gets DMA-buf export for free and there is nothing needed to enable it.
> 
> We probably still have a few legacy drivers using videobuf (non-2),
> but IMHO those should be safe to put behind some disabled-by-default
> Kconfig symbol or even completely drop, as the legacy framework has
> been deprecated for many years already.

There's 8 drivers left, and they support a very large number of devices.
I expect unhappy users distros stop shipping them. On the other hand,
videobuf has been deprecated for a loooooooong time, so there has been
plenty of time to convert the remaining drivers to videobuf2. If nobody
can do it, then we'll have to drop support for these devices given the
security issues.

We have moved media drivers to staging in the past when there wasn't
enough maintenance effort, we could do so here too.

> > - roll this out everywhere we still need it.
> >
> > Realistically this just isn't going to happen. And anything else just
> > reimplements half of dma-buf, which is kinda pointless (you need
> > minimally refcounting and some way to get at a promise of a permanent
> > sg list for dma. Plus probably the vmap for kernel cpu access.
> >
> > > > Please let address the issue on this way, instead of broken an
> > > > userspace API that it is there since 1991.
> > >
> > > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > > breakage caused by a security fix for something that was a 15 years
> > > old bug.
> >
> > Yeah we have a bunch of these on the drm side too. Some of them are
> > really just "you have to upgrade userspace", and there's no real fix
> > for the security nightmare without that.
> 
> I think we need to phase out such userspace indeed. The Kconfig symbol
> allows enabling the unsafe functionality for anyone who still needs
> it, so I think it's not entirely a breakage.
Daniel Vetter Oct. 10, 2020, 9:50 p.m. UTC | #17
On Sat, Oct 10, 2020 at 11:36 PM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Tomasz,
>
> On Sat, Oct 10, 2020 at 07:22:48PM +0200, Tomasz Figa wrote:
> > On Fri, Oct 9, 2020 at 7:52 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> > > >
> > > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> > > >
> > > > No, it is unconditionally unsafe. The CMA movable mappings are
> > > > specific VMAs that will have bad issues here, but there are other
> > > > types too.
> > > >
> > > > The only way to do something at a VMA level is to have a list of OK
> > > > VMAs, eg because they were creatd via a special mmap helper from the
> > > > media subsystem.
> > > >
> > > > > Well, no drivers inside the media subsystem uses such flag, although
> > > > > they may rely on some infrastructure that could be using it behind
> > > > > the bars.
> > > >
> > > > It doesn't matter, nothing prevents the user from calling media APIs
> > > > on mmaps it gets from other subsystems.
> > >
> > > I think a good first step would be to disable userptr of non struct
> > > page backed storage going forward for any new hw support. Even on
> > > existing drivers. dma-buf sharing has been around for long enough now
> > > that this shouldn't be a problem. Unfortunately right now this doesn't
> > > seem to exist, so the entire problem keeps getting perpetuated.
> > >
> > > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > > >
> > > > We can't tell from the VMA these kinds of details..
> > > >
> > > > It has to go the other direction, evey mmap that might be used as a
> > > > userptr here has to be found and the VMA specially created to allow
> > > > its use. At least that is a kernel only change, but will need people
> > > > with the HW to do this work.
> > >
> > > I think the only reasonable way to keep this working is:
> > > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> > > - add dma-buf export support to fbdev and v4l
> >
> > I assume you mean V4L2 and not the obsolete V4L that is emulated in
> > the userspace by libv4l. If so, every video device that uses videobuf2
> > gets DMA-buf export for free and there is nothing needed to enable it.

Yeah. And I missed that v4l2 added dma-buf export too.

> > We probably still have a few legacy drivers using videobuf (non-2),
> > but IMHO those should be safe to put behind some disabled-by-default
> > Kconfig symbol or even completely drop, as the legacy framework has
> > been deprecated for many years already.
>
> There's 8 drivers left, and they support a very large number of devices.
> I expect unhappy users distros stop shipping them. On the other hand,
> videobuf has been deprecated for a loooooooong time, so there has been
> plenty of time to convert the remaining drivers to videobuf2. If nobody
> can do it, then we'll have to drop support for these devices given the
> security issues.

Again, the issue here is _only_ with follow_pfn. For videobuf1 this
means videbuf-dma-contig.c userptr support is broken. Unlike videobuf2
it means it's broken for all usage (not just zero-copy userptr),
because videbuf-dma-contig.c lacks the pin_user_pages path. But that
would be easy to add if this poses a  problem I think - we just need
to carry over the pin_user_pages_fast logic from videbuf2, no driver
changes required. But of course I don't think we should do that before
someone reports the regression, since videobuf1 userptr is doubly
deprecated :-)

Everything else keeps working with videobuf1 with my patch series. So
depending upon which videobuf1 implementations these 8 drivers use,
you might not even have any real breakage there.

> We have moved media drivers to staging in the past when there wasn't
> enough maintenance effort, we could do so here too.

I'm not breaking the world with this, it's really very minimal
use-case. At least as far as I'm understanding the entire media
subsystem here.
-Daniel

> > > - roll this out everywhere we still need it.
> > >
> > > Realistically this just isn't going to happen. And anything else just
> > > reimplements half of dma-buf, which is kinda pointless (you need
> > > minimally refcounting and some way to get at a promise of a permanent
> > > sg list for dma. Plus probably the vmap for kernel cpu access.
> > >
> > > > > Please let address the issue on this way, instead of broken an
> > > > > userspace API that it is there since 1991.
> > > >
> > > > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > > > breakage caused by a security fix for something that was a 15 years
> > > > old bug.
> > >
> > > Yeah we have a bunch of these on the drm side too. Some of them are
> > > really just "you have to upgrade userspace", and there's no real fix
> > > for the security nightmare without that.
> >
> > I think we need to phase out such userspace indeed. The Kconfig symbol
> > allows enabling the unsafe functionality for anyone who still needs
> > it, so I think it's not entirely a breakage.
>
> --
> Regards,
>
> Laurent Pinchart
Mauro Carvalho Chehab Oct. 11, 2020, 6:27 a.m. UTC | #18
Em Sat, 10 Oct 2020 23:50:27 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> On Sat, Oct 10, 2020 at 11:36 PM Laurent Pinchart
> <laurent.pinchart@ideasonboard.com> wrote:
> >

> > > We probably still have a few legacy drivers using videobuf (non-2),
> > > but IMHO those should be safe to put behind some disabled-by-default
> > > Kconfig symbol or even completely drop, as the legacy framework has
> > > been deprecated for many years already.  
> >
> > There's 8 drivers left, and they support a very large number of devices.
> > I expect unhappy users distros stop shipping them. On the other hand,
> > videobuf has been deprecated for a loooooooong time, so there has been
> > plenty of time to convert the remaining drivers to videobuf2. If nobody
> > can do it, then we'll have to drop support for these devices given the
> > security issues.  
> 
> Again, the issue here is _only_ with follow_pfn. For videobuf1 this
> means videbuf-dma-contig.c userptr support is broken. Unlike videobuf2
> it means it's broken for all usage (not just zero-copy userptr),
> because videbuf-dma-contig.c lacks the pin_user_pages path.

Well, follow_pfn() is used only by videbuf-dma-contig.c. If this is 
the only part of VB1 that will have userptr broken, then there's
just one driver that might be affected: davinci.

Yet, taking a deeper look:

	$ git grep include drivers/media/platform/davinci/|grep -i videobuf
	drivers/media/platform/davinci/vpif_capture.h:#include <media/videobuf2-dma-contig.h>
	drivers/media/platform/davinci/vpif_display.h:#include <media/videobuf2-dma-contig.h>

It sounds to me that it was already converted to VB2, but some VB1
symbols were not converted at its Kconfig.

It sounds to me that there are other drivers with some VB1 left overs
at Kconfig, as those are the only ones using VB1 those days:

	$ for i in $(git grep media/videobuf drivers |grep -v videobuf2 |grep -v v4l2-core|cut -d: -f1); do dirname $i; done|sort|uniq
	drivers/media/pci/bt8xx
	drivers/media/pci/cx18
	drivers/media/platform
	drivers/media/usb/tm6000
	drivers/media/usb/zr364xx
	drivers/staging/media/atomisp/pci

> But that
> would be easy to add if this poses a  problem I think - we just need
> to carry over the pin_user_pages_fast logic from videbuf2, no driver
> changes required. But of course I don't think we should do that before
> someone reports the regression, since videobuf1 userptr is doubly
> deprecated :-)

I think otherwise. Keeping a broken component at the Kernel is 
a bad idea. 

Yet, from my quick search above, it sounds to me that it is time for 
us to retire the VB1 DMA contig support as a hole, as there's no client 
for it anymore.

I'll work on some patches cleaning up the VB1 left overs at
Kconfig and removing videbuf-dma-contig.c for good, if there's
no hidden dependency on it.


Thanks,
Mauro
Mauro Carvalho Chehab Oct. 11, 2020, 6:36 a.m. UTC | #19
Em Sun, 11 Oct 2020 08:27:41 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:

> Em Sat, 10 Oct 2020 23:50:27 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> 
> > On Sat, Oct 10, 2020 at 11:36 PM Laurent Pinchart
> > <laurent.pinchart@ideasonboard.com> wrote:
> > >
> 
> > > > We probably still have a few legacy drivers using videobuf (non-2),
> > > > but IMHO those should be safe to put behind some disabled-by-default
> > > > Kconfig symbol or even completely drop, as the legacy framework has
> > > > been deprecated for many years already.  
> > >
> > > There's 8 drivers left, and they support a very large number of devices.
> > > I expect unhappy users distros stop shipping them. On the other hand,
> > > videobuf has been deprecated for a loooooooong time, so there has been
> > > plenty of time to convert the remaining drivers to videobuf2. If nobody
> > > can do it, then we'll have to drop support for these devices given the
> > > security issues.  
> > 
> > Again, the issue here is _only_ with follow_pfn. For videobuf1 this
> > means videbuf-dma-contig.c userptr support is broken. Unlike videobuf2
> > it means it's broken for all usage (not just zero-copy userptr),
> > because videbuf-dma-contig.c lacks the pin_user_pages path.
> 
> Well, follow_pfn() is used only by videbuf-dma-contig.c. If this is 
> the only part of VB1 that will have userptr broken, then there's
> just one driver that might be affected: davinci.
> 
> Yet, taking a deeper look:
> 
> 	$ git grep include drivers/media/platform/davinci/|grep -i videobuf
> 	drivers/media/platform/davinci/vpif_capture.h:#include <media/videobuf2-dma-contig.h>
> 	drivers/media/platform/davinci/vpif_display.h:#include <media/videobuf2-dma-contig.h>
> 
> It sounds to me that it was already converted to VB2, but some VB1
> symbols were not converted at its Kconfig.
> 
> It sounds to me that there are other drivers with some VB1 left overs
> at Kconfig, as those are the only ones using VB1 those days:
> 
> 	$ for i in $(git grep media/videobuf drivers |grep -v videobuf2 |grep -v v4l2-core|cut -d: -f1); do dirname $i; done|sort|uniq
> 	drivers/media/pci/bt8xx
> 	drivers/media/pci/cx18
> 	drivers/media/platform
> 	drivers/media/usb/tm6000
> 	drivers/media/usb/zr364xx
> 	drivers/staging/media/atomisp/pci

This is incomplete. There are two drivers that include videobuf
indirectly:

	include/media/davinci/vpfe_capture.h
	include/media/drv-intf/saa7146_vv.h

I double-checked that DaVinci still uses VB1. There are 
actually two clients for videbuf-dma-contig.c: davinci and fsl-viu.

Those two will be affected, if we don't add pin_user_pages_fast()
support to VB1 or convert them to VB2.

> 
> > But that
> > would be easy to add if this poses a  problem I think - we just need
> > to carry over the pin_user_pages_fast logic from videbuf2, no driver
> > changes required. But of course I don't think we should do that before
> > someone reports the regression, since videobuf1 userptr is doubly
> > deprecated :-)
> 
> I think otherwise. Keeping a broken component at the Kernel is 
> a bad idea. 
> 
> Yet, from my quick search above, it sounds to me that it is time for 
> us to retire the VB1 DMA contig support as a hole, as there's no client 
> for it anymore.
> 
> I'll work on some patches cleaning up the VB1 left overs at
> Kconfig and removing videbuf-dma-contig.c for good, if there's
> no hidden dependency on it.
> 
> 
> Thanks,
> Mauro



Thanks,
Mauro
Marek Szyprowski Oct. 12, 2020, 10:46 a.m. UTC | #20
Hi Jason,

On 09.10.2020 14:48, Jason Gunthorpe wrote:
> On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
>
>> I'm not a mm/ expert, but, from what I understood from Daniel's patch
>> description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> No, it is unconditionally unsafe. The CMA movable mappings are
> specific VMAs that will have bad issues here, but there are other
> types too.

I'm trying to follow this thread, but I really wonder what do you mean 
by CMA movable mappings? If a buffer has been allocated from CMA and 
used for DMA, it won't be moved in the memory. It will stay at the same 
physical memory address all the time until freed by the owner. It just a 
matter of proper usage count tracking to delay freeing if it is still 
used somewhere.

Best regards
Daniel Vetter Oct. 12, 2020, 1:49 p.m. UTC | #21
On Mon, Oct 12, 2020 at 12:47 PM Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
>
> Hi Jason,
>
> On 09.10.2020 14:48, Jason Gunthorpe wrote:
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >
> >> I'm not a mm/ expert, but, from what I understood from Daniel's patch
> >> description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.
>
> I'm trying to follow this thread, but I really wonder what do you mean
> by CMA movable mappings? If a buffer has been allocated from CMA and
> used for DMA, it won't be moved in the memory. It will stay at the same
> physical memory address all the time until freed by the owner. It just a
> matter of proper usage count tracking to delay freeing if it is still
> used somewhere.

 Yup. The problem is that this usage count tracking doesn't exist. And
drivers could at least in theory treat CMA like vram and swap buffers
in&out of it, so just refcounting the userspace vma isn't enough. In
practice, right now, it might be enough for CMA drivers though (but
there's more that's possible here).
-Daniel
diff mbox series

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2a16631c1fda..ec8c90928fc9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1653,6 +1653,8 @@  int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
 		   pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);
 int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	unsigned long *pfn);
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+		      unsigned long *pfn);
 int follow_phys(struct vm_area_struct *vma, unsigned long address,
 		unsigned int flags, unsigned long *prot, resource_size_t *phys);
 int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index f7cbc4dde0ef..7c7b234ffb24 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4821,7 +4821,12 @@  EXPORT_SYMBOL(follow_pte_pmd);
  * @address: user virtual address
  * @pfn: location to store found PFN
  *
- * Only IO mappings and raw PFN mappings are allowed.
+ * Only IO mappings and raw PFN mappings are allowed. Note that callers must
+ * ensure coherency with pte updates by using a &mmu_notifier to follow updates.
+ * If this is not feasible, or the access to the @pfn is only very short term,
+ * use follow_pte_pmd() instead and hold the pagetable lock for the duration of
+ * the access instead. Any caller not following these requirements must use
+ * unsafe_follow_pfn() instead.
  *
  * Return: zero and the pfn at @pfn on success, -ve otherwise.
  */
@@ -4844,6 +4849,31 @@  int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Returns zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+	unsigned long *pfn)
+{
+#ifdef CONFIG_STRICT_FOLLOW_PFN
+	pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
+	return -EINVAL;
+#else
+	WARN_ONCE(1, "unsafe follow_pfn usage\n");
+	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+
+	return follow_pfn(vma, address, pfn);
+#endif
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
 #ifdef CONFIG_HAVE_IOREMAP_PROT
 int follow_phys(struct vm_area_struct *vma,
 		unsigned long address, unsigned int flags,
diff --git a/mm/nommu.c b/mm/nommu.c
index 75a327149af1..3db2910f0d64 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -132,6 +132,23 @@  int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Returns zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+	unsigned long *pfn)
+{
+	return follow_pfn(vma, address, pfn);
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
 LIST_HEAD(vmap_area_list);
 
 void vfree(const void *addr)
diff --git a/security/Kconfig b/security/Kconfig
index 7561f6f99f1d..48945402e103 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -230,6 +230,19 @@  config STATIC_USERMODEHELPER_PATH
 	  If you wish for all usermode helper programs to be disabled,
 	  specify an empty string here (i.e. "").
 
+config STRICT_FOLLOW_PFN
+	bool "Disable unsafe use of follow_pfn"
+	depends on MMU
+	help
+	  Some functionality in the kernel follows userspace mappings to iomem
+	  ranges in an unsafe matter. Examples include v4l userptr for zero-copy
+	  buffers sharing.
+
+	  If this option is switched on, such access is rejected. Only enable
+	  this option when you must run userspace which requires this.
+
+	  If in doubt, say Y.
+
 source "security/selinux/Kconfig"
 source "security/smack/Kconfig"
 source "security/tomoyo/Kconfig"