[PULL] topic/iomem-mmap-vs-gup

Message ID	YJBHiRiCGzojk25U@phenom.ffwll.local (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=6v1A=J6=kvack.org=owner-linux-mm@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C5A3D611CD Date: Mon, 3 May 2021 21:15:08 +0200 From: Daniel Vetter <daniel.vetter@ffwll.ch> To: Linus Torvalds <torvalds@linux-foundation.org> Cc: DRI Development <dri-devel@lists.freedesktop.org>, LKML <linux-kernel@vger.kernel.org>, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-media@vger.kernel.org, linux-samsung-soc@vger.kernel.org Subject: [PULL] topic/iomem-mmap-vs-gup Message-ID: <YJBHiRiCGzojk25U@phenom.ffwll.local> Mail-Followup-To: Linus Torvalds <torvalds@linux-foundation.org>, DRI Development <dri-devel@lists.freedesktop.org>, LKML <linux-kernel@vger.kernel.org>, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-media@vger.kernel.org, linux-samsung-soc@vger.kernel.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Received-SPF: none (ffwll.ch>: No applicable sender policy available) receiver=imf12; identity=mailfrom; envelope-from="<daniel.vetter@ffwll.ch>"; helo=mail-ed1-f53.google.com; client-ip=209.85.208.53 Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	[PULL] topic/iomem-mmap-vs-gup \| expand [PULL] topic/iomem-mmap-vs-gup

Daniel Vetter May 3, 2021, 7:15 p.m. UTC

Hi Linus,

It's still the same topic branch as last merge window, but the name isn't
fitting all that well anymore :-)

Anyway here's a small pull for you to ponder, now that the big ones are
all through. It's been in -next almost the entire cycle, I've only done
some non-code rebases due to the -rc1 fumble and to fix some commit
message typos.

Christoph Hellwig also looked at these and aside from wanting to outright
remove it all didn't have objections.

topic/iomem-mmap-vs-gup-2021-05-03:
unexport follow_pfn

Follow-up to my pull from last merge window: kvm and vfio lost their
very unsafe use of follow_pfn, this appropriately marks up the very
last user for some userptr-as-buffer use-cases in media. There was
some resistance to outright removing it, maybe we can do this in a few
releases.

Cheers, Daniel

The following changes since commit 0d02ec6b3136c73c09e7859f0d0e4e2c4c07b49b:

  Linux 5.12-rc4 (2021-03-21 14:56:43 -0700)

are available in the Git repository at:

  git://anongit.freedesktop.org/drm/drm tags/topic/iomem-mmap-vs-gup-2021-05-03

for you to fetch changes up to ac8b8400620a4b0d9ca903ee9ad440bec736f5fa:

  mm: unexport follow_pfn (2021-04-08 16:54:38 +0200)

----------------------------------------------------------------
unexport follow_pfn

Follow-up to my pull from last merge window: kvm and vfio lost their
very unsafe use of follow_pfn, this appropriately marks up the very
last user for some userptr-as-buffer use-cases in media. There was
some resistance to outright removing it, maybe we can do this in a few
releases.

----------------------------------------------------------------
Daniel Vetter (3):
      mm: Add unsafe_follow_pfn
      media/videobuf1|2: Mark follow_pfn usage as unsafe
      mm: unexport follow_pfn

 drivers/media/common/videobuf2/frame_vector.c |  2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c |  2 +-
 include/linux/mm.h                            |  4 +--
 mm/memory.c                                   | 46 +++++++++++++++++----------
 mm/nommu.c                                    | 28 ++++++++++++----
 security/Kconfig                              | 13 ++++++++
 6 files changed, 68 insertions(+), 27 deletions(-)

Linus Torvalds May 6, 2021, 10:30 p.m. UTC | #1

[ You had a really odd Reply-to on this one ]

On Mon, May 3, 2021 at 12:15 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> Anyway here's a small pull for you to ponder, now that the big ones are
> all through.

Well, _now_ I'm all caught up. Knock wood. Anyway, time to look at it:

> Follow-up to my pull from last merge window: kvm and vfio lost their
> very unsafe use of follow_pfn, this appropriately marks up the very
> last user for some userptr-as-buffer use-cases in media. There was
> some resistance to outright removing it, maybe we can do this in a few
> releases.

Hmm. So this looks mostly ok to me, although I think the change to the
nommu case is pretty ridiculous.

On nommu, unsafe_follow_pfn() should just be a wrapper around
follow_pfn(). There's no races when you can't remap anything. No?

Do the two media cases even work on nommu?

Finally - did you intend fo this to be a real pull request? Because
the email read to me like "think about this and tell me what you
think" rather than "please pull"..

And I have now fulfilled that "think about and tell me" part ;)

              Linus

Daniel Vetter May 7, 2021, 3:53 p.m. UTC | #2

On Thu, May 06, 2021 at 03:30:45PM -0700, Linus Torvalds wrote:
> [ You had a really odd Reply-to on this one ]
> 
> On Mon, May 3, 2021 at 12:15 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >
> > Anyway here's a small pull for you to ponder, now that the big ones are
> > all through.
> 
> Well, _now_ I'm all caught up. Knock wood. Anyway, time to look at it:
> 
> > Follow-up to my pull from last merge window: kvm and vfio lost their
> > very unsafe use of follow_pfn, this appropriately marks up the very
> > last user for some userptr-as-buffer use-cases in media. There was
> > some resistance to outright removing it, maybe we can do this in a few
> > releases.
> 
> Hmm. So this looks mostly ok to me, although I think the change to the
> nommu case is pretty ridiculous.
> 
> On nommu, unsafe_follow_pfn() should just be a wrapper around
> follow_pfn(). There's no races when you can't remap anything. No?
> 
> Do the two media cases even work on nommu?

So personally I think the entire thing should just be thrown out, it's all
levels of scary and we have zero-copy buffer sharing done properly with
dma-buf since years in v4l.

Iirc I've had that in some early versions of all this, but got nacked by
some, supported by others from media as something that needs to go away.
This here is now the next best thing as a fishing expedition to figure out
whether there's actually anyone left who cares or not.

That's also why the nommu case has the same checks, even though it's all
fine there.

Hopefully the answer is "no users" and then we could remove this in a year
or two.

> Finally - did you intend fo this to be a real pull request? Because
> the email read to me like "think about this and tell me what you
> think" rather than "please pull"..
> 
> And I have now fulfilled that "think about and tell me" part ;)

Ah yes I rushed this a bit between appreciating some local fires here at
work and left out the instructions :-)

Please pull or tell me whether you want the outright removal (like
Christoph Hellwig also wants).

Cheers, Daniel

Linus Torvalds May 8, 2021, 4:46 p.m. UTC | #3

[ Daniel, please fix your broken email setup. You have this insane
"Reply-to" list that just duplicates all the participants. Very
broken, very annoying ]

On Fri, May 7, 2021 at 8:53 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> So personally I think the entire thing should just be thrown out, it's all
> levels of scary and we have zero-copy buffer sharing done properly with
> dma-buf since years in v4l.

So I've been looking at this more, and the more I look at it, the less
I like this series.

I think the proper fix is to just fix things.

For example, I'm looking at the v4l users of follow_pfn(), and I find
get_vaddr_frames(), which is just broken.

Fine, we know users are broken, but look at what appears to be the
main user of get_vaddr_frames(): vb2_dc_get_userptr().

What does that function do? Immediately after doing
get_vaddr_frames(), it tries to turn those pfn's into page pointers,
and then do sg_alloc_table_from_pages() on the end result.

Yes, yes, it also has that "ok, that failed, let's try to see if it's
some physically contiguous mapping" and do DMA directly to those
physical pages, but the point there is that that only happens when
they weren't normal pages to begin with.

So thew *fix* for at least that path is to

 (a) just use the regular pin_user_pages() for normal pages

 (b) perhaps keep the follow_pfn() case, but then limit it to that "no
page backing" and that physical pages case.

And honestly, the "struct frame_vector" thing already *has* support
for this, and the problem is simply that the v4l code has decided to
have the callers ask for pfn's rather than have the callers just ask
for a frame-vector that is either "pfn's with no paeg backing" _or_
"page list with proper page reference counting".

So this series of yours that just disables follow_pfn() actually seems
very wrong.

I think follow_pfn() is ok for the actual "this is not a 'struct page'
backed area", and disabling that case is wrong even going forward.

End result, I think the proper model is:

 - keep follow_pfn(), but limit it to the "not vm_normal_page()" case,
and return error for some real page mapping

 - make the get_vaddr_frames() first try "pin_user_pages()" (and
create a page array) and fall back to "follow_pfn()" if that fails (or
the other way around). Set the

IOW, get_vaddr_frames() would just do

        vec->got_ref = is_pages;
        vec->is_pfns = !is_pages;

and everything would just work out - the v4l code seems to already
have all the support for "it's a ofn array" vs "it's properly
refcounted pages".

So the only case we should disallow is the mixed case, that the v4l
code already seems to not be able to handle anyway (and honestly, it
looks like "got_ref/is_pfns" should be just one flag - they always
have to have the opposite values).

So I think this "unsafe_follow_pfn()" halfway step is actively wrong.
It doesn't move us forward. Quite the reverse. It just makes the
proper fix harder.

End result: not pulling it, unless somebody can explain to me in small
words why I'm wrong and have the mental capacity of a damaged rodent.

                Linus

Daniel Vetter May 10, 2021, 7:16 a.m. UTC | #4

On Sat, May 8, 2021 at 6:47 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> [ Daniel, please fix your broken email setup. You have this insane
> "Reply-to" list that just duplicates all the participants. Very
> broken, very annoying ]
>
> On Fri, May 7, 2021 at 8:53 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > So personally I think the entire thing should just be thrown out, it's all
> > levels of scary and we have zero-copy buffer sharing done properly with
> > dma-buf since years in v4l.
>
> So I've been looking at this more, and the more I look at it, the less
> I like this series.
>
> I think the proper fix is to just fix things.
>
> For example, I'm looking at the v4l users of follow_pfn(), and I find
> get_vaddr_frames(), which is just broken.
>
> Fine, we know users are broken, but look at what appears to be the
> main user of get_vaddr_frames(): vb2_dc_get_userptr().
>
> What does that function do? Immediately after doing
> get_vaddr_frames(), it tries to turn those pfn's into page pointers,
> and then do sg_alloc_table_from_pages() on the end result.
>
> Yes, yes, it also has that "ok, that failed, let's try to see if it's
> some physically contiguous mapping" and do DMA directly to those
> physical pages, but the point there is that that only happens when
> they weren't normal pages to begin with.
>
> So thew *fix* for at least that path is to
>
>  (a) just use the regular pin_user_pages() for normal pages

Yup, the "rip it all out" solution amounts to replacing this all,
including frame_vector helper code, with pin_user_pages.

>  (b) perhaps keep the follow_pfn() case, but then limit it to that "no
> page backing" and that physical pages case.
>
> And honestly, the "struct frame_vector" thing already *has* support
> for this, and the problem is simply that the v4l code has decided to
> have the callers ask for pfn's rather than have the callers just ask
> for a frame-vector that is either "pfn's with no paeg backing" _or_
> "page list with proper page reference counting".
>
> So this series of yours that just disables follow_pfn() actually seems
> very wrong.
>
> I think follow_pfn() is ok for the actual "this is not a 'struct page'
> backed area", and disabling that case is wrong even going forward.

I think this is where you miss a bit: We very much also want to stop
pinned userptr to physcial addresses that aren't page backed. This
might very well be some gpu pci bar, backed by vram, and vram is
managed as dynamically as struct page backed stuff (and there's all
the hmm dreams to make it actually use struct page, but that's another
story).

So by the time the media hw access that vb2 userptr buffer there's
good chances someone else's data is now there. If vb2 would have a
mmu_notifier subscription or similar to follow pte updates the gpu
driver does, then it would be all fine. But this vb2 model is a pinned
one, hence not fixable.

The other more practical issue is that peer2peer dma on modern hw
needs quite some setup. Just taking a cpu pfn and hoping that matches
the bus addr your device would need is a bit optimistic.

One theoretical & proper fix I discussed with Jason Gunthrope would be
to replace the pfn lookup with a lookup for a struct dma_buf. Which
has proper interfaces for pinning gpu buffers, figuring out p2p dma or
just figuring out the right dma mapping and all that. Idea was to make
a direct vma->dma_buf lookup or something like that. But consensus is
also that outside of gpus and very closely related things using
dma_buf is not a great idea, because there's a few too many silly
rules involved. For everyone else it's better to make the struct page
managed device memory stuff work most likely.

> End result, I think the proper model is:
>
>  - keep follow_pfn(), but limit it to the "not vm_normal_page()" case,
> and return error for some real page mapping
>
>  - make the get_vaddr_frames() first try "pin_user_pages()" (and
> create a page array) and fall back to "follow_pfn()" if that fails (or
> the other way around). Set the
>
> IOW, get_vaddr_frames() would just do
>
>         vec->got_ref = is_pages;
>         vec->is_pfns = !is_pages;
>
> and everything would just work out - the v4l code seems to already
> have all the support for "it's a ofn array" vs "it's properly
> refcounted pages".
>
> So the only case we should disallow is the mixed case, that the v4l
> code already seems to not be able to handle anyway (and honestly, it
> looks like "got_ref/is_pfns" should be just one flag - they always
> have to have the opposite values).
>
> So I think this "unsafe_follow_pfn()" halfway step is actively wrong.
> It doesn't move us forward. Quite the reverse. It just makes the
> proper fix harder.
>
> End result: not pulling it, unless somebody can explain to me in small
> words why I'm wrong and have the mental capacity of a damaged rodent.

No rodents I think, just more backstory of how this all fits. tldr;
pin_user_pages is the only safe use of this vb2 userptr thing.
-Daniel

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Christoph Hellwig May 10, 2021, 7:29 a.m. UTC | #5

On Mon, May 10, 2021 at 09:16:58AM +0200, Daniel Vetter wrote:
> > End result: not pulling it, unless somebody can explain to me in small
> > words why I'm wrong and have the mental capacity of a damaged rodent.
> 
> No rodents I think, just more backstory of how this all fits. tldr;
> pin_user_pages is the only safe use of this vb2 userptr thing.

Yes, which is why I advocate for just ripping the follow_pfn path
out entirely.  It could have been used for crazy ad dangerous peer to
peer transfers outside of any infrastructure making it safe, or for
pre-CMA kernel memory carveouts for lage contiguous memory allocations
(which are pretty broken by design as well).  So IMHO the only sensible
thing is to remove this cruft entirely, and if it breaks a currently
working setup (which I think is unlikely) we'll have to make sure it
can work the proper way.

Jason Gunthorpe May 10, 2021, 1:50 p.m. UTC | #6

On Sat, May 08, 2021 at 09:46:41AM -0700, Linus Torvalds wrote:

> I think follow_pfn() is ok for the actual "this is not a 'struct page'
> backed area", and disabling that case is wrong even going forward.

Every place we've audited using follow_pfn() has been shown to have
some use-after-free bugs like Daniel describes, and a failure to check
permissions bug too.

All the other follow_pfn() users were moved to follow_pte() to fix the
permissions check and this shifts the use-after-free bug away from
being inside an MM API and into the caller mis-using the API by, say,
extracting and using the PFN outside the pte lock.

eg look at how VFIO wrongly uses follow_pte():

static int follow_fault_pfn()
        ret = follow_pte(vma->vm_mm, vaddr, &ptep, &ptl);
                *pfn = pte_pfn(*ptep);
        pte_unmap_unlock(ptep, ptl);

        // no protection that pte_pfn() is still valid!
        use_pfn(*pfn)

v4l is the only user that still has the missing permissions check
security bug too - so there is no outcome that should keep
follow_pfn() in the tree.

At worst v4l should change to follow_pte() and use it wrongly like
VFIO. At best we should delete all the v4l stuff.

Daniel I suppose we missed this relation to follow_pte(), so I agree
that keeping a unsafe_follow_pfn() around is not good.

Regards,
Jason

Daniel Vetter May 10, 2021, 2:55 p.m. UTC | #7

On Mon, May 10, 2021 at 3:50 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Sat, May 08, 2021 at 09:46:41AM -0700, Linus Torvalds wrote:
>
> > I think follow_pfn() is ok for the actual "this is not a 'struct page'
> > backed area", and disabling that case is wrong even going forward.
>
> Every place we've audited using follow_pfn() has been shown to have
> some use-after-free bugs like Daniel describes, and a failure to check
> permissions bug too.
>
> All the other follow_pfn() users were moved to follow_pte() to fix the
> permissions check and this shifts the use-after-free bug away from
> being inside an MM API and into the caller mis-using the API by, say,
> extracting and using the PFN outside the pte lock.
>
> eg look at how VFIO wrongly uses follow_pte():
>
> static int follow_fault_pfn()
>         ret = follow_pte(vma->vm_mm, vaddr, &ptep, &ptl);
>                 *pfn = pte_pfn(*ptep);
>         pte_unmap_unlock(ptep, ptl);
>
>         // no protection that pte_pfn() is still valid!
>         use_pfn(*pfn)
>
> v4l is the only user that still has the missing permissions check
> security bug too - so there is no outcome that should keep
> follow_pfn() in the tree.
>
> At worst v4l should change to follow_pte() and use it wrongly like
> VFIO. At best we should delete all the v4l stuff.

yeah vfio is still broken for the case I care about. I think there's
also some questions open still about whether kvm really uses
mmu_notifier in all cases correctly, but iirc the one exception was
s390, which didn't have pci mmap and that's how it gets away with that
specific problem.

> Daniel I suppose we missed this relation to follow_pte(), so I agree
> that keeping a unsafe_follow_pfn() around is not good.

tbh I never really got the additional issue with the missing write
checks. That users of follow_pfn (or well follow_pte + immediate lock
dropping like vfio) don't subscribe to the pte updates in general is
the bug I'm seeing. That v4l also glosses over the read/write access
stuff is kinda just the icing on the cake :-) It's pretty well broken
even if it would check that.
-Daniel

Jason Gunthorpe May 10, 2021, 3:34 p.m. UTC | #8

On Mon, May 10, 2021 at 04:55:39PM +0200, Daniel Vetter wrote:

> yeah vfio is still broken for the case I care about. I think there's
> also some questions open still about whether kvm really uses
> mmu_notifier in all cases correctly, 

IIRC kvm doesn't either.

> > Daniel I suppose we missed this relation to follow_pte(), so I agree
> > that keeping a unsafe_follow_pfn() around is not good.
> 
> tbh I never really got the additional issue with the missing write
> checks. That users of follow_pfn (or well follow_pte + immediate lock
> dropping like vfio) don't subscribe to the pte updates in general is
> the bug I'm seeing. That v4l also glosses over the read/write access
> stuff is kinda just the icing on the cake :-) It's pretty well broken
> even if it would check that.

It is just severity. Exploiting the use after free bug is somewhat
harder, exploiting the 'you can write to non-page write protected
memory' bug is not so hard.

Jason

Sean Christopherson May 10, 2021, 5:57 p.m. UTC | #9

+Paolo

On Mon, May 10, 2021, Jason Gunthorpe wrote:
> On Mon, May 10, 2021 at 04:55:39PM +0200, Daniel Vetter wrote:
> 
> > yeah vfio is still broken for the case I care about. I think there's
> > also some questions open still about whether kvm really uses
> > mmu_notifier in all cases correctly, 
> 
> IIRC kvm doesn't either.

Yep, KVM on x86 has a non-trivial number of flows that don't properly hook into
the mmu_notifier.  Paolo is working on fixing the problem, but I believe the
rework won't be ready until 5.14.

Paolo Bonzini May 10, 2021, 6:13 p.m. UTC | #10

On 10/05/21 19:57, Sean Christopherson wrote:
> +Paolo
> 
> On Mon, May 10, 2021, Jason Gunthorpe wrote:
>> On Mon, May 10, 2021 at 04:55:39PM +0200, Daniel Vetter wrote:
>>
>>> yeah vfio is still broken for the case I care about. I think there's
>>> also some questions open still about whether kvm really uses
>>> mmu_notifier in all cases correctly,
>>
>> IIRC kvm doesn't either.
> 
> Yep, KVM on x86 has a non-trivial number of flows that don't properly hook into
> the mmu_notifier.  Paolo is working on fixing the problem, but I believe the
> rework won't be ready until 5.14.

Yeah, I like the way it's coming, but I'm at 20-ish patches and counting.

Paolo

Daniel Vetter May 17, 2021, 3:29 p.m. UTC | #11

On Mon, May 10, 2021 at 9:30 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, May 10, 2021 at 09:16:58AM +0200, Daniel Vetter wrote:
> > > End result: not pulling it, unless somebody can explain to me in small
> > > words why I'm wrong and have the mental capacity of a damaged rodent.
> >
> > No rodents I think, just more backstory of how this all fits. tldr;
> > pin_user_pages is the only safe use of this vb2 userptr thing.
>
> Yes, which is why I advocate for just ripping the follow_pfn path
> out entirely.  It could have been used for crazy ad dangerous peer to
> peer transfers outside of any infrastructure making it safe, or for
> pre-CMA kernel memory carveouts for lage contiguous memory allocations
> (which are pretty broken by design as well).  So IMHO the only sensible
> thing is to remove this cruft entirely, and if it breaks a currently
> working setup (which I think is unlikely) we'll have to make sure it
> can work the proper way.

Since I'm not getting any cozy consenus vibes here on any option I
think I'll just drop this.

Stephen, can you pls drop

git://anongit.freedesktop.org/drm/drm topic/iomem-mmap-vs-gup

from linux-next? It's not going anywhere. I'll also go ahead and
delete the branch, to make sure you catch this update :-)

Thanks, Daniel

Stephen Rothwell May 17, 2021, 10:04 p.m. UTC | #12

Hi Daniel,

On Mon, 17 May 2021 17:29:35 +0200 Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Mon, May 10, 2021 at 9:30 AM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Mon, May 10, 2021 at 09:16:58AM +0200, Daniel Vetter wrote:  
> > > > End result: not pulling it, unless somebody can explain to me in small
> > > > words why I'm wrong and have the mental capacity of a damaged rodent.  
> > >
> > > No rodents I think, just more backstory of how this all fits. tldr;
> > > pin_user_pages is the only safe use of this vb2 userptr thing.  
> >
> > Yes, which is why I advocate for just ripping the follow_pfn path
> > out entirely.  It could have been used for crazy ad dangerous peer to
> > peer transfers outside of any infrastructure making it safe, or for
> > pre-CMA kernel memory carveouts for lage contiguous memory allocations
> > (which are pretty broken by design as well).  So IMHO the only sensible
> > thing is to remove this cruft entirely, and if it breaks a currently
> > working setup (which I think is unlikely) we'll have to make sure it
> > can work the proper way.  
> 
> Since I'm not getting any cozy consenus vibes here on any option I
> think I'll just drop this.
> 
> Stephen, can you pls drop
> 
> git://anongit.freedesktop.org/drm/drm topic/iomem-mmap-vs-gup
> 
> from linux-next? It's not going anywhere. I'll also go ahead and
> delete the branch, to make sure you catch this update :-)

I have dropped this now.  Thanks for letting me know.

[PULL] topic/iomem-mmap-vs-gup

Pull-request

Message

Comments