Message ID | YJBHiRiCGzojk25U@phenom.ffwll.local (mailing list archive) |
---|---|
State | New, archived |
Series | [PULL] topic/iomem-mmap-vs-gup |
[ You had a really odd Reply-to on this one ]

On Mon, May 3, 2021 at 12:15 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> Anyway here's a small pull for you to ponder, now that the big ones are
> all through.

Well, _now_ I'm all caught up. Knock wood.

Anyway, time to look at it:

> Follow-up to my pull from last merge window: kvm and vfio lost their
> very unsafe use of follow_pfn, this appropriately marks up the very
> last user for some userptr-as-buffer use-cases in media. There was
> some resistance to outright removing it, maybe we can do this in a few
> releases.

Hmm. So this looks mostly ok to me, although I think the change to the nommu case is pretty ridiculous.

On nommu, unsafe_follow_pfn() should just be a wrapper around follow_pfn(). There's no races when you can't remap anything. No?

Do the two media cases even work on nommu?

Finally - did you intend for this to be a real pull request? Because the email read to me like "think about this and tell me what you think" rather than "please pull"..

And I have now fulfilled that "think about and tell me" part ;)

Linus
On Thu, May 06, 2021 at 03:30:45PM -0700, Linus Torvalds wrote:
> [ You had a really odd Reply-to on this one ]
>
> On Mon, May 3, 2021 at 12:15 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >
> > Anyway here's a small pull for you to ponder, now that the big ones are
> > all through.
>
> Well, _now_ I'm all caught up. Knock wood.
>
> Anyway, time to look at it:
>
> > Follow-up to my pull from last merge window: kvm and vfio lost their
> > very unsafe use of follow_pfn, this appropriately marks up the very
> > last user for some userptr-as-buffer use-cases in media. There was
> > some resistance to outright removing it, maybe we can do this in a few
> > releases.
>
> Hmm. So this looks mostly ok to me, although I think the change to the
> nommu case is pretty ridiculous.
>
> On nommu, unsafe_follow_pfn() should just be a wrapper around
> follow_pfn(). There's no races when you can't remap anything. No?
>
> Do the two media cases even work on nommu?

So personally I think the entire thing should just be thrown out, it's all levels of scary and we have zero-copy buffer sharing done properly with dma-buf since years in v4l. Iirc I've had that in some early versions of all this, but got nacked by some, supported by others from media as something that needs to go away.

This here is now the next best thing as a fishing expedition to figure out whether there's actually anyone left who cares or not. That's also why the nommu case has the same checks, even though it's all fine there. Hopefully the answer is "no users" and then we could remove this in a year or two.

> Finally - did you intend for this to be a real pull request? Because
> the email read to me like "think about this and tell me what you
> think" rather than "please pull"..
>
> And I have now fulfilled that "think about and tell me" part ;)

Ah yes I rushed this a bit between appreciating some local fires here at work and left out the instructions :-)

Please pull or tell me whether you want the outright removal (like Christoph Hellwig also wants).

Cheers, Daniel
[ Daniel, please fix your broken email setup. You have this insane "Reply-to" list that just duplicates all the participants. Very broken, very annoying ]

On Fri, May 7, 2021 at 8:53 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> So personally I think the entire thing should just be thrown out, it's all
> levels of scary and we have zero-copy buffer sharing done properly with
> dma-buf since years in v4l.

So I've been looking at this more, and the more I look at it, the less I like this series.

I think the proper fix is to just fix things.

For example, I'm looking at the v4l users of follow_pfn(), and I find get_vaddr_frames(), which is just broken.

Fine, we know users are broken, but look at what appears to be the main user of get_vaddr_frames(): vb2_dc_get_userptr().

What does that function do? Immediately after doing get_vaddr_frames(), it tries to turn those pfn's into page pointers, and then do sg_alloc_table_from_pages() on the end result.

Yes, yes, it also has that "ok, that failed, let's try to see if it's some physically contiguous mapping" and do DMA directly to those physical pages, but the point there is that that only happens when they weren't normal pages to begin with.

So the *fix* for at least that path is to

 (a) just use the regular pin_user_pages() for normal pages

 (b) perhaps keep the follow_pfn() case, but then limit it to that "no page backing" and that physical pages case.

And honestly, the "struct frame_vector" thing already *has* support for this, and the problem is simply that the v4l code has decided to have the callers ask for pfn's rather than have the callers just ask for a frame-vector that is either "pfn's with no page backing" _or_ "page list with proper page reference counting".

So this series of yours that just disables follow_pfn() actually seems very wrong.

I think follow_pfn() is ok for the actual "this is not a 'struct page' backed area", and disabling that case is wrong even going forward.

End result, I think the proper model is:

 - keep follow_pfn(), but limit it to the "not vm_normal_page()" case, and return error for some real page mapping

 - make the get_vaddr_frames() first try "pin_user_pages()" (and create a page array) and fall back to "follow_pfn()" if that fails (or the other way around). Set the

IOW, get_vaddr_frames() would just do

    vec->got_ref = is_pages;
    vec->is_pfns = !is_pages;

and everything would just work out - the v4l code seems to already have all the support for "it's a pfn array" vs "it's properly refcounted pages".

So the only case we should disallow is the mixed case, that the v4l code already seems to not be able to handle anyway (and honestly, it looks like "got_ref/is_pfns" should be just one flag - they always have to have the opposite values).

So I think this "unsafe_follow_pfn()" halfway step is actively wrong. It doesn't move us forward. Quite the reverse. It just makes the proper fix harder.

End result: not pulling it, unless somebody can explain to me in small words why I'm wrong and have the mental capacity of a damaged rodent.

Linus
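The model proposed in the message above can be sketched in user-space C. This is only an illustration of the control flow, not kernel code: `struct frame_vec_model`, `struct mapping_model`, `model_pin_pages()`, `model_follow_pfns()` and `model_get_vaddr_frames()` are hypothetical stand-ins invented here, and the real `pin_user_pages()` and `follow_pfn()` have different signatures and semantics. The sketch tries the page-pinning path first, falls back to the raw-pfn path only when nothing in the range is page backed, rejects the mixed case, and records the result in a single flag rather than the paired got_ref/is_pfns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single flag replacing the got_ref/is_pfns pair. */
enum frame_kind { FRAMES_PAGES, FRAMES_PFNS };

/* Hypothetical stand-in for the kernel's struct frame_vector. */
struct frame_vec_model {
    enum frame_kind kind;
    size_t nr_frames;
};

/* Hypothetical description of a user range: is each frame page backed? */
struct mapping_model {
    const bool *page_backed;
    size_t nr;
};

/* Stand-in for the pin_user_pages() path: only works for normal pages. */
static long model_pin_pages(const struct mapping_model *m)
{
    for (size_t i = 0; i < m->nr; i++)
        if (!m->page_backed[i])
            return -1;
    return (long)m->nr;
}

/* Stand-in for a restricted follow_pfn(): errors out on real pages. */
static long model_follow_pfns(const struct mapping_model *m)
{
    for (size_t i = 0; i < m->nr; i++)
        if (m->page_backed[i])
            return -1;
    return (long)m->nr;
}

/* Returns 0 on success, -1 for a mixed (unsupported) mapping. */
int model_get_vaddr_frames(const struct mapping_model *m,
                           struct frame_vec_model *vec)
{
    long n;

    assert(m != NULL && vec != NULL);

    n = model_pin_pages(m);
    if (n >= 0) {
        vec->kind = FRAMES_PAGES;   /* refcounted pages */
        vec->nr_frames = (size_t)n;
        return 0;
    }

    n = model_follow_pfns(m);
    if (n >= 0) {
        vec->kind = FRAMES_PFNS;    /* raw pfns, no page backing */
        vec->nr_frames = (size_t)n;
        return 0;
    }

    return -1;                      /* mixed case is disallowed */
}
```

Whether the pfn fallback should exist at all is exactly what the rest of the thread disputes; the sketch only makes the proposed structure concrete.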
On Sat, May 8, 2021 at 6:47 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> [ Daniel, please fix your broken email setup. You have this insane
> "Reply-to" list that just duplicates all the participants. Very
> broken, very annoying ]
>
> On Fri, May 7, 2021 at 8:53 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > So personally I think the entire thing should just be thrown out, it's all
> > levels of scary and we have zero-copy buffer sharing done properly with
> > dma-buf since years in v4l.
>
> So I've been looking at this more, and the more I look at it, the less
> I like this series.
>
> I think the proper fix is to just fix things.
>
> For example, I'm looking at the v4l users of follow_pfn(), and I find
> get_vaddr_frames(), which is just broken.
>
> Fine, we know users are broken, but look at what appears to be the
> main user of get_vaddr_frames(): vb2_dc_get_userptr().
>
> What does that function do? Immediately after doing
> get_vaddr_frames(), it tries to turn those pfn's into page pointers,
> and then do sg_alloc_table_from_pages() on the end result.
>
> Yes, yes, it also has that "ok, that failed, let's try to see if it's
> some physically contiguous mapping" and do DMA directly to those
> physical pages, but the point there is that that only happens when
> they weren't normal pages to begin with.
>
> So the *fix* for at least that path is to
>
> (a) just use the regular pin_user_pages() for normal pages

Yup, the "rip it all out" solution amounts to replacing this all, including frame_vector helper code, with pin_user_pages.

> (b) perhaps keep the follow_pfn() case, but then limit it to that "no
> page backing" and that physical pages case.
>
> And honestly, the "struct frame_vector" thing already *has* support
> for this, and the problem is simply that the v4l code has decided to
> have the callers ask for pfn's rather than have the callers just ask
> for a frame-vector that is either "pfn's with no page backing" _or_
> "page list with proper page reference counting".
>
> So this series of yours that just disables follow_pfn() actually seems
> very wrong.
>
> I think follow_pfn() is ok for the actual "this is not a 'struct page'
> backed area", and disabling that case is wrong even going forward.

I think this is where you miss a bit: We very much also want to stop pinned userptr to physical addresses that aren't page backed. This might very well be some gpu pci bar, backed by vram, and vram is managed as dynamically as struct page backed stuff (and there's all the hmm dreams to make it actually use struct page, but that's another story). So by the time the media hw accesses that vb2 userptr buffer there's good chances someone else's data is now there.

If vb2 would have a mmu_notifier subscription or similar to follow pte updates the gpu driver does, then it would be all fine. But this vb2 model is a pinned one, hence not fixable.

The other more practical issue is that peer2peer dma on modern hw needs quite some setup. Just taking a cpu pfn and hoping that matches the bus addr your device would need is a bit optimistic.

One theoretical & proper fix I discussed with Jason Gunthorpe would be to replace the pfn lookup with a lookup for a struct dma_buf. Which has proper interfaces for pinning gpu buffers, figuring out p2p dma or just figuring out the right dma mapping and all that. Idea was to make a direct vma->dma_buf lookup or something like that.

But consensus is also that outside of gpus and very closely related things using dma_buf is not a great idea, because there's a few too many silly rules involved. For everyone else it's better to make the struct page managed device memory stuff work most likely.

> End result, I think the proper model is:
>
> - keep follow_pfn(), but limit it to the "not vm_normal_page()" case,
> and return error for some real page mapping
>
> - make the get_vaddr_frames() first try "pin_user_pages()" (and
> create a page array) and fall back to "follow_pfn()" if that fails (or
> the other way around). Set the
>
> IOW, get_vaddr_frames() would just do
>
>     vec->got_ref = is_pages;
>     vec->is_pfns = !is_pages;
>
> and everything would just work out - the v4l code seems to already
> have all the support for "it's a pfn array" vs "it's properly
> refcounted pages".
>
> So the only case we should disallow is the mixed case, that the v4l
> code already seems to not be able to handle anyway (and honestly, it
> looks like "got_ref/is_pfns" should be just one flag - they always
> have to have the opposite values).
>
> So I think this "unsafe_follow_pfn()" halfway step is actively wrong.
> It doesn't move us forward. Quite the reverse. It just makes the
> proper fix harder.
>
> End result: not pulling it, unless somebody can explain to me in small
> words why I'm wrong and have the mental capacity of a damaged rodent.

No rodents I think, just more backstory of how this all fits. tldr; pin_user_pages is the only safe use of this vb2 userptr thing.
-Daniel

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On Mon, May 10, 2021 at 09:16:58AM +0200, Daniel Vetter wrote:
> > End result: not pulling it, unless somebody can explain to me in small
> > words why I'm wrong and have the mental capacity of a damaged rodent.
>
> No rodents I think, just more backstory of how this all fits. tldr;
> pin_user_pages is the only safe use of this vb2 userptr thing.

Yes, which is why I advocate for just ripping the follow_pfn path out entirely. It could have been used for crazy and dangerous peer to peer transfers outside of any infrastructure making it safe, or for pre-CMA kernel memory carveouts for large contiguous memory allocations (which are pretty broken by design as well). So IMHO the only sensible thing is to remove this cruft entirely, and if it breaks a currently working setup (which I think is unlikely) we'll have to make sure it can work the proper way.
On Sat, May 08, 2021 at 09:46:41AM -0700, Linus Torvalds wrote:

> I think follow_pfn() is ok for the actual "this is not a 'struct page'
> backed area", and disabling that case is wrong even going forward.

Every place we've audited using follow_pfn() has been shown to have some use-after-free bugs like Daniel describes, and a failure to check permissions bug too.

All the other follow_pfn() users were moved to follow_pte() to fix the permissions check and this shifts the use-after-free bug away from being inside an MM API and into the caller mis-using the API by, say, extracting and using the PFN outside the pte lock.

eg look at how VFIO wrongly uses follow_pte():

    static int follow_fault_pfn()
            ret = follow_pte(vma->vm_mm, vaddr, &ptep, &ptl);
            *pfn = pte_pfn(*ptep);
            pte_unmap_unlock(ptep, ptl);

            // no protection that pte_pfn() is still valid!
            use_pfn(*pfn)

v4l is the only user that still has the missing permissions check security bug too - so there is no outcome that should keep follow_pfn() in the tree.

At worst v4l should change to follow_pte() and use it wrongly like VFIO. At best we should delete all the v4l stuff.

Daniel I suppose we missed this relation to follow_pte(), so I agree that keeping an unsafe_follow_pfn() around is not good.

Regards, Jason
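The hazard in the follow_fault_pfn() excerpt above can be illustrated with a small single-threaded user-space model. Everything in it is a hypothetical stand-in invented for illustration (`struct pte_model`, `lookup_pfn_then_unlock()`, `use_pfn_under_lock()`, `model_remap()`, `record_pfn()`); it does not use or reproduce the kernel's follow_pte() or page-table locking. It only shows the ordering problem: a pfn copied out and used after the lock is dropped can be stale, while a pfn consumed while the lock is still held cannot change underneath the user.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one page-table entry and its lock. */
struct pte_model {
    bool locked;          /* stands in for the pte lock (ptl) */
    bool present;
    unsigned long pfn;
};

static void pte_model_lock(struct pte_model *pte)
{
    assert(!pte->locked);
    pte->locked = true;
}

static void pte_model_unlock(struct pte_model *pte)
{
    assert(pte->locked);
    pte->locked = false;
}

/* Racy pattern: the pfn escapes the locked region before it is used. */
int lookup_pfn_then_unlock(struct pte_model *pte, unsigned long *pfn)
{
    pte_model_lock(pte);
    if (!pte->present) {
        pte_model_unlock(pte);
        return -1;
    }
    *pfn = pte->pfn;
    pte_model_unlock(pte);
    return 0;             /* *pfn may already be stale here */
}

/* Safer pattern: the pfn is only consumed while the lock is held. */
int use_pfn_under_lock(struct pte_model *pte,
                       void (*use)(unsigned long pfn, void *ctx),
                       void *ctx)
{
    pte_model_lock(pte);
    if (!pte->present) {
        pte_model_unlock(pte);
        return -1;
    }
    use(pte->pfn, ctx);
    pte_model_unlock(pte);
    return 0;
}

/* Models another party changing the mapping, under the same lock. */
void model_remap(struct pte_model *pte, unsigned long new_pfn)
{
    pte_model_lock(pte);
    pte->pfn = new_pfn;
    pte_model_unlock(pte);
}

/* Example consumer: remembers the pfn it was handed. */
void record_pfn(unsigned long pfn, void *ctx)
{
    *(unsigned long *)ctx = pfn;
}
```

A short-lived use under the lock is the most this pattern can offer. Long-lived device access of the kind discussed in this thread would, as Daniel notes, still need something like an mmu_notifier subscription to follow later pte updates.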
On Mon, May 10, 2021 at 3:50 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Sat, May 08, 2021 at 09:46:41AM -0700, Linus Torvalds wrote:
>
> > I think follow_pfn() is ok for the actual "this is not a 'struct page'
> > backed area", and disabling that case is wrong even going forward.
>
> Every place we've audited using follow_pfn() has been shown to have
> some use-after-free bugs like Daniel describes, and a failure to check
> permissions bug too.
>
> All the other follow_pfn() users were moved to follow_pte() to fix the
> permissions check and this shifts the use-after-free bug away from
> being inside an MM API and into the caller mis-using the API by, say,
> extracting and using the PFN outside the pte lock.
>
> eg look at how VFIO wrongly uses follow_pte():
>
>     static int follow_fault_pfn()
>             ret = follow_pte(vma->vm_mm, vaddr, &ptep, &ptl);
>             *pfn = pte_pfn(*ptep);
>             pte_unmap_unlock(ptep, ptl);
>
>             // no protection that pte_pfn() is still valid!
>             use_pfn(*pfn)
>
> v4l is the only user that still has the missing permissions check
> security bug too - so there is no outcome that should keep
> follow_pfn() in the tree.
>
> At worst v4l should change to follow_pte() and use it wrongly like
> VFIO. At best we should delete all the v4l stuff.

yeah vfio is still broken for the case I care about. I think there's also some questions open still about whether kvm really uses mmu_notifier in all cases correctly, but iirc the one exception was s390, which didn't have pci mmap and that's how it gets away with that specific problem.

> Daniel I suppose we missed this relation to follow_pte(), so I agree
> that keeping an unsafe_follow_pfn() around is not good.

tbh I never really got the additional issue with the missing write checks. That users of follow_pfn (or well follow_pte + immediate lock dropping like vfio) don't subscribe to the pte updates in general is the bug I'm seeing. That v4l also glosses over the read/write access stuff is kinda just the icing on the cake :-) It's pretty well broken even if it would check that.
-Daniel
On Mon, May 10, 2021 at 04:55:39PM +0200, Daniel Vetter wrote:

> yeah vfio is still broken for the case I care about. I think there's
> also some questions open still about whether kvm really uses
> mmu_notifier in all cases correctly,

IIRC kvm doesn't either.

> > Daniel I suppose we missed this relation to follow_pte(), so I agree
> > that keeping an unsafe_follow_pfn() around is not good.
>
> tbh I never really got the additional issue with the missing write
> checks. That users of follow_pfn (or well follow_pte + immediate lock
> dropping like vfio) don't subscribe to the pte updates in general is
> the bug I'm seeing. That v4l also glosses over the read/write access
> stuff is kinda just the icing on the cake :-) It's pretty well broken
> even if it would check that.

It is just severity. Exploiting the use after free bug is somewhat harder, exploiting the 'you can write to non-page write protected memory' bug is not so hard.

Jason
+Paolo

On Mon, May 10, 2021, Jason Gunthorpe wrote:
> On Mon, May 10, 2021 at 04:55:39PM +0200, Daniel Vetter wrote:
>
> > yeah vfio is still broken for the case I care about. I think there's
> > also some questions open still about whether kvm really uses
> > mmu_notifier in all cases correctly,
>
> IIRC kvm doesn't either.

Yep, KVM on x86 has a non-trivial number of flows that don't properly hook into the mmu_notifier. Paolo is working on fixing the problem, but I believe the rework won't be ready until 5.14.
On 10/05/21 19:57, Sean Christopherson wrote:
> +Paolo
>
> On Mon, May 10, 2021, Jason Gunthorpe wrote:
>> On Mon, May 10, 2021 at 04:55:39PM +0200, Daniel Vetter wrote:
>>
>>> yeah vfio is still broken for the case I care about. I think there's
>>> also some questions open still about whether kvm really uses
>>> mmu_notifier in all cases correctly,
>>
>> IIRC kvm doesn't either.
>
> Yep, KVM on x86 has a non-trivial number of flows that don't properly hook into
> the mmu_notifier. Paolo is working on fixing the problem, but I believe the
> rework won't be ready until 5.14.

Yeah, I like the way it's coming, but I'm at 20-ish patches and counting.

Paolo
On Mon, May 10, 2021 at 9:30 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, May 10, 2021 at 09:16:58AM +0200, Daniel Vetter wrote:
> > > End result: not pulling it, unless somebody can explain to me in small
> > > words why I'm wrong and have the mental capacity of a damaged rodent.
> >
> > No rodents I think, just more backstory of how this all fits. tldr;
> > pin_user_pages is the only safe use of this vb2 userptr thing.
>
> Yes, which is why I advocate for just ripping the follow_pfn path
> out entirely. It could have been used for crazy and dangerous peer to
> peer transfers outside of any infrastructure making it safe, or for
> pre-CMA kernel memory carveouts for large contiguous memory allocations
> (which are pretty broken by design as well). So IMHO the only sensible
> thing is to remove this cruft entirely, and if it breaks a currently
> working setup (which I think is unlikely) we'll have to make sure it
> can work the proper way.

Since I'm not getting any cozy consensus vibes here on any option I think I'll just drop this.

Stephen, can you pls drop

git://anongit.freedesktop.org/drm/drm topic/iomem-mmap-vs-gup

from linux-next? It's not going anywhere. I'll also go ahead and delete the branch, to make sure you catch this update :-)

Thanks, Daniel
Hi Daniel,

On Mon, 17 May 2021 17:29:35 +0200 Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Mon, May 10, 2021 at 9:30 AM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Mon, May 10, 2021 at 09:16:58AM +0200, Daniel Vetter wrote:
> > > > End result: not pulling it, unless somebody can explain to me in small
> > > > words why I'm wrong and have the mental capacity of a damaged rodent.
> > >
> > > No rodents I think, just more backstory of how this all fits. tldr;
> > > pin_user_pages is the only safe use of this vb2 userptr thing.
> >
> > Yes, which is why I advocate for just ripping the follow_pfn path
> > out entirely. It could have been used for crazy and dangerous peer to
> > peer transfers outside of any infrastructure making it safe, or for
> > pre-CMA kernel memory carveouts for large contiguous memory allocations
> > (which are pretty broken by design as well). So IMHO the only sensible
> > thing is to remove this cruft entirely, and if it breaks a currently
> > working setup (which I think is unlikely) we'll have to make sure it
> > can work the proper way.
>
> Since I'm not getting any cozy consensus vibes here on any option I
> think I'll just drop this.
>
> Stephen, can you pls drop
>
> git://anongit.freedesktop.org/drm/drm topic/iomem-mmap-vs-gup
>
> from linux-next? It's not going anywhere. I'll also go ahead and
> delete the branch, to make sure you catch this update :-)

I have dropped this now.

Thanks for letting me know.