Message ID | 20210205103259.42866-1-pbonzini@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | KVM: do not assume PTE is writable after follow_pfn | expand |
On Fri, Feb 05, 2021 at 05:32:57AM -0500, Paolo Bonzini wrote: > This series is the first step towards fixing KVM's usage of follow_pfn. > The immediate fix here is that KVM is not checking the writability of > the PFN, which actually dates back to way before the introduction of > follow_pfn in commit add6a0cd1c5b ("KVM: MMU: try to fix up page faults > before giving up", 2016-07-05). There are more changes needed to > invalidate gfn-to-pfn caches from MMU notifiers, but this issue will > be tackled later. > > A more fundamental issue however is that the follow_pfn function is > basically impossible to use correctly. Almost all users for example > are assuming that the page is writable; KVM was not alone in this > mistake. follow_pte, despite not being exported for modules, is a > far saner API. Therefore, patch 1 simplifies follow_pte a bit and > makes it available to modules. > > Please review and possibly ack for inclusion in the KVM tree, > thanks! FWIW, the patches look correct to me (if with patch 2 report fixed): Reviewed-by: Peter Xu <peterx@redhat.com> But I do have a question on why dax as the only user needs to pass in the notifier to follow_pte() for initialization. Indeed there're a difference on start/end init of the notifier depending on whether it's a huge pmd but since there's the pmdp passed over too so I assume the caller should know how to init the notifier anyways. The thing is at least in current code we could send meaningless notifiers, e.g., in follow_pte(): if (range) { mmu_notifier_range_init(range, MMU_NOTIFY_CLEAR, 0, NULL, mm, address & PAGE_MASK, (address & PAGE_MASK) + PAGE_SIZE); mmu_notifier_invalidate_range_start(range); } ptep = pte_offset_map_lock(mm, pmd, address, ptlp); if (!pte_present(*ptep)) goto unlock; *ptepp = ptep; return 0; unlock: pte_unmap_unlock(ptep, *ptlp); if (range) mmu_notifier_invalidate_range_end(range); The notify could be meaningless if we do the "goto unlock" path. Ideally it seems we can move the notifier code to caller (as what most mmu notifier users do) and we can also avoid doing that if follow_pte returned -EINVAL. Thanks,
On Fri, Feb 05, 2021 at 01:14:11PM -0500, Peter Xu wrote: > But I do have a question on why dax as the only user needs to pass in the > notifier to follow_pte() for initialization. Not sure either, why does DAX opencode something very much like page_mkclean() with dax_entry_mkclean()? Also it looks like DAX uses the wrong notifier, it calls MMU_NOTIFY_CLEAR but page_mkclean_one() uses MMU_NOTIFY_PROTECTION_PAGE for the same PTE modification sequence?? page_mkclean() has some technique to make the notifier have the right size without becoming entangled in the PTL locks.. Jason
On Mon, Feb 08, 2021 at 02:51:33PM -0400, Jason Gunthorpe wrote: > On Fri, Feb 05, 2021 at 01:14:11PM -0500, Peter Xu wrote: > > > But I do have a question on why dax as the only user needs to pass in the > > notifier to follow_pte() for initialization. > > Not sure either, why does DAX opencode something very much like > page_mkclean() with dax_entry_mkclean()? > > Also it looks like DAX uses the wrong notifier, it calls > MMU_NOTIFY_CLEAR but page_mkclean_one() uses > MMU_NOTIFY_PROTECTION_PAGE for the same PTE modification sequence?? > > page_mkclean() has some technique to make the notifier have the right > size without becoming entangled in the PTL locks.. Right. I guess it's because dax doesn't have "struct page*" on the back, so it can't directly use page_mkclean(). However the whole logic does look very similar, so maybe they can be merged in some way. Thanks,
On Mon, Feb 08, 2021 at 05:02:59PM -0500, Peter Xu wrote: > On Mon, Feb 08, 2021 at 02:51:33PM -0400, Jason Gunthorpe wrote: > > On Fri, Feb 05, 2021 at 01:14:11PM -0500, Peter Xu wrote: > > > > > But I do have a question on why dax as the only user needs to pass in the > > > notifier to follow_pte() for initialization. > > > > Not sure either, why does DAX opencode something very much like > > page_mkclean() with dax_entry_mkclean()? > > > > Also it looks like DAX uses the wrong notifier, it calls > > MMU_NOTIFY_CLEAR but page_mkclean_one() uses > > MMU_NOTIFY_PROTECTION_PAGE for the same PTE modification sequence?? > > > > page_mkclean() has some technique to make the notifier have the right > > size without becoming entangled in the PTL locks.. > > Right. I guess it's because dax doesn't have "struct page*" on the > back, so it It doesn't? I thought DAX cases did? Jason
On Mon, Feb 08, 2021 at 07:26:25PM -0400, Jason Gunthorpe wrote: > On Mon, Feb 08, 2021 at 05:02:59PM -0500, Peter Xu wrote: > > On Mon, Feb 08, 2021 at 02:51:33PM -0400, Jason Gunthorpe wrote: > > > On Fri, Feb 05, 2021 at 01:14:11PM -0500, Peter Xu wrote: > > > > > > > But I do have a question on why dax as the only user needs to pass in the > > > > notifier to follow_pte() for initialization. > > > > > > Not sure either, why does DAX opencode something very much like > > > page_mkclean() with dax_entry_mkclean()? > > > > > > Also it looks like DAX uses the wrong notifier, it calls > > > MMU_NOTIFY_CLEAR but page_mkclean_one() uses > > > MMU_NOTIFY_PROTECTION_PAGE for the same PTE modification sequence?? > > > > > > page_mkclean() has some technique to make the notifier have the right > > > size without becoming entangled in the PTL locks.. > > > > Right. I guess it's because dax doesn't have "struct page*" on the > > back, so it > > It doesn't? I thought DAX cases did? I'm not familiar with dax at all.. but it seems so: e.g. dax_iomap_pte_fault() looks like the general fault handler for dax mappings, in which there's calls to things like vmf_insert_mixed_mkwrite() trying to install ptes with raw pfn. Or I could also be missing something very important.. Thanks,
On Mon, Feb 08, 2021 at 07:26:25PM -0400, Jason Gunthorpe wrote: > > > page_mkclean() has some technique to make the notifier have the right > > > size without becoming entangled in the PTL locks.. > > > > Right. I guess it's because dax doesn't have "struct page*" on the > > back, so it > > It doesn't? I thought DAX cases did? File system DAX has a struct page, device DAX does not. Which means everything using iomap should have a page available, but i'm adding Dan as he should know the details :)
On 2/9/21 8:19 AM, Christoph Hellwig wrote: > On Mon, Feb 08, 2021 at 07:26:25PM -0400, Jason Gunthorpe wrote: >>>> page_mkclean() has some technique to make the notifier have the right >>>> size without becoming entangled in the PTL locks.. >>> >>> Right. I guess it's because dax doesn't have "struct page*" on the >>> back, so it >> >> It doesn't? I thought DAX cases did? > > File system DAX has a struct page, device DAX does not. Which means > everything using iomap should have a page available, but i'm adding > Dan as he should know the details :) > Both fsdax and device-dax have struct page. It's the pmem block device that doesn't.