Message ID | 20210115190451.3135416-1-axelrasmussen@google.com (mailing list archive) |
---|---|
Series | userfaultfd: add minor fault handling |
On Fri, Jan 15, 2021 at 11:04:42AM -0800, Axel Rasmussen wrote:
> UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults. Without
> modifications, the existing codepath assumes a new page needs to be allocated.
> This is okay, since userspace must have a second non-UFFD-registered mapping
> anyway, thus there isn't much reason to want to use these in any case (just
> memcpy or memset or similar).
>
> - If UFFDIO_COPY is used on a minor fault, -EEXIST is returned.

On a minor fault, the dst VM will report the faulting address to src. The src
could check whether dst contains the latest data for that (pmd) page and either:

- it's the latest: src tells dst, and dst does UFFDIO_CONTINUE

- it's not the latest: src tells dst (probably along with the page data? if
  hugetlbfs doesn't support double mapping, we'd need to batch all the dirty
  small pages in one shot), and dst does whatever is needed to replace the page

Then I'm wondering what the right way to replace an old page would be: is it
one FALLOC_FL_PUNCH_HOLE plus one UFFDIO_COPY at the end?

Thanks,
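Peter's source-side check can be sketched roughly as below. This is a hypothetical illustration only: the function name `answer_minor_fault`, the per-huge-page dirty bitmap, the 2 MiB/4 KiB page sizes and the `("CONTINUE" | "REPLACE", ...)` reply shape are all assumptions, not part of this patch series or of any existing migration code. It shows the two outcomes: a clean huge page means dst can simply UFFDIO_CONTINUE, while a dirty one returns every dirty 4 KiB subpage so they can be sent in one batch.

```python
# Hypothetical sketch: names, bitmap layout and reply format are assumptions.
HUGE_PAGE_SIZE = 2 * 1024 * 1024   # assumed pmd-sized hugetlbfs page (2 MiB)
SMALL_PAGE_SIZE = 4 * 1024         # assumed base page size (4 KiB)
SUBPAGES = HUGE_PAGE_SIZE // SMALL_PAGE_SIZE


def answer_minor_fault(fault_addr, dirty_bitmaps):
    """Decide how dst should resolve a minor fault reported at fault_addr.

    dirty_bitmaps maps a huge-page-aligned address to an int whose bit i is
    set when the i-th small subpage was dirtied on src since it was last sent.
    Returns ("CONTINUE", []) when dst already holds the latest data, otherwise
    ("REPLACE", [addresses of dirty subpages]) so all of them go in one shot.
    """
    base = fault_addr & ~(HUGE_PAGE_SIZE - 1)   # round down to the huge page
    bitmap = dirty_bitmaps.get(base, 0)
    if bitmap == 0:
        return ("CONTINUE", [])
    dirty = [base + i * SMALL_PAGE_SIZE
             for i in range(SUBPAGES) if (bitmap >> i) & 1]
    return ("REPLACE", dirty)
```

For example, with `{0x200000: 0b101}` a fault anywhere in the huge page at `0x200000` asks dst to replace the subpages at `0x200000` and `0x202000`, while a fault in any other huge page is answered with CONTINUE.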
On Thu, Jan 21, 2021 at 11:12 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Jan 15, 2021 at 11:04:42AM -0800, Axel Rasmussen wrote:
> > UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults. Without
> > modifications, the existing codepath assumes a new page needs to be allocated.
> > This is okay, since userspace must have a second non-UFFD-registered mapping
> > anyway, thus there isn't much reason to want to use these in any case (just
> > memcpy or memset or similar).
> >
> > - If UFFDIO_COPY is used on a minor fault, -EEXIST is returned.
>
> On a minor fault, the dst VM will report the faulting address to src. The src
> could check whether dst contains the latest data for that (pmd) page and either:
>
> - it's the latest: src tells dst, and dst does UFFDIO_CONTINUE
>
> - it's not the latest: src tells dst (probably along with the page data? if
>   hugetlbfs doesn't support double mapping, we'd need to batch all the dirty
>   small pages in one shot), and dst does whatever is needed to replace the page
>
> Then I'm wondering what the right way to replace an old page would be: is it
> one FALLOC_FL_PUNCH_HOLE plus one UFFDIO_COPY at the end?

When I wrote this, my thinking was that users of this feature would
have two mappings, one of which is not UFFD-registered at all. So, to
replace the existing page contents, userspace would just write to the
non-UFFD mapping (with memcpy() or whatever else, or we could get
fancy and imagine using some RDMA technology to copy the page over the
network from the live migration source directly in place). After
performing the write, we just UFFDIO_CONTINUE.

I believe FALLOC_FL_PUNCH_HOLE / MADV_REMOVE doesn't work with
hugetlbfs? Once shmem support is implemented, I would expect
FALLOC_FL_PUNCH_HOLE + UFFDIO_COPY to work, but I wonder if such an
operation would be more expensive than just copying using the other
side of the shared mapping?

>
> Thanks,
>
> --
> Peter Xu
On Thu, Jan 21, 2021 at 02:13:50PM -0800, Axel Rasmussen wrote:
> When I wrote this, my thinking was that users of this feature would
> have two mappings, one of which is not UFFD-registered at all. So, to
> replace the existing page contents, userspace would just write to the
> non-UFFD mapping (with memcpy() or whatever else, or we could get
> fancy and imagine using some RDMA technology to copy the page over the
> network from the live migration source directly in place). After
> performing the write, we just UFFDIO_CONTINUE.
>
> I believe FALLOC_FL_PUNCH_HOLE / MADV_REMOVE doesn't work with
> hugetlbfs? Once shmem support is implemented, I would expect
> FALLOC_FL_PUNCH_HOLE + UFFDIO_COPY to work, but I wonder if such an
> operation would be more expensive than just copying using the other
> side of the shared mapping?

IIUC hugetlb supports that (hugetlbfs_punch_hole()). But I agree that what
you described should be good enough.

Thanks,