Message ID: 1500533554-5779-4-git-send-email-a.perevalov@samsung.com
State: New, archived
On Thu, Jul 20, 2017 at 09:52:34AM +0300, Alexey Perevalov wrote:
> This patch adds the ability to track already received pages; it is
> necessary for calculating vCPU blocktime in the postcopy migration
> feature, and possibly for restoring after a postcopy migration
> failure. It is also necessary for solving the shared memory issue in
> postcopy live migration. Information about received pages will be
> transferred to the software virtual bridge (e.g. OVS-VSWITCHD) to
> avoid fallocate (unmap) for already received pages. The fallocate
> syscall is required for remapped shared memory, because remapping
> itself blocks ioctl(UFFDIO_COPY); the ioctl in this case will end
> with an EEXIST error (the struct page already exists after the
> remap).
>
> The bitmap is placed into RAMBlock like the other postcopy/precopy
> related bitmaps.
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
[...]
>  static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
> -                               void *from_addr, uint64_t pagesize)
> +                               void *from_addr, uint64_t pagesize, RAMBlock *rb)
>  {
> +    int ret;
>      if (from_addr) {
>          struct uffdio_copy copy_struct;
>          copy_struct.dst = (uint64_t)(uintptr_t)host_addr;
>          copy_struct.src = (uint64_t)(uintptr_t)from_addr;
>          copy_struct.len = pagesize;
>          copy_struct.mode = 0;
> -        return ioctl(userfault_fd, UFFDIO_COPY, &copy_struct);
> +        ret = ioctl(userfault_fd, UFFDIO_COPY, &copy_struct);
>      } else {
>          struct uffdio_zeropage zero_struct;
>          zero_struct.range.start = (uint64_t)(uintptr_t)host_addr;
>          zero_struct.range.len = pagesize;
>          zero_struct.mode = 0;
> -        return ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
> +        ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
> +    }
> +    if (!ret) {
> +        ramblock_recv_bitmap_set(host_addr, rb);

Wait...

Now we are using a 4k-page/bit bitmap; do we need to take care of the
huge pages here? Looks like we are only setting the first bit of it
if it is a huge page?
On 07/26/2017 04:49 AM, Peter Xu wrote:
> On Thu, Jul 20, 2017 at 09:52:34AM +0300, Alexey Perevalov wrote:
[...]
>> + if (!ret) {
>> +     ramblock_recv_bitmap_set(host_addr, rb);
> Wait...
>
> Now we are using 4k-page/bit bitmap, do we need to take care of the
> huge pages here? Looks like we are only setting the first bit of it
> if it is a huge page?
The first version was per ramblock page size, IOW the bitmap was
smaller in the case of hugepages. You mentioned that TARGET_PAGE_SIZE
is reasonable for the precopy case, in "Re: [Qemu-devel] [PATCH v1 2/2]
migration: add bitmap for copied page". I thought TARGET_PAGE_SIZE, as
the transmission unit, is used in precopy even in the hugepage case.
But that is not quite logical: a page marked as dirty should be sent
as a whole page.
On Wed, Jul 26, 2017 at 11:07:17AM +0300, Alexey Perevalov wrote:
[...]
> First version was per ramblock page size, IOW bitmap was smaller in
> case of hugepages.

Yes, but this is not the first version any more. :)

This patch is using:

  bitmap_new(rb->max_length >> TARGET_PAGE_BITS);

to allocate the bitmap, so it is using small pages always for the
bitmap, right? (I should not really say "4k" pages; here I think the
size is the host page size, which is the thing returned from
getpagesize().)

> You mentioned that TARGET_PAGE_SIZE is reasonable for precopy case,
> in "Re: [Qemu-devel] [PATCH v1 2/2] migration: add bitmap for copied page"
> I though TARGET_PAGE_SIZE as transmition unit, is using in precopy even
> hugepage case.
> But it's not so logically, page being marked as dirty, should be sent
> as a whole page.

Sorry if I misunderstood, but I didn't see anything wrong - we are
sending pages in small pages, but when postcopy is there, we do
UFFDIO_COPY in huge page, so everything is fine?
On 07/26/2017 11:43 AM, Peter Xu wrote:
[...]
> Sorry if I misunderstood, but I didn't see anything wrong - we are
> sending pages in small pages, but when postcopy is there, we do
> UFFDIO_COPY in huge page, so everything is fine?
I think yes, we chose TARGET_PAGE_SIZE because of wider use case
ranges.
On Wed, Jul 26, 2017 at 06:24:11PM +0300, Alexey Perevalov wrote:
[...]
> I think yes, we chose TARGET_PAGE_SIZE because of wider
> use case ranges.

So... are you going to post another version? IIUC we just need to use
a bitmap_set() to replace the ramblock_recv_bitmap_set(), while
setting the size with "pagesize / TARGET_PAGE_SIZE"?

(I think I was wrong when saying getpagesize() above: the small page
should be the target page size, while the huge page should be the
host's.)
On 07/27/2017 05:35 AM, Peter Xu wrote:
[...]
> So... are you going to post another version? IIUC we just need to use
> a bitmap_set() to replace the ramblock_recv_bitmap_set(), while set
> the size with "pagesize / TARGET_PAGE_SIZE"?
From my point of view TARGET_PAGE_SIZE/TARGET_PAGE_BITS is platform
specific, and it is used in ram_load to copy into the buffer, so it is
preferable for the bitmap size; and I'm not going to replace the
ramblock_recv_bitmap_set helper - it calculates the offset.

> (I think I was wrong when saying getpagesize() above: the small page
> should be target page size, while the huge page should be the host's)
I think we should forget about the huge page case in the "received
bitmap" concept; maybe in a "uffd_copied bitmap" it was reasonable ;)
On Thu, Jul 27, 2017 at 10:27:41AM +0300, Alexey Perevalov wrote:
[...]
>> (I think I was wrong when saying getpagesize() above: the small page
>> should be target page size, while the huge page should be the host's)
> I think we should forget about huge page case in "received bitmap"
> concept, maybe in "uffd_copied bitmap" it was reasonable ;)

Again, I am not sure I got the whole idea of the reply...

However, I do think when we UFFDIO_COPY a huge page, then we should do
bitmap_set() on the received bitmap for the whole range that the huge
page covers.

IMHO, the bitmap is defined as "one bit per small page", and the small
page size is TARGET_PAGE_SIZE. We cannot just assume that "as long as
the first bit of the huge page is set, all the small pages in the huge
page are set".

Thanks,
On 07/28/2017 07:27 AM, Peter Xu wrote:
[...]
> However, I do think when we UFFDIO_COPY a huge page, then we should do
> bitmap_set() on the received bitmap for the whole range that the huge
> page covers.
For what purpose?

> IMHO, the bitmap is defined as "one bit per small page", and the small
> page size is TARGET_PAGE_SIZE. We cannot just assume that "as long as
> the first bit of the huge page is set, all the small pages in the huge
> page are set".
At the moment of the copy, all small pages of the huge page should
have been received. Yes, it's an assumption, but I couldn't predict
the side effects; maybe it will be necessary in postcopy failure
handling, while copying pages back, but I'm not sure right now. To
know that, we need to start implementing it, or at least investigate
it deeply.
On Fri, Jul 28, 2017 at 09:43:28AM +0300, Alexey Perevalov wrote: > On 07/28/2017 07:27 AM, Peter Xu wrote: > >On Thu, Jul 27, 2017 at 10:27:41AM +0300, Alexey Perevalov wrote: > >>On 07/27/2017 05:35 AM, Peter Xu wrote: > >>>On Wed, Jul 26, 2017 at 06:24:11PM +0300, Alexey Perevalov wrote: > >>>>On 07/26/2017 11:43 AM, Peter Xu wrote: > >>>>>On Wed, Jul 26, 2017 at 11:07:17AM +0300, Alexey Perevalov wrote: > >>>>>>On 07/26/2017 04:49 AM, Peter Xu wrote: > >>>>>>>On Thu, Jul 20, 2017 at 09:52:34AM +0300, Alexey Perevalov wrote: > >>>>>>>>This patch adds ability to track down already received > >>>>>>>>pages, it's necessary for calculation vCPU block time in > >>>>>>>>postcopy migration feature, maybe for restore after > >>>>>>>>postcopy migration failure. > >>>>>>>>Also it's necessary to solve shared memory issue in > >>>>>>>>postcopy livemigration. Information about received pages > >>>>>>>>will be transferred to the software virtual bridge > >>>>>>>>(e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for > >>>>>>>>already received pages. fallocate syscall is required for > >>>>>>>>remmaped shared memory, due to remmaping itself blocks > >>>>>>>>ioctl(UFFDIO_COPY, ioctl in this case will end with EEXIT > >>>>>>>>error (struct page is exists after remmap). > >>>>>>>> > >>>>>>>>Bitmap is placed into RAMBlock as another postcopy/precopy > >>>>>>>>related bitmaps. > >>>>>>>> > >>>>>>>>Reviewed-by: Peter Xu <peterx@redhat.com> > >>>>>>>>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> > >>>>>>>>--- > >>>>>>>[...] 
> >>>>>>> > >>>>>>>> static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr, > >>>>>>>>- void *from_addr, uint64_t pagesize) > >>>>>>>>+ void *from_addr, uint64_t pagesize, RAMBlock *rb) > >>>>>>>> { > >>>>>>>>+ int ret; > >>>>>>>> if (from_addr) { > >>>>>>>> struct uffdio_copy copy_struct; > >>>>>>>> copy_struct.dst = (uint64_t)(uintptr_t)host_addr; > >>>>>>>> copy_struct.src = (uint64_t)(uintptr_t)from_addr; > >>>>>>>> copy_struct.len = pagesize; > >>>>>>>> copy_struct.mode = 0; > >>>>>>>>- return ioctl(userfault_fd, UFFDIO_COPY, ©_struct); > >>>>>>>>+ ret = ioctl(userfault_fd, UFFDIO_COPY, ©_struct); > >>>>>>>> } else { > >>>>>>>> struct uffdio_zeropage zero_struct; > >>>>>>>> zero_struct.range.start = (uint64_t)(uintptr_t)host_addr; > >>>>>>>> zero_struct.range.len = pagesize; > >>>>>>>> zero_struct.mode = 0; > >>>>>>>>- return ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct); > >>>>>>>>+ ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct); > >>>>>>>>+ } > >>>>>>>>+ if (!ret) { > >>>>>>>>+ ramblock_recv_bitmap_set(host_addr, rb); > >>>>>>>Wait... > >>>>>>> > >>>>>>>Now we are using 4k-page/bit bitmap, do we need to take care of the > >>>>>>>huge pages here? Looks like we are only setting the first bit of it > >>>>>>>if it is a huge page? > >>>>>>First version was per ramblock page size, IOW bitmap was smaller in > >>>>>>case of hugepages. > >>>>>Yes, but this is not the first version any more. :) > >>>>> > >>>>>This patch is using: > >>>>> > >>>>> bitmap_new(rb->max_length >> TARGET_PAGE_BITS); > >>>>> > >>>>>to allocate bitmap, so it is using small pages always for bitmap, > >>>>>right? (I should not really say "4k" pages, here I think the size is > >>>>>host page size, which is the thing returned from getpagesize()). 
> >>>>> > >>>>>>You mentioned that TARGET_PAGE_SIZE is reasonable for precopy case, > >>>>>>in "Re: [Qemu-devel] [PATCH v1 2/2] migration: add bitmap for copied page" > >>>>>>I though TARGET_PAGE_SIZE as transmition unit, is using in precopy even > >>>>>>hugepage case. > >>>>>>But it's not so logically, page being marked as dirty, should be sent as a > >>>>>>whole page. > >>>>>Sorry if I misunderstood, but I didn't see anything wrong - we are > >>>>>sending pages in small pages, but when postcopy is there, we do > >>>>>UFFDIO_COPY in huge page, so everything is fine? > >>>>I think yes, we chose TARGET_PAGE_SIZE because of wider > >>>>use case ranges. > >>>So... are you going to post another version? IIUC we just need to use > >>>a bitmap_set() to replace the ramblock_recv_bitmap_set(), while set > >>>the size with "pagesize / TARGET_PAGE_SIZE"? > >> From my point of view TARGET_PAGE_SIZE/TARGET_PAGE_BITS it's a platform > >>specific > >> > >>and it used in ram_load to copy to buffer so it's more preferred for bitmap size > >>and I'm not going to replace ramblock_recv_bitmap_set helper - it calculates offset. > >> > >>>(I think I was wrong when saying getpagesize() above: the small page > >>> should be target page size, while the huge page should be the host's) > >>I think we should forget about huge page case in "received bitmap" > >>concept, maybe in "uffd_copied bitmap" it was reasonable ;) > >Again, I am not sure I got the whole idea of the reply... > > > >However, I do think when we UFFDIO_COPY a huge page, then we should do > >bitmap_set() on the received bitmap for the whole range that the huge > >page covers. > for what purpose? We chose to use small-paged bitmap since in precopy we need to have such a granularity (in precopy, we can copy a small page even that small page is on a host huge page). 
Since we decided to use the small-paged bitmap, we need to make sure it follows how it was defined: one bit defines whether the corresponding small page is received. IMHO not following that is hacky and error-prone. > > > > >IMHO, the bitmap is defined as "one bit per small page", and the small > >page size is TARGET_PAGE_SIZE. We cannot just assume that "as long as > >the first bit of the huge page is set, all the small pages in the huge > >page are set". > At the moment of copying all small pages of the huge page, > should be received. Yes it's assumption, but I couldn't predict > side effect, maybe it will be necessary in postcopy failure handling, > while copying pages back, but I'm not sure right now. > To know that, need to start implementing it, or at least to deep > investigation. Yes, postcopy failure handling is exactly one case where it can be used. Of course, with all the ramblock information we can re-construct the real bitmap when the source receives the bitmaps from the destination. However, why not make it correct from the very beginning (especially when it is quite easy to do so)? (Actually, I asked since I am working on the RFC series of postcopy failure recovery. I will post RFCs soon) Thanks,
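[Editorial note: the fix Peter is asking for - one successful UFFDIO_COPY of a huge page must mark every small page it covers, not just the first bit - can be sketched outside QEMU roughly as below. This is a minimal standalone illustration, not QEMU code: the plain bitmap array, the fixed 4K TARGET_PAGE_SIZE, and the names `bitmap_set_range`/`recv_bitmap_set_range` are all assumptions for the example; QEMU's real helpers are bitmap_set() and ramblock_recv_bitmap_set() operating on RAMBlock::receivedmap.]

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: a 4K "target" page, as on most targets. */
#define TARGET_PAGE_BITS 12
#define TARGET_PAGE_SIZE (1UL << TARGET_PAGE_BITS)
#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic stand-in for QEMU's bitmap_set(). */
static void bitmap_set_range(unsigned long *map, size_t start, size_t nbits)
{
    for (size_t i = start; i < start + nbits; i++) {
        map[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
    }
}

/* Stand-in for test_bit(). */
static int bitmap_test(const unsigned long *map, size_t bit)
{
    return !!(map[bit / BITS_PER_LONG] & (1UL << (bit % BITS_PER_LONG)));
}

/* After a successful UFFDIO_COPY/UFFDIO_ZEROPAGE of `pagesize` bytes at
 * RAMBlock offset `offset`, mark pagesize / TARGET_PAGE_SIZE consecutive
 * small-page bits as received - not only the first one. */
static void recv_bitmap_set_range(unsigned long *map, uint64_t offset,
                                  uint64_t pagesize)
{
    bitmap_set_range(map, offset >> TARGET_PAGE_BITS,
                     pagesize / TARGET_PAGE_SIZE);
}
```

With a 2MB huge page this sets 512 consecutive bits, so a later per-small-page lookup (for example during postcopy failure recovery) sees the whole range as received.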
On 07/28/2017 09:57 AM, Peter Xu wrote: > On Fri, Jul 28, 2017 at 09:43:28AM +0300, Alexey Perevalov wrote: >> On 07/28/2017 07:27 AM, Peter Xu wrote: >>> On Thu, Jul 27, 2017 at 10:27:41AM +0300, Alexey Perevalov wrote: >>>> On 07/27/2017 05:35 AM, Peter Xu wrote: >>>>> On Wed, Jul 26, 2017 at 06:24:11PM +0300, Alexey Perevalov wrote: >>>>>> On 07/26/2017 11:43 AM, Peter Xu wrote: >>>>>>> On Wed, Jul 26, 2017 at 11:07:17AM +0300, Alexey Perevalov wrote: >>>>>>>> On 07/26/2017 04:49 AM, Peter Xu wrote: >>>>>>>>> On Thu, Jul 20, 2017 at 09:52:34AM +0300, Alexey Perevalov wrote: >>>>>>>>>> This patch adds ability to track down already received >>>>>>>>>> pages, it's necessary for calculation vCPU block time in >>>>>>>>>> postcopy migration feature, maybe for restore after >>>>>>>>>> postcopy migration failure. >>>>>>>>>> Also it's necessary to solve shared memory issue in >>>>>>>>>> postcopy livemigration. Information about received pages >>>>>>>>>> will be transferred to the software virtual bridge >>>>>>>>>> (e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for >>>>>>>>>> already received pages. fallocate syscall is required for >>>>>>>>>> remmaped shared memory, due to remmaping itself blocks >>>>>>>>>> ioctl(UFFDIO_COPY, ioctl in this case will end with EEXIT >>>>>>>>>> error (struct page is exists after remmap). >>>>>>>>>> >>>>>>>>>> Bitmap is placed into RAMBlock as another postcopy/precopy >>>>>>>>>> related bitmaps. >>>>>>>>>> >>>>>>>>>> Reviewed-by: Peter Xu <peterx@redhat.com> >>>>>>>>>> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> >>>>>>>>>> --- >>>>>>>>> [...] 
>>>>>>>>> >>>>>>>>>> static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr, >>>>>>>>>> - void *from_addr, uint64_t pagesize) >>>>>>>>>> + void *from_addr, uint64_t pagesize, RAMBlock *rb) >>>>>>>>>> { >>>>>>>>>> + int ret; >>>>>>>>>> if (from_addr) { >>>>>>>>>> struct uffdio_copy copy_struct; >>>>>>>>>> copy_struct.dst = (uint64_t)(uintptr_t)host_addr; >>>>>>>>>> copy_struct.src = (uint64_t)(uintptr_t)from_addr; >>>>>>>>>> copy_struct.len = pagesize; >>>>>>>>>> copy_struct.mode = 0; >>>>>>>>>> - return ioctl(userfault_fd, UFFDIO_COPY, ©_struct); >>>>>>>>>> + ret = ioctl(userfault_fd, UFFDIO_COPY, ©_struct); >>>>>>>>>> } else { >>>>>>>>>> struct uffdio_zeropage zero_struct; >>>>>>>>>> zero_struct.range.start = (uint64_t)(uintptr_t)host_addr; >>>>>>>>>> zero_struct.range.len = pagesize; >>>>>>>>>> zero_struct.mode = 0; >>>>>>>>>> - return ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct); >>>>>>>>>> + ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct); >>>>>>>>>> + } >>>>>>>>>> + if (!ret) { >>>>>>>>>> + ramblock_recv_bitmap_set(host_addr, rb); >>>>>>>>> Wait... >>>>>>>>> >>>>>>>>> Now we are using 4k-page/bit bitmap, do we need to take care of the >>>>>>>>> huge pages here? Looks like we are only setting the first bit of it >>>>>>>>> if it is a huge page? >>>>>>>> First version was per ramblock page size, IOW bitmap was smaller in >>>>>>>> case of hugepages. >>>>>>> Yes, but this is not the first version any more. :) >>>>>>> >>>>>>> This patch is using: >>>>>>> >>>>>>> bitmap_new(rb->max_length >> TARGET_PAGE_BITS); >>>>>>> >>>>>>> to allocate bitmap, so it is using small pages always for bitmap, >>>>>>> right? (I should not really say "4k" pages, here I think the size is >>>>>>> host page size, which is the thing returned from getpagesize()). 
>>>>>>> >>>>>>>> You mentioned that TARGET_PAGE_SIZE is reasonable for precopy case, >>>>>>>> in "Re: [Qemu-devel] [PATCH v1 2/2] migration: add bitmap for copied page" >>>>>>>> I though TARGET_PAGE_SIZE as transmition unit, is using in precopy even >>>>>>>> hugepage case. >>>>>>>> But it's not so logically, page being marked as dirty, should be sent as a >>>>>>>> whole page. >>>>>>> Sorry if I misunderstood, but I didn't see anything wrong - we are >>>>>>> sending pages in small pages, but when postcopy is there, we do >>>>>>> UFFDIO_COPY in huge page, so everything is fine? >>>>>> I think yes, we chose TARGET_PAGE_SIZE because of wider >>>>>> use case ranges. >>>>> So... are you going to post another version? IIUC we just need to use >>>>> a bitmap_set() to replace the ramblock_recv_bitmap_set(), while set >>>>> the size with "pagesize / TARGET_PAGE_SIZE"? >>>> From my point of view TARGET_PAGE_SIZE/TARGET_PAGE_BITS it's a platform >>>> specific >>>> >>>> and it used in ram_load to copy to buffer so it's more preferred for bitmap size >>>> and I'm not going to replace ramblock_recv_bitmap_set helper - it calculates offset. >>>> >>>>> (I think I was wrong when saying getpagesize() above: the small page >>>>> should be target page size, while the huge page should be the host's) >>>> I think we should forget about huge page case in "received bitmap" >>>> concept, maybe in "uffd_copied bitmap" it was reasonable ;) >>> Again, I am not sure I got the whole idea of the reply... >>> >>> However, I do think when we UFFDIO_COPY a huge page, then we should do >>> bitmap_set() on the received bitmap for the whole range that the huge >>> page covers. >> for what purpose? > We chose to use small-paged bitmap since in precopy we need to have > such a granularity (in precopy, we can copy a small page even that > small page is on a host huge page). 
> > Since we decided to use the small-paged bitmap, we need to make sure > it follows how it was defined: one bit defines whether the > corresponding small page is received. IMHO not following that is hacky > and error-prone. > >>> IMHO, the bitmap is defined as "one bit per small page", and the small >>> page size is TARGET_PAGE_SIZE. We cannot just assume that "as long as >>> the first bit of the huge page is set, all the small pages in the huge >>> page are set". >> At the moment of copying all small pages of the huge page, >> should be received. Yes it's assumption, but I couldn't predict >> side effect, maybe it will be necessary in postcopy failure handling, >> while copying pages back, but I'm not sure right now. >> To know that, need to start implementing it, or at least to deep >> investigation. > Yes, postcopy failure handling is exactly one case where it can be > used. Of course with all the ramblock information we can re-construct > the real bitmap when the source received the bitmaps from destination. > However, why not we make it correct at the very beginning (especially > when it is quite easy to do so)? > > (Actually, I asked since I am working on the RFC series of postcopy > failure recovery. I will post RFCs soon) > > Thanks, > Ok, I'll resend the patchset today; all bits of the appropriate huge page will be set.
On 07/28/2017 10:06 AM, Alexey Perevalov wrote: > On 07/28/2017 09:57 AM, Peter Xu wrote: >> On Fri, Jul 28, 2017 at 09:43:28AM +0300, Alexey Perevalov wrote: >>> On 07/28/2017 07:27 AM, Peter Xu wrote: >>>> On Thu, Jul 27, 2017 at 10:27:41AM +0300, Alexey Perevalov wrote: >>>>> On 07/27/2017 05:35 AM, Peter Xu wrote: >>>>>> On Wed, Jul 26, 2017 at 06:24:11PM +0300, Alexey Perevalov wrote: >>>>>>> On 07/26/2017 11:43 AM, Peter Xu wrote: >>>>>>>> On Wed, Jul 26, 2017 at 11:07:17AM +0300, Alexey Perevalov wrote: >>>>>>>>> On 07/26/2017 04:49 AM, Peter Xu wrote: >>>>>>>>>> On Thu, Jul 20, 2017 at 09:52:34AM +0300, Alexey Perevalov >>>>>>>>>> wrote: >>>>>>>>>>> This patch adds ability to track down already received >>>>>>>>>>> pages, it's necessary for calculation vCPU block time in >>>>>>>>>>> postcopy migration feature, maybe for restore after >>>>>>>>>>> postcopy migration failure. >>>>>>>>>>> Also it's necessary to solve shared memory issue in >>>>>>>>>>> postcopy livemigration. Information about received pages >>>>>>>>>>> will be transferred to the software virtual bridge >>>>>>>>>>> (e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for >>>>>>>>>>> already received pages. fallocate syscall is required for >>>>>>>>>>> remmaped shared memory, due to remmaping itself blocks >>>>>>>>>>> ioctl(UFFDIO_COPY, ioctl in this case will end with EEXIT >>>>>>>>>>> error (struct page is exists after remmap). >>>>>>>>>>> >>>>>>>>>>> Bitmap is placed into RAMBlock as another postcopy/precopy >>>>>>>>>>> related bitmaps. >>>>>>>>>>> >>>>>>>>>>> Reviewed-by: Peter Xu <peterx@redhat.com> >>>>>>>>>>> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> >>>>>>>>>>> --- >>>>>>>>>> [...] 
>>>>>>>>>> >>>>>>>>>>> static int qemu_ufd_copy_ioctl(int userfault_fd, void >>>>>>>>>>> *host_addr, >>>>>>>>>>> - void *from_addr, uint64_t pagesize) >>>>>>>>>>> + void *from_addr, uint64_t >>>>>>>>>>> pagesize, RAMBlock *rb) >>>>>>>>>>> { >>>>>>>>>>> + int ret; >>>>>>>>>>> if (from_addr) { >>>>>>>>>>> struct uffdio_copy copy_struct; >>>>>>>>>>> copy_struct.dst = (uint64_t)(uintptr_t)host_addr; >>>>>>>>>>> copy_struct.src = (uint64_t)(uintptr_t)from_addr; >>>>>>>>>>> copy_struct.len = pagesize; >>>>>>>>>>> copy_struct.mode = 0; >>>>>>>>>>> - return ioctl(userfault_fd, UFFDIO_COPY, ©_struct); >>>>>>>>>>> + ret = ioctl(userfault_fd, UFFDIO_COPY, ©_struct); >>>>>>>>>>> } else { >>>>>>>>>>> struct uffdio_zeropage zero_struct; >>>>>>>>>>> zero_struct.range.start = >>>>>>>>>>> (uint64_t)(uintptr_t)host_addr; >>>>>>>>>>> zero_struct.range.len = pagesize; >>>>>>>>>>> zero_struct.mode = 0; >>>>>>>>>>> - return ioctl(userfault_fd, UFFDIO_ZEROPAGE, >>>>>>>>>>> &zero_struct); >>>>>>>>>>> + ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, >>>>>>>>>>> &zero_struct); >>>>>>>>>>> + } >>>>>>>>>>> + if (!ret) { >>>>>>>>>>> + ramblock_recv_bitmap_set(host_addr, rb); >>>>>>>>>> Wait... >>>>>>>>>> >>>>>>>>>> Now we are using 4k-page/bit bitmap, do we need to take care >>>>>>>>>> of the >>>>>>>>>> huge pages here? Looks like we are only setting the first >>>>>>>>>> bit of it >>>>>>>>>> if it is a huge page? >>>>>>>>> First version was per ramblock page size, IOW bitmap was >>>>>>>>> smaller in >>>>>>>>> case of hugepages. >>>>>>>> Yes, but this is not the first version any more. :) >>>>>>>> >>>>>>>> This patch is using: >>>>>>>> >>>>>>>> bitmap_new(rb->max_length >> TARGET_PAGE_BITS); >>>>>>>> >>>>>>>> to allocate bitmap, so it is using small pages always for bitmap, >>>>>>>> right? (I should not really say "4k" pages, here I think the >>>>>>>> size is >>>>>>>> host page size, which is the thing returned from getpagesize()). 
>>>>>>>> >>>>>>>>> You mentioned that TARGET_PAGE_SIZE is reasonable for precopy >>>>>>>>> case, >>>>>>>>> in "Re: [Qemu-devel] [PATCH v1 2/2] migration: add bitmap for >>>>>>>>> copied page" >>>>>>>>> I though TARGET_PAGE_SIZE as transmition unit, is using in >>>>>>>>> precopy even >>>>>>>>> hugepage case. >>>>>>>>> But it's not so logically, page being marked as dirty, should >>>>>>>>> be sent as a >>>>>>>>> whole page. >>>>>>>> Sorry if I misunderstood, but I didn't see anything wrong - we are >>>>>>>> sending pages in small pages, but when postcopy is there, we do >>>>>>>> UFFDIO_COPY in huge page, so everything is fine? >>>>>>> I think yes, we chose TARGET_PAGE_SIZE because of wider >>>>>>> use case ranges. >>>>>> So... are you going to post another version? IIUC we just need to >>>>>> use >>>>>> a bitmap_set() to replace the ramblock_recv_bitmap_set(), while set >>>>>> the size with "pagesize / TARGET_PAGE_SIZE"? >>>>> From my point of view TARGET_PAGE_SIZE/TARGET_PAGE_BITS it's a >>>>> platform >>>>> specific >>>>> >>>>> and it used in ram_load to copy to buffer so it's more preferred >>>>> for bitmap size >>>>> and I'm not going to replace ramblock_recv_bitmap_set helper - it >>>>> calculates offset. >>>>> >>>>>> (I think I was wrong when saying getpagesize() above: the small page >>>>>> should be target page size, while the huge page should be the >>>>>> host's) >>>>> I think we should forget about huge page case in "received bitmap" >>>>> concept, maybe in "uffd_copied bitmap" it was reasonable ;) >>>> Again, I am not sure I got the whole idea of the reply... >>>> >>>> However, I do think when we UFFDIO_COPY a huge page, then we should do >>>> bitmap_set() on the received bitmap for the whole range that the huge >>>> page covers. >>> for what purpose? >> We chose to use small-paged bitmap since in precopy we need to have >> such a granularity (in precopy, we can copy a small page even that >> small page is on a host huge page). 
>> >> Since we decided to use the small-paged bitmap, we need to make sure >> it follows how it was defined: one bit defines whether the >> corresponding small page is received. IMHO not following that is hacky >> and error-prone. >> >>>> IMHO, the bitmap is defined as "one bit per small page", and the small >>>> page size is TARGET_PAGE_SIZE. We cannot just assume that "as long as >>>> the first bit of the huge page is set, all the small pages in the huge >>>> page are set". >>> At the moment of copying all small pages of the huge page, >>> should be received. Yes it's assumption, but I couldn't predict >>> side effect, maybe it will be necessary in postcopy failure handling, >>> while copying pages back, but I'm not sure right now. >>> To know that, need to start implementing it, or at least to deep >>> investigation. >> Yes, postcopy failure handling is exactly one case where it can be >> used. Of course with all the ramblock information we can re-construct >> the real bitmap when the source received the bitmaps from destination. >> However, why not we make it correct at the very beginning (especially >> when it is quite easy to do so)? >> >> (Actually, I asked since I am working on the RFC series of postcopy >> failure recovery. I will post RFCs soon) >> >> Thanks, >> > Ok, I'll resend patchset today, all bits of the appropriate huge > > page will set. > > I saw you already included "migration: fix incorrect postcopy recved_bitmap" in your patch set. Do you think it is worth including your patch, of course preserving authorship, in this patch set?
On Fri, Jul 28, 2017 at 06:29:20PM +0300, Alexey Perevalov wrote: > On 07/28/2017 10:06 AM, Alexey Perevalov wrote: > >On 07/28/2017 09:57 AM, Peter Xu wrote: > >>On Fri, Jul 28, 2017 at 09:43:28AM +0300, Alexey Perevalov wrote: > >>>On 07/28/2017 07:27 AM, Peter Xu wrote: > >>>>On Thu, Jul 27, 2017 at 10:27:41AM +0300, Alexey Perevalov wrote: > >>>>>On 07/27/2017 05:35 AM, Peter Xu wrote: > >>>>>>On Wed, Jul 26, 2017 at 06:24:11PM +0300, Alexey Perevalov wrote: > >>>>>>>On 07/26/2017 11:43 AM, Peter Xu wrote: > >>>>>>>>On Wed, Jul 26, 2017 at 11:07:17AM +0300, Alexey Perevalov wrote: > >>>>>>>>>On 07/26/2017 04:49 AM, Peter Xu wrote: > >>>>>>>>>>On Thu, Jul 20, 2017 at 09:52:34AM +0300, Alexey > >>>>>>>>>>Perevalov wrote: > >>>>>>>>>>>This patch adds ability to track down already received > >>>>>>>>>>>pages, it's necessary for calculation vCPU block time in > >>>>>>>>>>>postcopy migration feature, maybe for restore after > >>>>>>>>>>>postcopy migration failure. > >>>>>>>>>>>Also it's necessary to solve shared memory issue in > >>>>>>>>>>>postcopy livemigration. Information about received pages > >>>>>>>>>>>will be transferred to the software virtual bridge > >>>>>>>>>>>(e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for > >>>>>>>>>>>already received pages. fallocate syscall is required for > >>>>>>>>>>>remmaped shared memory, due to remmaping itself blocks > >>>>>>>>>>>ioctl(UFFDIO_COPY, ioctl in this case will end with EEXIT > >>>>>>>>>>>error (struct page is exists after remmap). > >>>>>>>>>>> > >>>>>>>>>>>Bitmap is placed into RAMBlock as another postcopy/precopy > >>>>>>>>>>>related bitmaps. > >>>>>>>>>>> > >>>>>>>>>>>Reviewed-by: Peter Xu <peterx@redhat.com> > >>>>>>>>>>>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> > >>>>>>>>>>>--- > >>>>>>>>>>[...] 
> >>>>>>>>>> > >>>>>>>>>>> static int qemu_ufd_copy_ioctl(int userfault_fd, > >>>>>>>>>>>void *host_addr, > >>>>>>>>>>>- void *from_addr, uint64_t pagesize) > >>>>>>>>>>>+ void *from_addr, > >>>>>>>>>>>uint64_t pagesize, RAMBlock *rb) > >>>>>>>>>>> { > >>>>>>>>>>>+ int ret; > >>>>>>>>>>> if (from_addr) { > >>>>>>>>>>> struct uffdio_copy copy_struct; > >>>>>>>>>>> copy_struct.dst = (uint64_t)(uintptr_t)host_addr; > >>>>>>>>>>> copy_struct.src = (uint64_t)(uintptr_t)from_addr; > >>>>>>>>>>> copy_struct.len = pagesize; > >>>>>>>>>>> copy_struct.mode = 0; > >>>>>>>>>>>- return ioctl(userfault_fd, UFFDIO_COPY, ©_struct); > >>>>>>>>>>>+ ret = ioctl(userfault_fd, UFFDIO_COPY, ©_struct); > >>>>>>>>>>> } else { > >>>>>>>>>>> struct uffdio_zeropage zero_struct; > >>>>>>>>>>> zero_struct.range.start = > >>>>>>>>>>>(uint64_t)(uintptr_t)host_addr; > >>>>>>>>>>> zero_struct.range.len = pagesize; > >>>>>>>>>>> zero_struct.mode = 0; > >>>>>>>>>>>- return ioctl(userfault_fd, UFFDIO_ZEROPAGE, > >>>>>>>>>>>&zero_struct); > >>>>>>>>>>>+ ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, > >>>>>>>>>>>&zero_struct); > >>>>>>>>>>>+ } > >>>>>>>>>>>+ if (!ret) { > >>>>>>>>>>>+ ramblock_recv_bitmap_set(host_addr, rb); > >>>>>>>>>>Wait... > >>>>>>>>>> > >>>>>>>>>>Now we are using 4k-page/bit bitmap, do we need to take > >>>>>>>>>>care of the > >>>>>>>>>>huge pages here? Looks like we are only setting the > >>>>>>>>>>first bit of it > >>>>>>>>>>if it is a huge page? > >>>>>>>>>First version was per ramblock page size, IOW bitmap was > >>>>>>>>>smaller in > >>>>>>>>>case of hugepages. > >>>>>>>>Yes, but this is not the first version any more. :) > >>>>>>>> > >>>>>>>>This patch is using: > >>>>>>>> > >>>>>>>> bitmap_new(rb->max_length >> TARGET_PAGE_BITS); > >>>>>>>> > >>>>>>>>to allocate bitmap, so it is using small pages always for bitmap, > >>>>>>>>right? (I should not really say "4k" pages, here I think the > >>>>>>>>size is > >>>>>>>>host page size, which is the thing returned from getpagesize()). 
> >>>>>>>> > >>>>>>>>>You mentioned that TARGET_PAGE_SIZE is reasonable for > >>>>>>>>>precopy case, > >>>>>>>>>in "Re: [Qemu-devel] [PATCH v1 2/2] migration: add bitmap > >>>>>>>>>for copied page" > >>>>>>>>>I though TARGET_PAGE_SIZE as transmition unit, is using in > >>>>>>>>>precopy even > >>>>>>>>>hugepage case. > >>>>>>>>>But it's not so logically, page being marked as dirty, > >>>>>>>>>should be sent as a > >>>>>>>>>whole page. > >>>>>>>>Sorry if I misunderstood, but I didn't see anything wrong - we are > >>>>>>>>sending pages in small pages, but when postcopy is there, we do > >>>>>>>>UFFDIO_COPY in huge page, so everything is fine? > >>>>>>>I think yes, we chose TARGET_PAGE_SIZE because of wider > >>>>>>>use case ranges. > >>>>>>So... are you going to post another version? IIUC we just need > >>>>>>to use > >>>>>>a bitmap_set() to replace the ramblock_recv_bitmap_set(), while set > >>>>>>the size with "pagesize / TARGET_PAGE_SIZE"? > >>>>> From my point of view TARGET_PAGE_SIZE/TARGET_PAGE_BITS it's a > >>>>>platform > >>>>>specific > >>>>> > >>>>>and it used in ram_load to copy to buffer so it's more preferred > >>>>>for bitmap size > >>>>>and I'm not going to replace ramblock_recv_bitmap_set helper - it > >>>>>calculates offset. > >>>>> > >>>>>>(I think I was wrong when saying getpagesize() above: the small page > >>>>>> should be target page size, while the huge page should be the > >>>>>>host's) > >>>>>I think we should forget about huge page case in "received bitmap" > >>>>>concept, maybe in "uffd_copied bitmap" it was reasonable ;) > >>>>Again, I am not sure I got the whole idea of the reply... > >>>> > >>>>However, I do think when we UFFDIO_COPY a huge page, then we should do > >>>>bitmap_set() on the received bitmap for the whole range that the huge > >>>>page covers. > >>>for what purpose? 
> >>We chose to use small-paged bitmap since in precopy we need to have > >>such a granularity (in precopy, we can copy a small page even that > >>small page is on a host huge page). > >> > >>Since we decided to use the small-paged bitmap, we need to make sure > >>it follows how it was defined: one bit defines whether the > >>corresponding small page is received. IMHO not following that is hacky > >>and error-prone. > >> > >>>>IMHO, the bitmap is defined as "one bit per small page", and the small > >>>>page size is TARGET_PAGE_SIZE. We cannot just assume that "as long as > >>>>the first bit of the huge page is set, all the small pages in the huge > >>>>page are set". > >>>At the moment of copying all small pages of the huge page, > >>>should be received. Yes it's assumption, but I couldn't predict > >>>side effect, maybe it will be necessary in postcopy failure handling, > >>>while copying pages back, but I'm not sure right now. > >>>To know that, need to start implementing it, or at least to deep > >>>investigation. > >>Yes, postcopy failure handling is exactly one case where it can be > >>used. Of course with all the ramblock information we can re-construct > >>the real bitmap when the source received the bitmaps from destination. > >>However, why not we make it correct at the very beginning (especially > >>when it is quite easy to do so)? > >> > >>(Actually, I asked since I am working on the RFC series of postcopy > >> failure recovery. I will post RFCs soon) > >> > >>Thanks, > >> > >Ok, I'll resend patchset today, all bits of the appropriate huge > > > >page will set. > > > > > I saw you already included in you patch set > > migration: fix incorrect postcopy recved_bitmap > > > do you think, is it worth to include your patch, > > of course with preserving authorship, into this patch set? I think we'd better squash that patch into yours (considering that the current patch hasn't been merged), since I see that patch not as an enhancement but as a correction of this one. 
Or do you have a better way to write it? I didn't really think too much about it, just made sure it can work well with the recovery RFC series. Please don't worry about the authorship, just squash it if you like - I am totally fine with you seeing that patch as "a comment" in patch format. :-) -- Peter Xu
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index c04f4f6..bb902bb 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -47,6 +47,8 @@ struct RAMBlock {
      * of the postcopy phase
      */
     unsigned long *unsentmap;
+    /* bitmap of already received pages in postcopy */
+    unsigned long *receivedmap;
 };
 
 static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
@@ -60,6 +62,14 @@ static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t offset)
     return (char *)block->host + offset;
 }
 
+static inline unsigned long int ramblock_recv_bitmap_offset(void *host_addr,
+                                                            RAMBlock *rb)
+{
+    uint64_t host_addr_offset =
+            (uint64_t)(uintptr_t)(host_addr - (void *)rb->host);
+    return host_addr_offset >> TARGET_PAGE_BITS;
+}
+
 long qemu_getrampagesize(void);
 unsigned long last_ram_page(void);
 RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index be497bb..276ce12 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -560,22 +560,27 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 }
 
 static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
-                               void *from_addr, uint64_t pagesize)
+                               void *from_addr, uint64_t pagesize, RAMBlock *rb)
 {
+    int ret;
     if (from_addr) {
         struct uffdio_copy copy_struct;
         copy_struct.dst = (uint64_t)(uintptr_t)host_addr;
         copy_struct.src = (uint64_t)(uintptr_t)from_addr;
         copy_struct.len = pagesize;
         copy_struct.mode = 0;
-        return ioctl(userfault_fd, UFFDIO_COPY, &copy_struct);
+        ret = ioctl(userfault_fd, UFFDIO_COPY, &copy_struct);
     } else {
         struct uffdio_zeropage zero_struct;
         zero_struct.range.start = (uint64_t)(uintptr_t)host_addr;
         zero_struct.range.len = pagesize;
         zero_struct.mode = 0;
-        return ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
+        ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
+    }
+    if (!ret) {
+        ramblock_recv_bitmap_set(host_addr, rb);
     }
+    return ret;
 }
 
 /*
@@ -592,7 +597,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
      * which would be slightly cheaper, but we'd have to be careful
      * of the order of updating our page state.
      */
-    if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, from, pagesize)) {
+    if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, from, pagesize, rb)) {
         int e = errno;
         error_report("%s: %s copy host: %p from: %p (size: %zd)",
                      __func__, strerror(e), host, from, pagesize);
@@ -614,7 +619,8 @@ int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
     trace_postcopy_place_page_zero(host);
 
     if (qemu_ram_pagesize(rb) == getpagesize()) {
-        if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, getpagesize())) {
+        if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, getpagesize(),
+                                rb)) {
             int e = errno;
             error_report("%s: %s zero host: %p",
                          __func__, strerror(e), host);
diff --git a/migration/ram.c b/migration/ram.c
index 9cc1b17..107ee9d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -147,6 +147,32 @@ out:
     return ret;
 }
 
+static void ramblock_recv_map_init(void)
+{
+    RAMBlock *rb;
+
+    RAMBLOCK_FOREACH(rb) {
+        assert(!rb->receivedmap);
+        rb->receivedmap = bitmap_new(rb->max_length >> TARGET_PAGE_BITS);
+    }
+}
+
+int ramblock_recv_bitmap_test(void *host_addr, RAMBlock *rb)
+{
+    return test_bit(ramblock_recv_bitmap_offset(host_addr, rb),
+                    rb->receivedmap);
+}
+
+void ramblock_recv_bitmap_set(void *host_addr, RAMBlock *rb)
+{
+    set_bit_atomic(ramblock_recv_bitmap_offset(host_addr, rb), rb->receivedmap);
+}
+
+void ramblock_recv_bitmap_clear(void *host_addr, RAMBlock *rb)
+{
+    clear_bit(ramblock_recv_bitmap_offset(host_addr, rb), rb->receivedmap);
+}
+
 /*
  * An outstanding page request, on the source, having been received
  * and queued
@@ -1793,6 +1819,8 @@ int ram_discard_range(const char *rbname, uint64_t start, size_t length)
         goto err;
     }
 
+    bitmap_clear(rb->receivedmap, start >> TARGET_PAGE_BITS,
+                 length >> TARGET_PAGE_BITS);
     ret = ram_block_discard_range(rb, start, length);
 
 err:
@@ -2324,13 +2352,20 @@ static int ram_load_setup(QEMUFile *f, void *opaque)
 {
     xbzrle_load_setup();
     compress_threads_load_setup();
+    ramblock_recv_map_init();
     return 0;
 }
 
 static int ram_load_cleanup(void *opaque)
 {
+    RAMBlock *rb;
     xbzrle_load_cleanup();
     compress_threads_load_cleanup();
+
+    RAMBLOCK_FOREACH(rb) {
+        g_free(rb->receivedmap);
+        rb->receivedmap = NULL;
+    }
     return 0;
 }
 
@@ -2545,6 +2580,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             ret = -EINVAL;
             break;
         }
+        ramblock_recv_bitmap_set(host, block);
         trace_ram_load_loop(block->idstr, (uint64_t)addr, flags, host);
     }
 
diff --git a/migration/ram.h b/migration/ram.h
index c081fde..b711552 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -52,4 +52,9 @@ int ram_discard_range(const char *block_name, uint64_t start, size_t length);
 int ram_postcopy_incoming_init(MigrationIncomingState *mis);
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 
+
+int ramblock_recv_bitmap_test(void *host_addr, RAMBlock *rb);
+void ramblock_recv_bitmap_set(void *host_addr, RAMBlock *rb);
+void ramblock_recv_bitmap_clear(void *host_addr, RAMBlock *rb);
+
 #endif
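[Editorial note: the ramblock_recv_bitmap_offset() helper added to ram_addr.h maps a destination host address to a bit index by shifting its byte offset within the RAMBlock by TARGET_PAGE_BITS. The same arithmetic can be checked standalone; this is a sketch, assuming 4K target pages and using raw integer "addresses" in place of the real host pointers QEMU passes.]

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 4K target pages, as on most targets. */
#define TARGET_PAGE_BITS 12

/* Mirrors the patch's ramblock_recv_bitmap_offset():
 *   (host_addr - rb->host) >> TARGET_PAGE_BITS
 * i.e. each returned index is one TARGET_PAGE_SIZE step past the start of
 * the block - one bit in receivedmap. */
static unsigned long recv_bitmap_offset(uintptr_t host_addr, uintptr_t rb_host)
{
    uint64_t host_addr_offset = (uint64_t)(host_addr - rb_host);
    return (unsigned long)(host_addr_offset >> TARGET_PAGE_BITS);
}
```

For a RAMBlock starting at 0x100000, an address 2MB into the block lands on bit 512 - exactly the number of 4K small pages in one 2MB huge page, which is why a huge-page copy must set that many consecutive bits.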