Message ID: 1500533554-5779-4-git-send-email-a.perevalov@samsung.com
State: New, archived
On Thu, Jul 20, 2017 at 09:52:34AM +0300, Alexey Perevalov wrote:
> This patch adds the ability to track already received pages; it is
> necessary for calculating vCPU blocktime in the postcopy migration
> feature, and possibly for restoring after a postcopy migration
> failure. It is also necessary for solving the shared memory issue in
> postcopy live migration. Information about received pages will be
> transferred to the software virtual bridge (e.g. OVS-VSWITCHD) to
> avoid fallocate (unmap) for already received pages. The fallocate
> syscall is required for remapped shared memory, because remapping
> itself blocks ioctl(UFFDIO_COPY); the ioctl in this case will end
> with an EEXIST error (the struct page already exists after the
> remap).
>
> The bitmap is placed into RAMBlock like the other postcopy/precopy
> related bitmaps.
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
[...]
>  static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
> -                               void *from_addr, uint64_t pagesize)
> +                               void *from_addr, uint64_t pagesize, RAMBlock *rb)
>  {
> +    int ret;
>      if (from_addr) {
>          struct uffdio_copy copy_struct;
>          copy_struct.dst = (uint64_t)(uintptr_t)host_addr;
>          copy_struct.src = (uint64_t)(uintptr_t)from_addr;
>          copy_struct.len = pagesize;
>          copy_struct.mode = 0;
> -        return ioctl(userfault_fd, UFFDIO_COPY, &copy_struct);
> +        ret = ioctl(userfault_fd, UFFDIO_COPY, &copy_struct);
>      } else {
>          struct uffdio_zeropage zero_struct;
>          zero_struct.range.start = (uint64_t)(uintptr_t)host_addr;
>          zero_struct.range.len = pagesize;
>          zero_struct.mode = 0;
> -        return ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
> +        ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
> +    }
> +    if (!ret) {
> +        ramblock_recv_bitmap_set(host_addr, rb);

Wait...

Now we are using a 4k-page/bit bitmap; do we need to take care of the
huge pages here? Looks like we are only setting the first bit of it
if it is a huge page?
On 07/26/2017 04:49 AM, Peter Xu wrote:
> On Thu, Jul 20, 2017 at 09:52:34AM +0300, Alexey Perevalov wrote:
[...]
>> + if (!ret) {
>> +     ramblock_recv_bitmap_set(host_addr, rb);
> Wait...
>
> Now we are using 4k-page/bit bitmap, do we need to take care of the
> huge pages here? Looks like we are only setting the first bit of it
> if it is a huge page?
The first version was per ramblock page size, IOW the bitmap was
smaller in the case of hugepages. You mentioned that TARGET_PAGE_SIZE
is reasonable for the precopy case, in "Re: [Qemu-devel] [PATCH v1 2/2]
migration: add bitmap for copied page". I thought TARGET_PAGE_SIZE, as
the transmission unit, is used in precopy even in the hugepage case.
But that is not quite logical: a page marked as dirty should be sent
as a whole page.
On Wed, Jul 26, 2017 at 11:07:17AM +0300, Alexey Perevalov wrote:
[...]
> First version was per ramblock page size, IOW bitmap was smaller in
> case of hugepages.

Yes, but this is not the first version any more. :)

This patch is using:

  bitmap_new(rb->max_length >> TARGET_PAGE_BITS);

to allocate the bitmap, so it is using small pages always for the
bitmap, right? (I should not really say "4k" pages; here I think the
size is the host page size, which is the thing returned from
getpagesize().)

> You mentioned that TARGET_PAGE_SIZE is reasonable for precopy case,
> in "Re: [Qemu-devel] [PATCH v1 2/2] migration: add bitmap for copied page"
> I though TARGET_PAGE_SIZE as transmition unit, is using in precopy even
> hugepage case.
> But it's not so logically, page being marked as dirty, should be sent
> as a whole page.

Sorry if I misunderstood, but I didn't see anything wrong - we are
sending pages in small pages, but when postcopy is there, we do
UFFDIO_COPY in huge page, so everything is fine?
On 07/26/2017 11:43 AM, Peter Xu wrote:
[...]
> Sorry if I misunderstood, but I didn't see anything wrong - we are
> sending pages in small pages, but when postcopy is there, we do
> UFFDIO_COPY in huge page, so everything is fine?
I think yes, we chose TARGET_PAGE_SIZE because of wider use case
ranges.
On Wed, Jul 26, 2017 at 06:24:11PM +0300, Alexey Perevalov wrote:
[...]
> I think yes, we chose TARGET_PAGE_SIZE because of wider
> use case ranges.

So... are you going to post another version? IIUC we just need to use
a bitmap_set() to replace the ramblock_recv_bitmap_set(), while
setting the size with "pagesize / TARGET_PAGE_SIZE"?

(I think I was wrong when saying getpagesize() above: the small page
should be the target page size, while the huge page should be the
host's.)
On 07/27/2017 05:35 AM, Peter Xu wrote:
[...]
> So... are you going to post another version? IIUC we just need to use
> a bitmap_set() to replace the ramblock_recv_bitmap_set(), while set
> the size with "pagesize / TARGET_PAGE_SIZE"?
From my point of view TARGET_PAGE_SIZE/TARGET_PAGE_BITS is platform
specific, and it is used in ram_load to copy into the buffer, so it is
preferable for the bitmap size; and I'm not going to replace the
ramblock_recv_bitmap_set helper - it calculates the offset.

> (I think I was wrong when saying getpagesize() above: the small page
> should be target page size, while the huge page should be the host's)
I think we should forget about the huge page case in the "received
bitmap" concept; maybe in a "uffd_copied bitmap" it was reasonable ;)
On Thu, Jul 27, 2017 at 10:27:41AM +0300, Alexey Perevalov wrote:
[...]
>> (I think I was wrong when saying getpagesize() above: the small page
>> should be target page size, while the huge page should be the host's)
> I think we should forget about huge page case in "received bitmap"
> concept, maybe in "uffd_copied bitmap" it was reasonable ;)

Again, I am not sure I got the whole idea of the reply...

However, I do think when we UFFDIO_COPY a huge page, then we should do
bitmap_set() on the received bitmap for the whole range that the huge
page covers.

IMHO, the bitmap is defined as "one bit per small page", and the small
page size is TARGET_PAGE_SIZE. We cannot just assume that "as long as
the first bit of the huge page is set, all the small pages in the huge
page are set".

Thanks,
On 07/28/2017 07:27 AM, Peter Xu wrote:
[...]
> However, I do think when we UFFDIO_COPY a huge page, then we should do
> bitmap_set() on the received bitmap for the whole range that the huge
> page covers.
For what purpose?

> IMHO, the bitmap is defined as "one bit per small page", and the small
> page size is TARGET_PAGE_SIZE. We cannot just assume that "as long as
> the first bit of the huge page is set, all the small pages in the huge
> page are set".
At the moment of the copy, all small pages of the huge page should
have been received. Yes, it's an assumption, but I couldn't predict
the side effects; maybe it will be necessary in postcopy failure
handling, while copying pages back, but I'm not sure right now. To
know that, we need to start implementing it, or at least investigate
it deeply.
On Fri, Jul 28, 2017 at 09:43:28AM +0300, Alexey Perevalov wrote: > On 07/28/2017 07:27 AM, Peter Xu wrote: > >On Thu, Jul 27, 2017 at 10:27:41AM +0300, Alexey Perevalov wrote: > >>On 07/27/2017 05:35 AM, Peter Xu wrote: > >>>On Wed, Jul 26, 2017 at 06:24:11PM +0300, Alexey Perevalov wrote: > >>>>On 07/26/2017 11:43 AM, Peter Xu wrote: > >>>>>On Wed, Jul 26, 2017 at 11:07:17AM +0300, Alexey Perevalov wrote: > >>>>>>On 07/26/2017 04:49 AM, Peter Xu wrote: > >>>>>>>On Thu, Jul 20, 2017 at 09:52:34AM +0300, Alexey Perevalov wrote: > >>>>>>>>This patch adds ability to track down already received > >>>>>>>>pages, it's necessary for calculation vCPU block time in > >>>>>>>>postcopy migration feature, maybe for restore after > >>>>>>>>postcopy migration failure. > >>>>>>>>Also it's necessary to solve shared memory issue in > >>>>>>>>postcopy livemigration. Information about received pages > >>>>>>>>will be transferred to the software virtual bridge > >>>>>>>>(e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for > >>>>>>>>already received pages. fallocate syscall is required for > >>>>>>>>remmaped shared memory, due to remmaping itself blocks > >>>>>>>>ioctl(UFFDIO_COPY, ioctl in this case will end with EEXIT > >>>>>>>>error (struct page is exists after remmap). > >>>>>>>> > >>>>>>>>Bitmap is placed into RAMBlock as another postcopy/precopy > >>>>>>>>related bitmaps. > >>>>>>>> > >>>>>>>>Reviewed-by: Peter Xu <peterx@redhat.com> > >>>>>>>>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> > >>>>>>>>--- > >>>>>>>[...] 
> >>>>>>> > >>>>>>>> static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr, > >>>>>>>>- void *from_addr, uint64_t pagesize) > >>>>>>>>+ void *from_addr, uint64_t pagesize, RAMBlock *rb) > >>>>>>>> { > >>>>>>>>+ int ret; > >>>>>>>> if (from_addr) { > >>>>>>>> struct uffdio_copy copy_struct; > >>>>>>>> copy_struct.dst = (uint64_t)(uintptr_t)host_addr; > >>>>>>>> copy_struct.src = (uint64_t)(uintptr_t)from_addr; > >>>>>>>> copy_struct.len = pagesize; > >>>>>>>> copy_struct.mode = 0; > >>>>>>>>- return ioctl(userfault_fd, UFFDIO_COPY, ©_struct); > >>>>>>>>+ ret = ioctl(userfault_fd, UFFDIO_COPY, ©_struct); > >>>>>>>> } else { > >>>>>>>> struct uffdio_zeropage zero_struct; > >>>>>>>> zero_struct.range.start = (uint64_t)(uintptr_t)host_addr; > >>>>>>>> zero_struct.range.len = pagesize; > >>>>>>>> zero_struct.mode = 0; > >>>>>>>>- return ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct); > >>>>>>>>+ ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct); > >>>>>>>>+ } > >>>>>>>>+ if (!ret) { > >>>>>>>>+ ramblock_recv_bitmap_set(host_addr, rb); > >>>>>>>Wait... > >>>>>>> > >>>>>>>Now we are using 4k-page/bit bitmap, do we need to take care of the > >>>>>>>huge pages here? Looks like we are only setting the first bit of it > >>>>>>>if it is a huge page? > >>>>>>First version was per ramblock page size, IOW bitmap was smaller in > >>>>>>case of hugepages. > >>>>>Yes, but this is not the first version any more. :) > >>>>> > >>>>>This patch is using: > >>>>> > >>>>> bitmap_new(rb->max_length >> TARGET_PAGE_BITS); > >>>>> > >>>>>to allocate bitmap, so it is using small pages always for bitmap, > >>>>>right? (I should not really say "4k" pages, here I think the size is > >>>>>host page size, which is the thing returned from getpagesize()). 
> >>>>> > >>>>>>You mentioned that TARGET_PAGE_SIZE is reasonable for precopy case, > >>>>>>in "Re: [Qemu-devel] [PATCH v1 2/2] migration: add bitmap for copied page" > >>>>>>I though TARGET_PAGE_SIZE as transmition unit, is using in precopy even > >>>>>>hugepage case. > >>>>>>But it's not so logically, page being marked as dirty, should be sent as a > >>>>>>whole page. > >>>>>Sorry if I misunderstood, but I didn't see anything wrong - we are > >>>>>sending pages in small pages, but when postcopy is there, we do > >>>>>UFFDIO_COPY in huge page, so everything is fine? > >>>>I think yes, we chose TARGET_PAGE_SIZE because of wider > >>>>use case ranges. > >>>So... are you going to post another version? IIUC we just need to use > >>>a bitmap_set() to replace the ramblock_recv_bitmap_set(), while set > >>>the size with "pagesize / TARGET_PAGE_SIZE"? > >> From my point of view TARGET_PAGE_SIZE/TARGET_PAGE_BITS it's a platform > >>specific > >> > >>and it used in ram_load to copy to buffer so it's more preferred for bitmap size > >>and I'm not going to replace ramblock_recv_bitmap_set helper - it calculates offset. > >> > >>>(I think I was wrong when saying getpagesize() above: the small page > >>> should be target page size, while the huge page should be the host's) > >>I think we should forget about huge page case in "received bitmap" > >>concept, maybe in "uffd_copied bitmap" it was reasonable ;) > >Again, I am not sure I got the whole idea of the reply... > > > >However, I do think when we UFFDIO_COPY a huge page, then we should do > >bitmap_set() on the received bitmap for the whole range that the huge > >page covers. > for what purpose? We chose to use small-paged bitmap since in precopy we need to have such a granularity (in precopy, we can copy a small page even that small page is on a host huge page). 
Since we decided to use the small-paged bitmap, we need to make sure it follows how it was defined: one bit defines whether the corresponding small page is received. IMHO not following that is hacky and error-prone. > > > > >IMHO, the bitmap is defined as "one bit per small page", and the small > >page size is TARGET_PAGE_SIZE. We cannot just assume that "as long as > >the first bit of the huge page is set, all the small pages in the huge > >page are set". > At the moment of copying all small pages of the huge page, > should be received. Yes it's assumption, but I couldn't predict > side effect, maybe it will be necessary in postcopy failure handling, > while copying pages back, but I'm not sure right now. > To know that, need to start implementing it, or at least to deep > investigation. Yes, postcopy failure handling is exactly one case where it can be used. Of course, with all the ramblock information we can re-construct the real bitmap when the source receives the bitmaps from the destination. However, why not make it correct from the very beginning (especially when it is quite easy to do so)? (Actually, I asked since I am working on the RFC series of postcopy failure recovery. I will post RFCs soon) Thanks,
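[Editorial note: the fix Peter is asking for - one successful UFFDIO_COPY of a huge page must mark every small page it covers, not just the first bit - can be sketched outside QEMU roughly as below. This is a minimal standalone illustration, not QEMU code: the plain bitmap array, the fixed 4K TARGET_PAGE_SIZE, and the names `bitmap_set_range`/`recv_bitmap_set_range` are all assumptions for the example; QEMU's real helpers are bitmap_set() and ramblock_recv_bitmap_set() operating on RAMBlock::receivedmap.]

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: a 4K "target" page, as on most targets. */
#define TARGET_PAGE_BITS 12
#define TARGET_PAGE_SIZE (1UL << TARGET_PAGE_BITS)
#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic stand-in for QEMU's bitmap_set(). */
static void bitmap_set_range(unsigned long *map, size_t start, size_t nbits)
{
    for (size_t i = start; i < start + nbits; i++) {
        map[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
    }
}

/* Stand-in for test_bit(). */
static int bitmap_test(const unsigned long *map, size_t bit)
{
    return !!(map[bit / BITS_PER_LONG] & (1UL << (bit % BITS_PER_LONG)));
}

/* After a successful UFFDIO_COPY/UFFDIO_ZEROPAGE of `pagesize` bytes at
 * RAMBlock offset `offset`, mark pagesize / TARGET_PAGE_SIZE consecutive
 * small-page bits as received - not only the first one. */
static void recv_bitmap_set_range(unsigned long *map, uint64_t offset,
                                  uint64_t pagesize)
{
    bitmap_set_range(map, offset >> TARGET_PAGE_BITS,
                     pagesize / TARGET_PAGE_SIZE);
}
```

With a 2MB huge page this sets 512 consecutive bits, so a later per-small-page lookup (for example during postcopy failure recovery) sees the whole range as received.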
On 07/28/2017 09:57 AM, Peter Xu wrote: > On Fri, Jul 28, 2017 at 09:43:28AM +0300, Alexey Perevalov wrote: >> On 07/28/2017 07:27 AM, Peter Xu wrote: >>> On Thu, Jul 27, 2017 at 10:27:41AM +0300, Alexey Perevalov wrote: >>>> On 07/27/2017 05:35 AM, Peter Xu wrote: >>>>> On Wed, Jul 26, 2017 at 06:24:11PM +0300, Alexey Perevalov wrote: >>>>>> On 07/26/2017 11:43 AM, Peter Xu wrote: >>>>>>> On Wed, Jul 26, 2017 at 11:07:17AM +0300, Alexey Perevalov wrote: >>>>>>>> On 07/26/2017 04:49 AM, Peter Xu wrote: >>>>>>>>> On Thu, Jul 20, 2017 at 09:52:34AM +0300, Alexey Perevalov wrote: >>>>>>>>>> This patch adds ability to track down already received >>>>>>>>>> pages, it's necessary for calculation vCPU block time in >>>>>>>>>> postcopy migration feature, maybe for restore after >>>>>>>>>> postcopy migration failure. >>>>>>>>>> Also it's necessary to solve shared memory issue in >>>>>>>>>> postcopy livemigration. Information about received pages >>>>>>>>>> will be transferred to the software virtual bridge >>>>>>>>>> (e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for >>>>>>>>>> already received pages. fallocate syscall is required for >>>>>>>>>> remmaped shared memory, due to remmaping itself blocks >>>>>>>>>> ioctl(UFFDIO_COPY, ioctl in this case will end with EEXIT >>>>>>>>>> error (struct page is exists after remmap). >>>>>>>>>> >>>>>>>>>> Bitmap is placed into RAMBlock as another postcopy/precopy >>>>>>>>>> related bitmaps. >>>>>>>>>> >>>>>>>>>> Reviewed-by: Peter Xu <peterx@redhat.com> >>>>>>>>>> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> >>>>>>>>>> --- >>>>>>>>> [...] 
>>>>>>>>> >>>>>>>>>> static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr, >>>>>>>>>> - void *from_addr, uint64_t pagesize) >>>>>>>>>> + void *from_addr, uint64_t pagesize, RAMBlock *rb) >>>>>>>>>> { >>>>>>>>>> + int ret; >>>>>>>>>> if (from_addr) { >>>>>>>>>> struct uffdio_copy copy_struct; >>>>>>>>>> copy_struct.dst = (uint64_t)(uintptr_t)host_addr; >>>>>>>>>> copy_struct.src = (uint64_t)(uintptr_t)from_addr; >>>>>>>>>> copy_struct.len = pagesize; >>>>>>>>>> copy_struct.mode = 0; >>>>>>>>>> - return ioctl(userfault_fd, UFFDIO_COPY, ©_struct); >>>>>>>>>> + ret = ioctl(userfault_fd, UFFDIO_COPY, ©_struct); >>>>>>>>>> } else { >>>>>>>>>> struct uffdio_zeropage zero_struct; >>>>>>>>>> zero_struct.range.start = (uint64_t)(uintptr_t)host_addr; >>>>>>>>>> zero_struct.range.len = pagesize; >>>>>>>>>> zero_struct.mode = 0; >>>>>>>>>> - return ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct); >>>>>>>>>> + ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct); >>>>>>>>>> + } >>>>>>>>>> + if (!ret) { >>>>>>>>>> + ramblock_recv_bitmap_set(host_addr, rb); >>>>>>>>> Wait... >>>>>>>>> >>>>>>>>> Now we are using 4k-page/bit bitmap, do we need to take care of the >>>>>>>>> huge pages here? Looks like we are only setting the first bit of it >>>>>>>>> if it is a huge page? >>>>>>>> First version was per ramblock page size, IOW bitmap was smaller in >>>>>>>> case of hugepages. >>>>>>> Yes, but this is not the first version any more. :) >>>>>>> >>>>>>> This patch is using: >>>>>>> >>>>>>> bitmap_new(rb->max_length >> TARGET_PAGE_BITS); >>>>>>> >>>>>>> to allocate bitmap, so it is using small pages always for bitmap, >>>>>>> right? (I should not really say "4k" pages, here I think the size is >>>>>>> host page size, which is the thing returned from getpagesize()). 
>>>>>>> >>>>>>>> You mentioned that TARGET_PAGE_SIZE is reasonable for precopy case, >>>>>>>> in "Re: [Qemu-devel] [PATCH v1 2/2] migration: add bitmap for copied page" >>>>>>>> I though TARGET_PAGE_SIZE as transmition unit, is using in precopy even >>>>>>>> hugepage case. >>>>>>>> But it's not so logically, page being marked as dirty, should be sent as a >>>>>>>> whole page. >>>>>>> Sorry if I misunderstood, but I didn't see anything wrong - we are >>>>>>> sending pages in small pages, but when postcopy is there, we do >>>>>>> UFFDIO_COPY in huge page, so everything is fine? >>>>>> I think yes, we chose TARGET_PAGE_SIZE because of wider >>>>>> use case ranges. >>>>> So... are you going to post another version? IIUC we just need to use >>>>> a bitmap_set() to replace the ramblock_recv_bitmap_set(), while set >>>>> the size with "pagesize / TARGET_PAGE_SIZE"? >>>> From my point of view TARGET_PAGE_SIZE/TARGET_PAGE_BITS it's a platform >>>> specific >>>> >>>> and it used in ram_load to copy to buffer so it's more preferred for bitmap size >>>> and I'm not going to replace ramblock_recv_bitmap_set helper - it calculates offset. >>>> >>>>> (I think I was wrong when saying getpagesize() above: the small page >>>>> should be target page size, while the huge page should be the host's) >>>> I think we should forget about huge page case in "received bitmap" >>>> concept, maybe in "uffd_copied bitmap" it was reasonable ;) >>> Again, I am not sure I got the whole idea of the reply... >>> >>> However, I do think when we UFFDIO_COPY a huge page, then we should do >>> bitmap_set() on the received bitmap for the whole range that the huge >>> page covers. >> for what purpose? > We chose to use small-paged bitmap since in precopy we need to have > such a granularity (in precopy, we can copy a small page even that > small page is on a host huge page). 
> > Since we decided to use the small-paged bitmap, we need to make sure > it follows how it was defined: one bit defines whether the > corresponding small page is received. IMHO not following that is hacky > and error-prone. > >>> IMHO, the bitmap is defined as "one bit per small page", and the small >>> page size is TARGET_PAGE_SIZE. We cannot just assume that "as long as >>> the first bit of the huge page is set, all the small pages in the huge >>> page are set". >> At the moment of copying all small pages of the huge page, >> should be received. Yes it's assumption, but I couldn't predict >> side effect, maybe it will be necessary in postcopy failure handling, >> while copying pages back, but I'm not sure right now. >> To know that, need to start implementing it, or at least to deep >> investigation. > Yes, postcopy failure handling is exactly one case where it can be > used. Of course with all the ramblock information we can re-construct > the real bitmap when the source received the bitmaps from destination. > However, why not we make it correct at the very beginning (especially > when it is quite easy to do so)? > > (Actually, I asked since I am working on the RFC series of postcopy > failure recovery. I will post RFCs soon) > > Thanks, > Ok, I'll resend the patchset today; all bits of the appropriate huge page will be set.
On 07/28/2017 10:06 AM, Alexey Perevalov wrote: > On 07/28/2017 09:57 AM, Peter Xu wrote: >> On Fri, Jul 28, 2017 at 09:43:28AM +0300, Alexey Perevalov wrote: >>> On 07/28/2017 07:27 AM, Peter Xu wrote: >>>> On Thu, Jul 27, 2017 at 10:27:41AM +0300, Alexey Perevalov wrote: >>>>> On 07/27/2017 05:35 AM, Peter Xu wrote: >>>>>> On Wed, Jul 26, 2017 at 06:24:11PM +0300, Alexey Perevalov wrote: >>>>>>> On 07/26/2017 11:43 AM, Peter Xu wrote: >>>>>>>> On Wed, Jul 26, 2017 at 11:07:17AM +0300, Alexey Perevalov wrote: >>>>>>>>> On 07/26/2017 04:49 AM, Peter Xu wrote: >>>>>>>>>> On Thu, Jul 20, 2017 at 09:52:34AM +0300, Alexey Perevalov >>>>>>>>>> wrote: >>>>>>>>>>> This patch adds ability to track down already received >>>>>>>>>>> pages, it's necessary for calculation vCPU block time in >>>>>>>>>>> postcopy migration feature, maybe for restore after >>>>>>>>>>> postcopy migration failure. >>>>>>>>>>> Also it's necessary to solve shared memory issue in >>>>>>>>>>> postcopy livemigration. Information about received pages >>>>>>>>>>> will be transferred to the software virtual bridge >>>>>>>>>>> (e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for >>>>>>>>>>> already received pages. fallocate syscall is required for >>>>>>>>>>> remmaped shared memory, due to remmaping itself blocks >>>>>>>>>>> ioctl(UFFDIO_COPY, ioctl in this case will end with EEXIT >>>>>>>>>>> error (struct page is exists after remmap). >>>>>>>>>>> >>>>>>>>>>> Bitmap is placed into RAMBlock as another postcopy/precopy >>>>>>>>>>> related bitmaps. >>>>>>>>>>> >>>>>>>>>>> Reviewed-by: Peter Xu <peterx@redhat.com> >>>>>>>>>>> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> >>>>>>>>>>> --- >>>>>>>>>> [...] 
>>>>>>>>>> >>>>>>>>>>> static int qemu_ufd_copy_ioctl(int userfault_fd, void >>>>>>>>>>> *host_addr, >>>>>>>>>>> - void *from_addr, uint64_t pagesize) >>>>>>>>>>> + void *from_addr, uint64_t >>>>>>>>>>> pagesize, RAMBlock *rb) >>>>>>>>>>> { >>>>>>>>>>> + int ret; >>>>>>>>>>> if (from_addr) { >>>>>>>>>>> struct uffdio_copy copy_struct; >>>>>>>>>>> copy_struct.dst = (uint64_t)(uintptr_t)host_addr; >>>>>>>>>>> copy_struct.src = (uint64_t)(uintptr_t)from_addr; >>>>>>>>>>> copy_struct.len = pagesize; >>>>>>>>>>> copy_struct.mode = 0; >>>>>>>>>>> - return ioctl(userfault_fd, UFFDIO_COPY, ©_struct); >>>>>>>>>>> + ret = ioctl(userfault_fd, UFFDIO_COPY, ©_struct); >>>>>>>>>>> } else { >>>>>>>>>>> struct uffdio_zeropage zero_struct; >>>>>>>>>>> zero_struct.range.start = >>>>>>>>>>> (uint64_t)(uintptr_t)host_addr; >>>>>>>>>>> zero_struct.range.len = pagesize; >>>>>>>>>>> zero_struct.mode = 0; >>>>>>>>>>> - return ioctl(userfault_fd, UFFDIO_ZEROPAGE, >>>>>>>>>>> &zero_struct); >>>>>>>>>>> + ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, >>>>>>>>>>> &zero_struct); >>>>>>>>>>> + } >>>>>>>>>>> + if (!ret) { >>>>>>>>>>> + ramblock_recv_bitmap_set(host_addr, rb); >>>>>>>>>> Wait... >>>>>>>>>> >>>>>>>>>> Now we are using 4k-page/bit bitmap, do we need to take care >>>>>>>>>> of the >>>>>>>>>> huge pages here? Looks like we are only setting the first >>>>>>>>>> bit of it >>>>>>>>>> if it is a huge page? >>>>>>>>> First version was per ramblock page size, IOW bitmap was >>>>>>>>> smaller in >>>>>>>>> case of hugepages. >>>>>>>> Yes, but this is not the first version any more. :) >>>>>>>> >>>>>>>> This patch is using: >>>>>>>> >>>>>>>> bitmap_new(rb->max_length >> TARGET_PAGE_BITS); >>>>>>>> >>>>>>>> to allocate bitmap, so it is using small pages always for bitmap, >>>>>>>> right? (I should not really say "4k" pages, here I think the >>>>>>>> size is >>>>>>>> host page size, which is the thing returned from getpagesize()). 
>>>>>>>> >>>>>>>>> You mentioned that TARGET_PAGE_SIZE is reasonable for precopy >>>>>>>>> case, >>>>>>>>> in "Re: [Qemu-devel] [PATCH v1 2/2] migration: add bitmap for >>>>>>>>> copied page" >>>>>>>>> I though TARGET_PAGE_SIZE as transmition unit, is using in >>>>>>>>> precopy even >>>>>>>>> hugepage case. >>>>>>>>> But it's not so logically, page being marked as dirty, should >>>>>>>>> be sent as a >>>>>>>>> whole page. >>>>>>>> Sorry if I misunderstood, but I didn't see anything wrong - we are >>>>>>>> sending pages in small pages, but when postcopy is there, we do >>>>>>>> UFFDIO_COPY in huge page, so everything is fine? >>>>>>> I think yes, we chose TARGET_PAGE_SIZE because of wider >>>>>>> use case ranges. >>>>>> So... are you going to post another version? IIUC we just need to >>>>>> use >>>>>> a bitmap_set() to replace the ramblock_recv_bitmap_set(), while set >>>>>> the size with "pagesize / TARGET_PAGE_SIZE"? >>>>> From my point of view TARGET_PAGE_SIZE/TARGET_PAGE_BITS it's a >>>>> platform >>>>> specific >>>>> >>>>> and it used in ram_load to copy to buffer so it's more preferred >>>>> for bitmap size >>>>> and I'm not going to replace ramblock_recv_bitmap_set helper - it >>>>> calculates offset. >>>>> >>>>>> (I think I was wrong when saying getpagesize() above: the small page >>>>>> should be target page size, while the huge page should be the >>>>>> host's) >>>>> I think we should forget about huge page case in "received bitmap" >>>>> concept, maybe in "uffd_copied bitmap" it was reasonable ;) >>>> Again, I am not sure I got the whole idea of the reply... >>>> >>>> However, I do think when we UFFDIO_COPY a huge page, then we should do >>>> bitmap_set() on the received bitmap for the whole range that the huge >>>> page covers. >>> for what purpose? >> We chose to use small-paged bitmap since in precopy we need to have >> such a granularity (in precopy, we can copy a small page even that >> small page is on a host huge page). 
>> >> Since we decided to use the small-paged bitmap, we need to make sure >> it follows how it was defined: one bit defines whether the >> corresponding small page is received. IMHO not following that is hacky >> and error-prone. >> >>>> IMHO, the bitmap is defined as "one bit per small page", and the small >>>> page size is TARGET_PAGE_SIZE. We cannot just assume that "as long as >>>> the first bit of the huge page is set, all the small pages in the huge >>>> page are set". >>> At the moment of copying all small pages of the huge page, >>> should be received. Yes it's assumption, but I couldn't predict >>> side effect, maybe it will be necessary in postcopy failure handling, >>> while copying pages back, but I'm not sure right now. >>> To know that, need to start implementing it, or at least to deep >>> investigation. >> Yes, postcopy failure handling is exactly one case where it can be >> used. Of course with all the ramblock information we can re-construct >> the real bitmap when the source received the bitmaps from destination. >> However, why not we make it correct at the very beginning (especially >> when it is quite easy to do so)? >> >> (Actually, I asked since I am working on the RFC series of postcopy >> failure recovery. I will post RFCs soon) >> >> Thanks, >> > Ok, I'll resend patchset today, all bits of the appropriate huge > > page will set. > > I saw you already included "migration: fix incorrect postcopy recved_bitmap" in your patch set. Do you think it is worth including your patch, of course preserving authorship, in this patch set?
On Fri, Jul 28, 2017 at 06:29:20PM +0300, Alexey Perevalov wrote: > On 07/28/2017 10:06 AM, Alexey Perevalov wrote: > >On 07/28/2017 09:57 AM, Peter Xu wrote: > >>On Fri, Jul 28, 2017 at 09:43:28AM +0300, Alexey Perevalov wrote: > >>>On 07/28/2017 07:27 AM, Peter Xu wrote: > >>>>On Thu, Jul 27, 2017 at 10:27:41AM +0300, Alexey Perevalov wrote: > >>>>>On 07/27/2017 05:35 AM, Peter Xu wrote: > >>>>>>On Wed, Jul 26, 2017 at 06:24:11PM +0300, Alexey Perevalov wrote: > >>>>>>>On 07/26/2017 11:43 AM, Peter Xu wrote: > >>>>>>>>On Wed, Jul 26, 2017 at 11:07:17AM +0300, Alexey Perevalov wrote: > >>>>>>>>>On 07/26/2017 04:49 AM, Peter Xu wrote: > >>>>>>>>>>On Thu, Jul 20, 2017 at 09:52:34AM +0300, Alexey > >>>>>>>>>>Perevalov wrote: > >>>>>>>>>>>This patch adds ability to track down already received > >>>>>>>>>>>pages, it's necessary for calculation vCPU block time in > >>>>>>>>>>>postcopy migration feature, maybe for restore after > >>>>>>>>>>>postcopy migration failure. > >>>>>>>>>>>Also it's necessary to solve shared memory issue in > >>>>>>>>>>>postcopy livemigration. Information about received pages > >>>>>>>>>>>will be transferred to the software virtual bridge > >>>>>>>>>>>(e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for > >>>>>>>>>>>already received pages. fallocate syscall is required for > >>>>>>>>>>>remmaped shared memory, due to remmaping itself blocks > >>>>>>>>>>>ioctl(UFFDIO_COPY, ioctl in this case will end with EEXIT > >>>>>>>>>>>error (struct page is exists after remmap). > >>>>>>>>>>> > >>>>>>>>>>>Bitmap is placed into RAMBlock as another postcopy/precopy > >>>>>>>>>>>related bitmaps. > >>>>>>>>>>> > >>>>>>>>>>>Reviewed-by: Peter Xu <peterx@redhat.com> > >>>>>>>>>>>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> > >>>>>>>>>>>--- > >>>>>>>>>>[...] 
> >>>>>>>>>> > >>>>>>>>>>> static int qemu_ufd_copy_ioctl(int userfault_fd, > >>>>>>>>>>>void *host_addr, > >>>>>>>>>>>- void *from_addr, uint64_t pagesize) > >>>>>>>>>>>+ void *from_addr, > >>>>>>>>>>>uint64_t pagesize, RAMBlock *rb) > >>>>>>>>>>> { > >>>>>>>>>>>+ int ret; > >>>>>>>>>>> if (from_addr) { > >>>>>>>>>>> struct uffdio_copy copy_struct; > >>>>>>>>>>> copy_struct.dst = (uint64_t)(uintptr_t)host_addr; > >>>>>>>>>>> copy_struct.src = (uint64_t)(uintptr_t)from_addr; > >>>>>>>>>>> copy_struct.len = pagesize; > >>>>>>>>>>> copy_struct.mode = 0; > >>>>>>>>>>>- return ioctl(userfault_fd, UFFDIO_COPY, ©_struct); > >>>>>>>>>>>+ ret = ioctl(userfault_fd, UFFDIO_COPY, ©_struct); > >>>>>>>>>>> } else { > >>>>>>>>>>> struct uffdio_zeropage zero_struct; > >>>>>>>>>>> zero_struct.range.start = > >>>>>>>>>>>(uint64_t)(uintptr_t)host_addr; > >>>>>>>>>>> zero_struct.range.len = pagesize; > >>>>>>>>>>> zero_struct.mode = 0; > >>>>>>>>>>>- return ioctl(userfault_fd, UFFDIO_ZEROPAGE, > >>>>>>>>>>>&zero_struct); > >>>>>>>>>>>+ ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, > >>>>>>>>>>>&zero_struct); > >>>>>>>>>>>+ } > >>>>>>>>>>>+ if (!ret) { > >>>>>>>>>>>+ ramblock_recv_bitmap_set(host_addr, rb); > >>>>>>>>>>Wait... > >>>>>>>>>> > >>>>>>>>>>Now we are using 4k-page/bit bitmap, do we need to take > >>>>>>>>>>care of the > >>>>>>>>>>huge pages here? Looks like we are only setting the > >>>>>>>>>>first bit of it > >>>>>>>>>>if it is a huge page? > >>>>>>>>>First version was per ramblock page size, IOW bitmap was > >>>>>>>>>smaller in > >>>>>>>>>case of hugepages. > >>>>>>>>Yes, but this is not the first version any more. :) > >>>>>>>> > >>>>>>>>This patch is using: > >>>>>>>> > >>>>>>>> bitmap_new(rb->max_length >> TARGET_PAGE_BITS); > >>>>>>>> > >>>>>>>>to allocate bitmap, so it is using small pages always for bitmap, > >>>>>>>>right? (I should not really say "4k" pages, here I think the > >>>>>>>>size is > >>>>>>>>host page size, which is the thing returned from getpagesize()). 
> >>>>>>>> > >>>>>>>>>You mentioned that TARGET_PAGE_SIZE is reasonable for > >>>>>>>>>precopy case, > >>>>>>>>>in "Re: [Qemu-devel] [PATCH v1 2/2] migration: add bitmap > >>>>>>>>>for copied page" > >>>>>>>>>I though TARGET_PAGE_SIZE as transmition unit, is using in > >>>>>>>>>precopy even > >>>>>>>>>hugepage case. > >>>>>>>>>But it's not so logically, page being marked as dirty, > >>>>>>>>>should be sent as a > >>>>>>>>>whole page. > >>>>>>>>Sorry if I misunderstood, but I didn't see anything wrong - we are > >>>>>>>>sending pages in small pages, but when postcopy is there, we do > >>>>>>>>UFFDIO_COPY in huge page, so everything is fine? > >>>>>>>I think yes, we chose TARGET_PAGE_SIZE because of wider > >>>>>>>use case ranges. > >>>>>>So... are you going to post another version? IIUC we just need > >>>>>>to use > >>>>>>a bitmap_set() to replace the ramblock_recv_bitmap_set(), while set > >>>>>>the size with "pagesize / TARGET_PAGE_SIZE"? > >>>>> From my point of view TARGET_PAGE_SIZE/TARGET_PAGE_BITS it's a > >>>>>platform > >>>>>specific > >>>>> > >>>>>and it used in ram_load to copy to buffer so it's more preferred > >>>>>for bitmap size > >>>>>and I'm not going to replace ramblock_recv_bitmap_set helper - it > >>>>>calculates offset. > >>>>> > >>>>>>(I think I was wrong when saying getpagesize() above: the small page > >>>>>> should be target page size, while the huge page should be the > >>>>>>host's) > >>>>>I think we should forget about huge page case in "received bitmap" > >>>>>concept, maybe in "uffd_copied bitmap" it was reasonable ;) > >>>>Again, I am not sure I got the whole idea of the reply... > >>>> > >>>>However, I do think when we UFFDIO_COPY a huge page, then we should do > >>>>bitmap_set() on the received bitmap for the whole range that the huge > >>>>page covers. > >>>for what purpose? 
> >>We chose to use small-paged bitmap since in precopy we need to have > >>such a granularity (in precopy, we can copy a small page even that > >>small page is on a host huge page). > >> > >>Since we decided to use the small-paged bitmap, we need to make sure > >>it follows how it was defined: one bit defines whether the > >>corresponding small page is received. IMHO not following that is hacky > >>and error-prone. > >> > >>>>IMHO, the bitmap is defined as "one bit per small page", and the small > >>>>page size is TARGET_PAGE_SIZE. We cannot just assume that "as long as > >>>>the first bit of the huge page is set, all the small pages in the huge > >>>>page are set". > >>>At the moment of copying all small pages of the huge page, > >>>should be received. Yes it's assumption, but I couldn't predict > >>>side effect, maybe it will be necessary in postcopy failure handling, > >>>while copying pages back, but I'm not sure right now. > >>>To know that, need to start implementing it, or at least to deep > >>>investigation. > >>Yes, postcopy failure handling is exactly one case where it can be > >>used. Of course with all the ramblock information we can re-construct > >>the real bitmap when the source received the bitmaps from destination. > >>However, why not we make it correct at the very beginning (especially > >>when it is quite easy to do so)? > >> > >>(Actually, I asked since I am working on the RFC series of postcopy > >> failure recovery. I will post RFCs soon) > >> > >>Thanks, > >> > >Ok, I'll resend patchset today, all bits of the appropriate huge > > > >page will set. > > > > > I saw you already included in you patch set > > migration: fix incorrect postcopy recved_bitmap > > > do you think, is it worth to include your patch, > > of course with preserving authorship, into this patch set? I think we'd better squash that patch into yours (considering that the current patch hasn't been merged), since I see that patch not as an enhancement but as a correction of this one. 
Or do you have a better way to write it? I didn't really think too much about it, just made sure it can work well with the recovery RFC series. Please don't worry about the authorship, just squash it if you like - I am totally fine with you seeing that patch as "a comment" in patch format. :-) -- Peter Xu
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index c04f4f6..bb902bb 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -47,6 +47,8 @@ struct RAMBlock {
      * of the postcopy phase
      */
     unsigned long *unsentmap;
+    /* bitmap of already received pages in postcopy */
+    unsigned long *receivedmap;
 };
 
 static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
@@ -60,6 +62,14 @@ static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t offset)
     return (char *)block->host + offset;
 }
 
+static inline unsigned long int ramblock_recv_bitmap_offset(void *host_addr,
+                                                            RAMBlock *rb)
+{
+    uint64_t host_addr_offset =
+            (uint64_t)(uintptr_t)(host_addr - (void *)rb->host);
+    return host_addr_offset >> TARGET_PAGE_BITS;
+}
+
 long qemu_getrampagesize(void);
 unsigned long last_ram_page(void);
 RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index be497bb..276ce12 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -560,22 +560,27 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 }
 
 static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
-                               void *from_addr, uint64_t pagesize)
+                               void *from_addr, uint64_t pagesize, RAMBlock *rb)
 {
+    int ret;
     if (from_addr) {
         struct uffdio_copy copy_struct;
         copy_struct.dst = (uint64_t)(uintptr_t)host_addr;
         copy_struct.src = (uint64_t)(uintptr_t)from_addr;
         copy_struct.len = pagesize;
         copy_struct.mode = 0;
-        return ioctl(userfault_fd, UFFDIO_COPY, &copy_struct);
+        ret = ioctl(userfault_fd, UFFDIO_COPY, &copy_struct);
     } else {
         struct uffdio_zeropage zero_struct;
         zero_struct.range.start = (uint64_t)(uintptr_t)host_addr;
         zero_struct.range.len = pagesize;
         zero_struct.mode = 0;
-        return ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
+        ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
+    }
+    if (!ret) {
+        ramblock_recv_bitmap_set(host_addr, rb);
     }
+    return ret;
 }
 
 /*
@@ -592,7 +597,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
      * which would be slightly cheaper, but we'd have to be careful
      * of the order of updating our page state.
      */
-    if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, from, pagesize)) {
+    if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, from, pagesize, rb)) {
         int e = errno;
         error_report("%s: %s copy host: %p from: %p (size: %zd)",
                      __func__, strerror(e), host, from, pagesize);
@@ -614,7 +619,8 @@ int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
     trace_postcopy_place_page_zero(host);
 
     if (qemu_ram_pagesize(rb) == getpagesize()) {
-        if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, getpagesize())) {
+        if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, getpagesize(),
+                                rb)) {
             int e = errno;
             error_report("%s: %s zero host: %p",
                          __func__, strerror(e), host);
diff --git a/migration/ram.c b/migration/ram.c
index 9cc1b17..107ee9d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -147,6 +147,32 @@ out:
     return ret;
 }
 
+static void ramblock_recv_map_init(void)
+{
+    RAMBlock *rb;
+
+    RAMBLOCK_FOREACH(rb) {
+        assert(!rb->receivedmap);
+        rb->receivedmap = bitmap_new(rb->max_length >> TARGET_PAGE_BITS);
+    }
+}
+
+int ramblock_recv_bitmap_test(void *host_addr, RAMBlock *rb)
+{
+    return test_bit(ramblock_recv_bitmap_offset(host_addr, rb),
+                    rb->receivedmap);
+}
+
+void ramblock_recv_bitmap_set(void *host_addr, RAMBlock *rb)
+{
+    set_bit_atomic(ramblock_recv_bitmap_offset(host_addr, rb), rb->receivedmap);
+}
+
+void ramblock_recv_bitmap_clear(void *host_addr, RAMBlock *rb)
+{
+    clear_bit(ramblock_recv_bitmap_offset(host_addr, rb), rb->receivedmap);
+}
+
 /*
  * An outstanding page request, on the source, having been received
  * and queued
@@ -1793,6 +1819,8 @@ int ram_discard_range(const char *rbname, uint64_t start, size_t length)
         goto err;
     }
 
+    bitmap_clear(rb->receivedmap, start >> TARGET_PAGE_BITS,
+                 length >> TARGET_PAGE_BITS);
     ret = ram_block_discard_range(rb, start, length);
 
 err:
@@ -2324,13 +2352,20 @@ static int ram_load_setup(QEMUFile *f, void *opaque)
 {
     xbzrle_load_setup();
     compress_threads_load_setup();
+    ramblock_recv_map_init();
     return 0;
 }
 
 static int ram_load_cleanup(void *opaque)
 {
+    RAMBlock *rb;
     xbzrle_load_cleanup();
     compress_threads_load_cleanup();
+
+    RAMBLOCK_FOREACH(rb) {
+        g_free(rb->receivedmap);
+        rb->receivedmap = NULL;
+    }
     return 0;
 }
 
@@ -2545,6 +2580,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             ret = -EINVAL;
             break;
         }
+        ramblock_recv_bitmap_set(host, block);
         trace_ram_load_loop(block->idstr, (uint64_t)addr, flags, host);
     }
 
diff --git a/migration/ram.h b/migration/ram.h
index c081fde..b711552 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -52,4 +52,9 @@ int ram_discard_range(const char *block_name, uint64_t start, size_t length);
 int ram_postcopy_incoming_init(MigrationIncomingState *mis);
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 
+
+int ramblock_recv_bitmap_test(void *host_addr, RAMBlock *rb);
+void ramblock_recv_bitmap_set(void *host_addr, RAMBlock *rb);
+void ramblock_recv_bitmap_clear(void *host_addr, RAMBlock *rb);
+
 #endif
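[Editorial note: the ramblock_recv_bitmap_offset() helper added to ram_addr.h maps a destination host address to a bit index by shifting its byte offset within the RAMBlock by TARGET_PAGE_BITS. The same arithmetic can be checked standalone; this is a sketch, assuming 4K target pages and using raw integer "addresses" in place of the real host pointers QEMU passes.]

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 4K target pages, as on most targets. */
#define TARGET_PAGE_BITS 12

/* Mirrors the patch's ramblock_recv_bitmap_offset():
 *   (host_addr - rb->host) >> TARGET_PAGE_BITS
 * i.e. each returned index is one TARGET_PAGE_SIZE step past the start of
 * the block - one bit in receivedmap. */
static unsigned long recv_bitmap_offset(uintptr_t host_addr, uintptr_t rb_host)
{
    uint64_t host_addr_offset = (uint64_t)(host_addr - rb_host);
    return (unsigned long)(host_addr_offset >> TARGET_PAGE_BITS);
}
```

For a RAMBlock starting at 0x100000, an address 2MB into the block lands on bit 512 - exactly the number of 4K small pages in one 2MB huge page, which is why a huge-page copy must set that many consecutive bits.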