migratioin/ram.c: reset complete_round when we gets a queued page

Message ID 20190605010828.6969-1-richardw.yang@linux.intel.com (mailing list archive)
State New, archived
Series migratioin/ram.c: reset complete_round when we gets a queued page

Commit Message

Wei Yang June 5, 2019, 1:08 a.m. UTC
If we get a queued page, the in-order walk over the blocks is interrupted, so
we can no longer rely on the complete_round flag to indicate that we have
already searched all the blocks on the list.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
---
 migration/ram.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Peter Xu June 5, 2019, 6:41 a.m. UTC | #1
On Wed, Jun 05, 2019 at 09:08:28AM +0800, Wei Yang wrote:
> diff --git a/migration/ram.c b/migration/ram.c
> index d881981876..e9b40d636d 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2290,6 +2290,12 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
>           */
>          pss->block = block;
>          pss->page = offset >> TARGET_PAGE_BITS;
> +
> +        /*
> +         * This unqueued page would break the "one round" check, even if
> +         * this is really rare.

Why is this needed? Could you explain the problem first?

Thanks,
Wei Yang June 5, 2019, 8:52 a.m. UTC | #2
On Wed, Jun 05, 2019 at 02:41:08PM +0800, Peter Xu wrote:
>Why is this needed? Could you explain the problem first?

Peter, Thanks for your question.

I found this issue during code review and I believe this is a corner case.

Below is a rough outline of ram_find_and_save_block():

    ram_find_and_save_block
        do
            get_queued_page()
            find_dirty_block()
            ram_save_host_page()
        while

The basic logic here is: get a page that needs to be migrated, then migrate it.

Without get_queued_page(), find_dirty_block() would search the whole of
ram_list.blocks in order, and pss->complete_round indicates whether this
search has wrapped around.

Once get_queued_page() is involved, things still work almost all of the
time, but the block unqueued by get_queued_page() can be any block in
ram_list.blocks. This means there is a small chance of breaking the
wrap-around indicator.

                           unqueue_page()  last_seen_block
                                     |     |
    ram_list.blocks                  v     v
    ---------------------------------+=====+---


This rough picture illustrates the corner case.

For example, suppose we start from last_seen_block and search to the end of
ram_list.blocks. At that moment, pss->complete_round is set to true. Then we
get a queued page from unqueue_page() at the point marked above. The loop
may then cover only the range marked "=" before giving up, and we skip all
the other ranges.

This is really a corner case, since it requires ram_save_host_page() to
return 0 and no dirty page to exist in this range. But I don't see how we
can avoid this case.
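The corner case can be reproduced in a small toy model in C. It is not QEMU's code: each block holds one page, and all names (NBLOCKS, ToySearch, toy_find_dirty_block(), toy_find_and_save()) are hypothetical. The assumptions are spelled out in the comments: a single queued request arrives right after the search has wrapped to block inject_at, it points at block queued, and that page is already clean, so saving it sends nothing (ram_save_host_page() returning 0) and the loop carries on from the queued block. The reset_on_unqueue flag switches the patch's change on or off.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLOCKS 8  /* hypothetical number of toy RAM blocks */

typedef struct {
    int block;            /* block the search examines next */
    bool complete_round;  /* set once the search wraps to block 0 */
} ToySearch;

/* Toy find_dirty_block(), one page per block. */
static bool toy_find_dirty_block(const bool dirty[NBLOCKS], int last_seen,
                                 ToySearch *ss, bool *again)
{
    if (ss->complete_round && ss->block == last_seen) {
        *again = false;             /* once around, nothing found: give up */
        return false;
    }
    if (dirty[ss->block]) {
        *again = true;
        return true;
    }
    if (++ss->block == NBLOCKS) {
        ss->block = 0;              /* wrapped to the start of the list */
        ss->complete_round = true;
    }
    *again = true;
    return false;
}

/*
 * Toy ram_find_and_save_block() with one injected queued request.
 * Returns the dirty block that would be saved, or -1 if the call gives up.
 */
static int toy_find_and_save(const bool dirty[NBLOCKS], int last_seen,
                             int inject_at, int queued, bool reset_on_unqueue)
{
    assert(last_seen >= 0 && last_seen < NBLOCKS);
    assert(queued >= 0 && queued < NBLOCKS);
    ToySearch ss = { .block = last_seen, .complete_round = false };
    bool again = true;
    bool injected = false;

    while (again) {
        /* Toy get_queued_page(): fires once, after the search has wrapped. */
        if (!injected && ss.complete_round && ss.block == inject_at) {
            injected = true;
            ss.block = queued;
            if (reset_on_unqueue) {
                ss.complete_round = false;  /* the patch's change */
            }
            /* The queued page is already clean: 0 pages sent, keep looping. */
            continue;
        }
        if (toy_find_dirty_block(dirty, last_seen, &ss, &again)) {
            return ss.block;
        }
    }
    return -1;
}
```

With last_seen = 5, only block 1 dirty, and a request for block 3 arriving as the search wraps to block 0, the variant without the reset scans only blocks 3 and 4 (the "=" range) and gives up with -1, while the variant with the reset goes around again and finds block 1. When no block is dirty, both variants give up.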

If I am wrong, just let me know :-)

>
>Thanks,
>
>-- 
>Peter Xu
Peter Xu June 5, 2019, 9:38 a.m. UTC | #3
On Wed, Jun 05, 2019 at 04:52:07PM +0800, Wei Yang wrote:
>                            unqueue_page()  last_seen_block
>                                      |     |
>     ram_list.blocks                  v     v
>     ---------------------------------+=====+---
> 
> 
> This rough picture illustrates the corner case.
> 
> For example, suppose we start from last_seen_block and search to the end of
> ram_list.blocks. At that moment, pss->complete_round is set to true. Then we
> get a queued page from unqueue_page() at the point marked above. The loop
> may then cover only the range marked "=" before giving up, and we skip all
> the other ranges.

Ah, I see your point, but I don't think there is a problem. Note that
complete_round is reset on each call to ram_find_and_save_block(), so
even if that call returns without saving anything, we'll still know we
have dirty pages to migrate, and the next call will be fine, no?
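Peter's argument can be checked in a small toy model in C. It is not QEMU's code: each block holds one page, and the names NBLOCKS, ToySearch, toy_find_dirty_block(), toy_call() and toy_dirty_count() are hypothetical. Each toy call starts with complete_round = false from where the previous call stopped, which models the per-call reset he describes; the toy also assumes the caller keeps calling while some block is still dirty, loosely standing in for "we'll still know we have dirty pages to migrate".

```c
#include <assert.h>
#include <stdbool.h>

#define NBLOCKS 8  /* hypothetical number of toy RAM blocks */

typedef struct {
    int block;            /* block the search examines next */
    bool complete_round;  /* set once the search wraps to block 0 */
} ToySearch;

/* Toy find_dirty_block(), one page per block. */
static bool toy_find_dirty_block(const bool dirty[NBLOCKS], int last_seen,
                                 ToySearch *ss, bool *again)
{
    if (ss->complete_round && ss->block == last_seen) {
        *again = false;             /* once around, nothing found: give up */
        return false;
    }
    if (dirty[ss->block]) {
        *again = true;
        return true;
    }
    if (++ss->block == NBLOCKS) {
        ss->block = 0;              /* wrapped to the start of the list */
        ss->complete_round = true;
    }
    *again = true;
    return false;
}

/*
 * One toy ram_find_and_save_block() call without queued pages.
 * complete_round is reset at the start of every call; *last_seen is
 * updated to where the search stopped. Returns the block that was
 * "sent" (and marks it clean), or -1 if nothing dirty was found.
 */
static int toy_call(bool dirty[NBLOCKS], int *last_seen)
{
    assert(*last_seen >= 0 && *last_seen < NBLOCKS);
    ToySearch ss = { .block = *last_seen, .complete_round = false };
    bool again = true;
    int sent = -1;

    while (again) {
        if (toy_find_dirty_block(dirty, *last_seen, &ss, &again)) {
            dirty[ss.block] = false;  /* "send" the page */
            sent = ss.block;
            break;
        }
    }
    *last_seen = ss.block;
    return sent;
}

/* Number of blocks still dirty; the toy caller loops while this is > 0. */
static int toy_dirty_count(const bool dirty[NBLOCKS])
{
    int n = 0;
    for (int i = 0; i < NBLOCKS; i++) {
        n += dirty[i] ? 1 : 0;
    }
    return n;
}
```

Suppose the previous call gave up at block 5 while block 1 was still dirty, as in the corner case. The dirty count is still 1, so the toy caller calls again; the new call starts a fresh round from block 5, wraps, and sends block 1, after which a further call finds nothing.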
Juan Quintela June 5, 2019, 10:33 a.m. UTC | #4
Peter Xu <peterx@redhat.com> wrote:
> Ah, I see your point, but I don't think there is a problem. Note that
> complete_round is reset on each call to ram_find_and_save_block(), so
> even if that call returns without saving anything, we'll still know we
> have dirty pages to migrate, and the next call will be fine, no?

Reviewed-by: Juan Quintela <quintela@redhat.com>

I *think* that Peter is perhaps right, but it is not clear at all, and
it is easier to be safe.  I think the only case where this could
matter is if:
- all pages are clean (so complete_round gets set to true)
- we get a queue_page request

Is that possible?  I am not completely sure after looking at the code.
It *could* be if the page that got queued is the last page remaining,
but...  I fully agree that the case where _almost all_ pages are
clean and we get a request for a queued page is really rare, so it
should not matter in real life, but...

Later, Juan.
Philippe Mathieu-Daudé June 5, 2019, 12:27 p.m. UTC | #5
migratioin -> migration
Wei Yang June 5, 2019, 1:39 p.m. UTC | #6
On Wed, Jun 05, 2019 at 02:27:11PM +0200, Philippe Mathieu-Daudé wrote:
>migratioin -> migration

Ah... I should take an English lesson...

Thanks
Wei Yang June 5, 2019, 1:39 p.m. UTC | #7
On Wed, Jun 05, 2019 at 12:33:39PM +0200, Juan Quintela wrote:
>Reviewed-by: Juan Quintela <quintela@redhat.com>
>
>I *think* that Peter is perhaps right, but it is not clear at all, and
>it is easier to be safe.  I think the only case where this could
>matter is if:
>- all pages are clean (so complete_round gets set to true)
>- we get a queue_page request
>
>Is that possible?  I am not completely sure after looking at the code.
>It *could* be if the page that got queued is the last page remaining,
>but...  I fully agree that the case where _almost all_ pages are
>clean and we get a request for a queued page is really rare, so it
>should not matter in real life, but...
>

Agree

>Later, Juan.
Wei Yang June 5, 2019, 1:41 p.m. UTC | #8
On Wed, Jun 05, 2019 at 05:38:19PM +0800, Peter Xu wrote:
>Ah, I see your point, but I don't think there is a problem. Note that
>complete_round is reset on each call to ram_find_and_save_block(), so
>even if that call returns without saving anything, we'll still know we
>have dirty pages to migrate, and the next call will be fine, no?
>

This is really a rare case, and it is hard to say whether it would be harmful.

But the chance still exists.

>-- 
>Peter Xu
Philippe Mathieu-Daudé June 5, 2019, 2:11 p.m. UTC | #9
On 6/5/19 3:39 PM, Wei Yang wrote:
> On Wed, Jun 05, 2019 at 02:27:11PM +0200, Philippe Mathieu-Daudé wrote:
>> migratioin -> migration
> 
> Ah... I should take an English lesson...

Your English is fine; I believe this is just a typo that slipped in ;)

Patch

diff --git a/migration/ram.c b/migration/ram.c
index d881981876..e9b40d636d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2290,6 +2290,12 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
          */
         pss->block = block;
         pss->page = offset >> TARGET_PAGE_BITS;
+
+        /*
+         * This unqueued page would break the "one round" check, even if
+         * this is really rare.
+         */
+        pss->complete_round = false;
     }
 
     return !!block;
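Reduced to a toy model in C, the patch makes the queued-page path clear the wrap-around flag whenever it moves the search to the queued block, so the "one round" check no longer relies on a round that the jump interrupted. This is a sketch, not QEMU's code: NBLOCKS, ToySearch and toy_get_queued_page() are hypothetical names, and a negative queued_block stands in for an empty queue.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLOCKS 8  /* hypothetical number of toy RAM blocks */

/* Toy stand-in for PageSearchStatus (not QEMU's definition). */
typedef struct {
    int block;            /* block the search examines next */
    bool complete_round;  /* true once the search has wrapped around */
} ToySearch;

/*
 * Toy get_queued_page(): queued_block < 0 means the queue is empty.
 * Otherwise the search jumps to the queued block and, as in the patch,
 * the "one round" flag is cleared because the in-order walk was broken.
 */
static bool toy_get_queued_page(ToySearch *ss, int queued_block)
{
    if (queued_block < 0) {
        return false;                /* nothing queued: search unchanged */
    }
    assert(queued_block < NBLOCKS);
    ss->block = queued_block;
    ss->complete_round = false;      /* the line the patch adds */
    return true;
}
```

A search that has already wrapped (complete_round true) and then picks up a queued page for block 2 ends up at block 2 with complete_round false; with an empty queue, neither field changes.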