Message ID | 20190605010828.6969-1-richardw.yang@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | migratioin/ram.c: reset complete_round when we gets a queued page |
On Wed, Jun 05, 2019 at 09:08:28AM +0800, Wei Yang wrote:
> In case we gets a queued page, the order of block is interrupted. We may
> not rely on the complete_round flag to say we have already searched the
> whole blocks on the list.
>
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> ---
>  migration/ram.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index d881981876..e9b40d636d 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2290,6 +2290,12 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
>           */
>          pss->block = block;
>          pss->page = offset >> TARGET_PAGE_BITS;
> +
> +        /*
> +         * This unqueued page would break the "one round" check, even is
> +         * really rare.

Why this is needed? Could you help explain the problem first?

Thanks,
On Wed, Jun 05, 2019 at 02:41:08PM +0800, Peter Xu wrote:
>On Wed, Jun 05, 2019 at 09:08:28AM +0800, Wei Yang wrote:
>> In case we gets a queued page, the order of block is interrupted. We may
>> not rely on the complete_round flag to say we have already searched the
>> whole blocks on the list.
>>
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> ---
>>  migration/ram.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index d881981876..e9b40d636d 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2290,6 +2290,12 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
>>           */
>>          pss->block = block;
>>          pss->page = offset >> TARGET_PAGE_BITS;
>> +
>> +        /*
>> +         * This unqueued page would break the "one round" check, even is
>> +         * really rare.
>
>Why this is needed? Could you help explain the problem first?

Peter, Thanks for your question.

I found this issue during code review and I believe this is a corner case.

Below is a draft chart for ram_find_and_save_block:

    ram_find_and_save_block
        do
            get_queued_page()
            find_dirty_block()
            ram_save_host_page()
        while

The basic logic here is : get a page need to migrate and migrate it.

In case we don't have get_queued_page(), find_dirty_block() will search the
whole ram_list.blocks by order. pss->complete_round is used to indicate
whether this search has looped.

Everything works fine after get_queued_page() involved. The block unqueued in
get_queued_page() could be any block in the ram_list.blocks. This means we
have very little chance to break the looped indicator.

                    unqueue_page()     last_seen_block
                                 |     |
ram_list.blocks                  v     v
---------------------------------+=====+---


Just draw a raw picture to demonstrate a corner case.

For example, we start from last_seen_block and search till the end of
ram_list.blocks. At this moment, pss->complete_round is set to true. Then we
get a queued page from unqueue_page() at the point I pointed. So the loop
continues may just continue the range as I marked as "=". We will skip all the
other ranges.

This is really a corner case, since ram_save_host_page() should return 0 and
there should be no dirty page in this range. But I don't see we may avoid this
case.

If I am not correct, just let me know :-)

>
>Thanks,
>
>--
>Peter Xu
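The corner case described above can be made concrete with a small toy model. This is only a sketch under loud assumptions: `toy_state`, `toy_search`, and `run_scenario` are hypothetical names, each block holds a single page, and the loop only imitates the shape of ram_find_and_save_block() as charted in the mail; it is not QEMU code. The `reset_on_queue` argument stands in for the proposed one-line change.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLOCKS 8   /* toy list of 8 blocks, one page each (assumption) */

/* Hypothetical stand-in for the migration state; not QEMU's RAMState. */
struct toy_state {
    bool dirty[NBLOCKS]; /* dirty[i]: block i still has a page to send */
    int last_seen;       /* block where the previous call stopped */
    int queued;          /* block of a pending queued request, -1 if none */
    int queue_after;     /* deliver the request after this many iterations */
};

/*
 * One call of the toy search loop. Returns the block that was sent, or -1
 * if the loop gave up. reset_on_queue models the proposed patch.
 */
static int toy_search(struct toy_state *s, bool reset_on_queue)
{
    int cur = s->last_seen;
    bool complete_round = false;   /* fresh for every call */

    assert(cur >= 0 && cur < NBLOCKS);

    for (int iter = 0; ; iter++) {
        if (s->queued >= 0 && iter >= s->queue_after) {
            /* Queued request: jump to an arbitrary block. */
            cur = s->queued;
            s->queued = -1;
            if (reset_on_queue) {
                complete_round = false;
            }
            if (s->dirty[cur]) {
                s->dirty[cur] = false;
                s->last_seen = cur;
                return cur;
            }
            continue;              /* page already clean, nothing sent */
        }

        if (complete_round && cur == s->last_seen) {
            return -1;             /* believes it went once around: give up */
        }
        if (s->dirty[cur]) {
            s->dirty[cur] = false;
            s->last_seen = cur;
            return cur;
        }
        if (++cur == NBLOCKS) {
            cur = 0;
            complete_round = true; /* hit the end of the list */
        }
    }
}

/* Search starts at block 6, block 1 is dirty, a clean block 3 gets queued. */
static int run_scenario(bool reset_on_queue)
{
    struct toy_state s = { .last_seen = 6, .queued = 3, .queue_after = 2 };

    s.dirty[1] = true;
    return toy_search(&s, reset_on_queue);
}
```

With these toy values, run_scenario(false) walks 6, 7, wraps, jumps to block 3 and stops at block 6 again without ever looking at block 1, so it returns -1. run_scenario(true) clears the flag at the jump, goes around once more and returns 1.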
On Wed, Jun 05, 2019 at 04:52:07PM +0800, Wei Yang wrote:
> On Wed, Jun 05, 2019 at 02:41:08PM +0800, Peter Xu wrote:
> >On Wed, Jun 05, 2019 at 09:08:28AM +0800, Wei Yang wrote:
> >> In case we gets a queued page, the order of block is interrupted. We may
> >> not rely on the complete_round flag to say we have already searched the
> >> whole blocks on the list.
> >>
> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> >> ---
> >>  migration/ram.c | 6 ++++++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/migration/ram.c b/migration/ram.c
> >> index d881981876..e9b40d636d 100644
> >> --- a/migration/ram.c
> >> +++ b/migration/ram.c
> >> @@ -2290,6 +2290,12 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
> >>           */
> >>          pss->block = block;
> >>          pss->page = offset >> TARGET_PAGE_BITS;
> >> +
> >> +        /*
> >> +         * This unqueued page would break the "one round" check, even is
> >> +         * really rare.
> >
> >Why this is needed? Could you help explain the problem first?
>
> Peter, Thanks for your question.
>
> I found this issue during code review and I believe this is a corner case.
>
> Below is a draft chart for ram_find_and_save_block:
>
>     ram_find_and_save_block
>         do
>             get_queued_page()
>             find_dirty_block()
>             ram_save_host_page()
>         while
>
> The basic logic here is : get a page need to migrate and migrate it.
>
> In case we don't have get_queued_page(), find_dirty_block() will search the
> whole ram_list.blocks by order. pss->complete_round is used to indicate
> whether this search has looped.
>
> Everything works fine after get_queued_page() involved. The block unqueued in
> get_queued_page() could be any block in the ram_list.blocks. This means we
> have very little chance to break the looped indicator.
>
>                     unqueue_page()     last_seen_block
>                                  |     |
> ram_list.blocks                  v     v
> ---------------------------------+=====+---
>
>
> Just draw a raw picture to demonstrate a corner case.
>
> For example, we start from last_seen_block and search till the end of
> ram_list.blocks. At this moment, pss->complete_round is set to true. Then we
> get a queued page from unqueue_page() at the point I pointed. So the loop
> continues may just continue the range as I marked as "=". We will skip all the
> other ranges.

Ah I see your point, but I don't think there is a problem - note that
complete_round will be reset for each ram_find_and_save_block(), so
even if we have that iteration of ram_find_and_save_block() to return
we'll still know we have dirty pages to migrate and in the next call
we'll be fine, no?
Peter Xu <peterx@redhat.com> wrote:
> On Wed, Jun 05, 2019 at 04:52:07PM +0800, Wei Yang wrote:
>> On Wed, Jun 05, 2019 at 02:41:08PM +0800, Peter Xu wrote:
>> >On Wed, Jun 05, 2019 at 09:08:28AM +0800, Wei Yang wrote:
>> >> In case we gets a queued page, the order of block is interrupted. We may
>> >> not rely on the complete_round flag to say we have already searched the
>> >> whole blocks on the list.
>> >>
>> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> >> ---
>> >>  migration/ram.c | 6 ++++++
>> >>  1 file changed, 6 insertions(+)
>> >>
>> >> diff --git a/migration/ram.c b/migration/ram.c
>> >> index d881981876..e9b40d636d 100644
>> >> --- a/migration/ram.c
>> >> +++ b/migration/ram.c
>> >> @@ -2290,6 +2290,12 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
>> >>           */
>> >>          pss->block = block;
>> >>          pss->page = offset >> TARGET_PAGE_BITS;
>> >> +
>> >> +        /*
>> >> +         * This unqueued page would break the "one round" check, even is
>> >> +         * really rare.
>> >

> Ah I see your point, but I don't think there is a problem - note that
> complete_round will be reset for each ram_find_and_save_block(), so
> even if we have that iteration of ram_find_and_save_block() to return
> we'll still know we have dirty pages to migrate and in the next call
> we'll be fine, no?

Reviewed-by: Juan Quintela <quintela@redhat.com>

I *think* that peter is perhaps right, but it is not clear at all, and
it is easier to be safe. I think that the only case that this could
matter is if:
- all pages are clean (so complete_round will get as true)
- we went a queue_page request

Is that possible? I am not completely sure after looking at the code.
It *could* be if the page that got queued is the last page remaining,
but ...... I fully agree that the case that _almost all_ pages are
clean and we get a request for a queued page is really rare, so it
should not matter in real life, but ....

Later, Juan.
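Peter's argument, that complete_round starts out false on every call so a later call picks up whatever one call skipped, can be sketched with a toy model. Everything here is an assumption for illustration: `toy_state`, `toy_search_once`, and `second_call_result` are hypothetical names, each block holds one page, and the sketch presumes the caller invokes the search again because dirty pages remain. It models the unpatched behaviour only and is not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLOCKS 8   /* toy list of 8 blocks, one page each (assumption) */

/* Hypothetical stand-in for the migration state; not QEMU's RAMState. */
struct toy_state {
    bool dirty[NBLOCKS]; /* dirty[i]: block i still has a page to send */
    int last_seen;       /* block where the previous call stopped */
    int queued;          /* block of a pending queued request, -1 if none */
    int queue_after;     /* deliver the request after this many iterations */
};

/* One call of the unpatched toy loop: block index sent, or -1 on give-up. */
static int toy_search_once(struct toy_state *s)
{
    int cur = s->last_seen;
    bool complete_round = false;   /* reset at the start of every call */

    for (int iter = 0; ; iter++) {
        if (s->queued >= 0 && iter >= s->queue_after) {
            cur = s->queued;       /* jump to the queued block */
            s->queued = -1;
            if (s->dirty[cur]) {
                s->dirty[cur] = false;
                s->last_seen = cur;
                return cur;
            }
            continue;              /* page already clean, nothing sent */
        }
        if (complete_round && cur == s->last_seen) {
            return -1;             /* believes it went once around */
        }
        if (s->dirty[cur]) {
            s->dirty[cur] = false;
            s->last_seen = cur;
            return cur;
        }
        if (++cur == NBLOCKS) {
            cur = 0;
            complete_round = true; /* hit the end of the list */
        }
    }
}

/* First call gives up early; the second call is what this returns. */
static int second_call_result(void)
{
    struct toy_state s = { .last_seen = 6, .queued = 3, .queue_after = 2 };
    int first;

    s.dirty[1] = true;
    first = toy_search_once(&s);   /* skips block 1 after the jump to 3 */
    assert(first == -1);
    return toy_search_once(&s);    /* fresh flag, plain scan from block 6 */
}
```

In this toy run the first call returns -1 with block 1 still dirty, and the second call returns 1. Whether the real caller always makes that second call is exactly the point Juan leaves open above.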
migratioin -> migration
On Wed, Jun 05, 2019 at 02:27:11PM +0200, Philippe Mathieu-Daudé wrote:
>migratioin -> migration
Ah... I should take an English lesson...
Thanks
On Wed, Jun 05, 2019 at 12:33:39PM +0200, Juan Quintela wrote:
>Peter Xu <peterx@redhat.com> wrote:
>> On Wed, Jun 05, 2019 at 04:52:07PM +0800, Wei Yang wrote:
>>> On Wed, Jun 05, 2019 at 02:41:08PM +0800, Peter Xu wrote:
>>> >On Wed, Jun 05, 2019 at 09:08:28AM +0800, Wei Yang wrote:
>>> >> In case we gets a queued page, the order of block is interrupted. We may
>>> >> not rely on the complete_round flag to say we have already searched the
>>> >> whole blocks on the list.
>>> >>
>>> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>> >> ---
>>> >>  migration/ram.c | 6 ++++++
>>> >>  1 file changed, 6 insertions(+)
>>> >>
>>> >> diff --git a/migration/ram.c b/migration/ram.c
>>> >> index d881981876..e9b40d636d 100644
>>> >> --- a/migration/ram.c
>>> >> +++ b/migration/ram.c
>>> >> @@ -2290,6 +2290,12 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
>>> >>           */
>>> >>          pss->block = block;
>>> >>          pss->page = offset >> TARGET_PAGE_BITS;
>>> >> +
>>> >> +        /*
>>> >> +         * This unqueued page would break the "one round" check, even is
>>> >> +         * really rare.
>>> >
>
>
>> Ah I see your point, but I don't think there is a problem - note that
>> complete_round will be reset for each ram_find_and_save_block(), so
>> even if we have that iteration of ram_find_and_save_block() to return
>> we'll still know we have dirty pages to migrate and in the next call
>> we'll be fine, no?
>
>Reviewed-by: Juan Quintela <quintela@redhat.com>
>
>I *think* that peter is perhaps right, but it is not clear at all, and
>it is easier to be safe. I think that the only case that this could
>matter is if:
>- all pages are clean (so complete_round will get as true)
>- we went a queue_page request
>
>Is that possible? I am not completely sure after looking at the code.
>It *could* be if the page that got queued is the last page remaining,
>but ...... I fully agree that the case that _almost all_ pages are
>clean and we get a request for a queued page is really rare, so it
>should not matter in real life, but ....
>

Agree

>Later, Juan.
On Wed, Jun 05, 2019 at 05:38:19PM +0800, Peter Xu wrote:
>On Wed, Jun 05, 2019 at 04:52:07PM +0800, Wei Yang wrote:
>> On Wed, Jun 05, 2019 at 02:41:08PM +0800, Peter Xu wrote:
>> >On Wed, Jun 05, 2019 at 09:08:28AM +0800, Wei Yang wrote:
>> >> In case we gets a queued page, the order of block is interrupted. We may
>> >> not rely on the complete_round flag to say we have already searched the
>> >> whole blocks on the list.
>> >>
>> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> >> ---
>> >>  migration/ram.c | 6 ++++++
>> >>  1 file changed, 6 insertions(+)
>> >>
>> >> diff --git a/migration/ram.c b/migration/ram.c
>> >> index d881981876..e9b40d636d 100644
>> >> --- a/migration/ram.c
>> >> +++ b/migration/ram.c
>> >> @@ -2290,6 +2290,12 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
>> >>           */
>> >>          pss->block = block;
>> >>          pss->page = offset >> TARGET_PAGE_BITS;
>> >> +
>> >> +        /*
>> >> +         * This unqueued page would break the "one round" check, even is
>> >> +         * really rare.
>> >
>> >Why this is needed? Could you help explain the problem first?
>>
>> Peter, Thanks for your question.
>>
>> I found this issue during code review and I believe this is a corner case.
>>
>> Below is a draft chart for ram_find_and_save_block:
>>
>>     ram_find_and_save_block
>>         do
>>             get_queued_page()
>>             find_dirty_block()
>>             ram_save_host_page()
>>         while
>>
>> The basic logic here is : get a page need to migrate and migrate it.
>>
>> In case we don't have get_queued_page(), find_dirty_block() will search the
>> whole ram_list.blocks by order. pss->complete_round is used to indicate
>> whether this search has looped.
>>
>> Everything works fine after get_queued_page() involved. The block unqueued in
>> get_queued_page() could be any block in the ram_list.blocks. This means we
>> have very little chance to break the looped indicator.
>>
>>                     unqueue_page()     last_seen_block
>>                                  |     |
>> ram_list.blocks                  v     v
>> ---------------------------------+=====+---
>>
>>
>> Just draw a raw picture to demonstrate a corner case.
>>
>> For example, we start from last_seen_block and search till the end of
>> ram_list.blocks. At this moment, pss->complete_round is set to true. Then we
>> get a queued page from unqueue_page() at the point I pointed. So the loop
>> continues may just continue the range as I marked as "=". We will skip all the
>> other ranges.
>
>Ah I see your point, but I don't think there is a problem - note that
>complete_round will be reset for each ram_find_and_save_block(), so
>even if we have that iteration of ram_find_and_save_block() to return
>we'll still know we have dirty pages to migrate and in the next call
>we'll be fine, no?
>

This is really a rare case and hard to say whether it would be harmful.
The chance still exists.

>--
>Peter Xu
On 6/5/19 3:39 PM, Wei Yang wrote:
> On Wed, Jun 05, 2019 at 02:27:11PM +0200, Philippe Mathieu-Daudé wrote:
>> migratioin -> migration
>
> Ah... I should take an English lesson...

Your English is fine, I believe this is just a typo that slipped in ;)
diff --git a/migration/ram.c b/migration/ram.c
index d881981876..e9b40d636d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2290,6 +2290,12 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
          */
         pss->block = block;
         pss->page = offset >> TARGET_PAGE_BITS;
+
+        /*
+         * This unqueued page would break the "one round" check, even is
+         * really rare.
+         */
+        pss->complete_round = false;
     }
 
     return !!block;
In case we gets a queued page, the order of block is interrupted. We may
not rely on the complete_round flag to say we have already searched the
whole blocks on the list.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
---
 migration/ram.c | 6 ++++++
 1 file changed, 6 insertions(+)