Message ID | 20230207005650.1810-2-quintela@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [PULL,01/30] migration: Fix migration crash when target psize larger than host | expand |
07.02.2023 03:56, Juan Quintela wrote: > From: Peter Xu <peterx@redhat.com> > > Commit d9e474ea56 overlooked the case where the target psize is even larger > than the host psize. One example is Alpha has 8K page size and migration > will start to crash the source QEMU when running Alpha migration on x86. > > Fix it by detecting that case and set host start/end just to cover the > single page to be migrated. FWIW, commit in question, which is d9e474ea56, has been applied after the last released version to date, which is 7.2.0. So I guess this change is not applicable to -stable. /mjt
Michael Tokarev <mjt@tls.msk.ru> wrote: > 07.02.2023 03:56, Juan Quintela wrote: >> From: Peter Xu <peterx@redhat.com> >> Commit d9e474ea56 overlooked the case where the target psize is even >> larger >> than the host psize. One example is Alpha has 8K page size and migration >> will start to crash the source QEMU when running Alpha migration on x86. >> Fix it by detecting that case and set host start/end just to cover >> the >> single page to be migrated. > > FWIW, commit in question, which is d9e474ea56, has been applied after the > last released version to date, which is 7.2.0. So I guess this change is > not applicable to -stable. I think it should. This is a bug that can happen when your host page is smaller than the guest size. And has been that way for a long time. Once told that, people don't migrate alpha guests a lot, but it can also happens with funny combinations of large pages. Peter Xu knows more about this. Later, Juan.
On Fri, Feb 10, 2023 at 01:11:34PM +0100, Juan Quintela wrote: > Michael Tokarev <mjt@tls.msk.ru> wrote: > > 07.02.2023 03:56, Juan Quintela wrote: > >> From: Peter Xu <peterx@redhat.com> > >> Commit d9e474ea56 overlooked the case where the target psize is even > >> larger > >> than the host psize. One example is Alpha has 8K page size and migration > >> will start to crash the source QEMU when running Alpha migration on x86. > >> Fix it by detecting that case and set host start/end just to cover > >> the > >> single page to be migrated. > > > > FWIW, commit in question, which is d9e474ea56, has been applied after the > > last released version to date, which is 7.2.0. So I guess this change is > > not applicable to -stable. > > I think it should. > > This is a bug that can happen when your host page is smaller than the > guest size. > > And has been that way for a long time. > Once told that, people don't migrate alpha guests a lot, but it can also > happens with funny combinations of large pages. > > Peter Xu knows more about this. Thanks, Juan. I think Michael was correct that d9e474ea56 is only merged after our most recent release (which is v7.2.0). So it doesn't need to have stable copied (aka, it doesn't need to be applied to any QEMU's stable tree). Juan, could you help drop the "cc: stable" line if the pull is going to have a new version? Side note: I think in the case where we have wrongly have the cc:stable it shouldn't hurt either, because when the stable tree tries to pick it up it'll find it doesn't apply at all, then a downstream-only patch will be needed, then we'll also figure all things out, just later. Thanks,
On Fri, Feb 10, 2023, 16:01 Peter Xu <peterx@redhat.com> wrote: > On Fri, Feb 10, 2023 at 01:11:34PM +0100, Juan Quintela wrote: > > Michael Tokarev <mjt@tls.msk.ru> wrote: > > > 07.02.2023 03:56, Juan Quintela wrote: > > >> From: Peter Xu <peterx@redhat.com> > > >> Commit d9e474ea56 overlooked the case where the target psize is even > > >> larger > > >> than the host psize. One example is Alpha has 8K page size and > migration > > >> will start to crash the source QEMU when running Alpha migration on > x86. > > >> Fix it by detecting that case and set host start/end just to cover > > >> the > > >> single page to be migrated. > > > > > > FWIW, commit in question, which is d9e474ea56, has been applied after > the > > > last released version to date, which is 7.2.0. So I guess this change > is > > > not applicable to -stable. > > > > I think it should. > > > > This is a bug that can happen when your host page is smaller than the > > guest size. > > > > And has been that way for a long time. > > Once told that, people don't migrate alpha guests a lot, but it can also > > happens with funny combinations of large pages. > > > > Peter Xu knows more about this. > > Thanks, Juan. > > I think Michael was correct that d9e474ea56 is only merged after our most > recent release (which is v7.2.0). So it doesn't need to have stable copied > (aka, it doesn't need to be applied to any QEMU's stable tree). > > Juan, could you help drop the "cc: stable" line if the pull is going to > have a new version? > Sure. I hope the problem is somewhere else. > > Side note: I think in the case where we have wrongly have the cc:stable it > shouldn't hurt either, because when the stable tree tries to pick it up > it'll find it doesn't apply at all, then a downstream-only patch will be > needed, then we'll also figure all things out, just later. > > Thanks, > > -- > Peter Xu > >
10.02.2023 18:01, Peter Xu пишет: > Thanks, Juan. > > I think Michael was correct that d9e474ea56 is only merged after our most > recent release (which is v7.2.0). So it doesn't need to have stable copied > (aka, it doesn't need to be applied to any QEMU's stable tree). > > Juan, could you help drop the "cc: stable" line if the pull is going to > have a new version? It has been applied to master already, - this is where I picked it from. > Side note: I think in the case where we have wrongly have the cc:stable it > shouldn't hurt either, because when the stable tree tries to pick it up > it'll find it doesn't apply at all, then a downstream-only patch will be > needed, then we'll also figure all things out, just later. There's absolutely nothing wrong with that. I was just not sure about my own sanity here, and decided to ask. Maybe the problem was deeper and a more careful backport is needed. Speaking of -stable, on my view, it is better if "too many" things will be tagged for it and filtered out, than some important things will not be tagged. Thank you! /mjt
On Fri, Feb 10, 2023 at 06:28:36PM +0300, Michael Tokarev wrote: > 10.02.2023 18:01, Peter Xu пишет: > > > Thanks, Juan. > > > > I think Michael was correct that d9e474ea56 is only merged after our most > > recent release (which is v7.2.0). So it doesn't need to have stable copied > > (aka, it doesn't need to be applied to any QEMU's stable tree). > > > > Juan, could you help drop the "cc: stable" line if the pull is going to > > have a new version? > > It has been applied to master already, - this is where I picked it from. Ah I didn't notice that. > > > Side note: I think in the case where we have wrongly have the cc:stable it > > shouldn't hurt either, because when the stable tree tries to pick it up > > it'll find it doesn't apply at all, then a downstream-only patch will be > > needed, then we'll also figure all things out, just later. > > There's absolutely nothing wrong with that. I was just not sure about my > own sanity here, and decided to ask. Maybe the problem was deeper and a > more careful backport is needed. Thanks for raising this. The old tree should be good with guest psize > host psize not only because the assertion was not there before, but also because we used the right boundary (as hostpage_boundary here): size_t pagesize_bits = qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS; unsigned long hostpage_boundary = QEMU_ALIGN_UP(pss->page + 1, pagesize_bits); And it also matches with Thomas's bisection result, where the bug report came from. TL;DR: I'm pretty sure the old code was good, hence no explicit backport needed. > > Speaking of -stable, on my view, it is better if "too many" things will be > tagged for it and filtered out, than some important things will not be > tagged. Agreed. I guess we shouldn't blindly apply cc:stable because it'll add unnecessary burden to the stable tree maintainers on figuring things out later, IOW it should be a question being thought thoroughly when the developer was working on the patch. Normally it should be the maintainers' role to double check especially when one patch should apply stable but it got missing (which should happen more frequently..). In the ideal world of perfect developers and active tree maintainers, the cc:stable should be 100% accurate because the judgement should be able to have been made with existing knowledge, then stable maintainance should be hopefully even smoother than the reality. In this case definitely my fault to applied the cc:stable wrongly, luckily in the healthy way rather than losing one of them when really needed. > > Thank you! Thanks again to both!
diff --git a/migration/ram.c b/migration/ram.c index 334309f1c6..68a45338e3 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2319,8 +2319,25 @@ static void pss_host_page_prepare(PageSearchStatus *pss) size_t guest_pfns = qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS; pss->host_page_sending = true; - pss->host_page_start = ROUND_DOWN(pss->page, guest_pfns); - pss->host_page_end = ROUND_UP(pss->page + 1, guest_pfns); + if (guest_pfns <= 1) { + /* + * This covers both when guest psize == host psize, or when guest + * has larger psize than the host (guest_pfns==0). + * + * For the latter, we always send one whole guest page per + * iteration of the host page (example: an Alpha VM on x86 host + * will have guest psize 8K while host psize 4K). + */ + pss->host_page_start = pss->page; + pss->host_page_end = pss->page + 1; + } else { + /* + * The host page spans over multiple guest pages, we send them + * within the same host page iteration. + */ + pss->host_page_start = ROUND_DOWN(pss->page, guest_pfns); + pss->host_page_end = ROUND_UP(pss->page + 1, guest_pfns); + } } /*