Message ID | 20240126041725.124562-7-peterx@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [PULL,01/15] userfaultfd: use 1ULL to build ioctl masks | expand |
peterx@redhat.com writes: > From: Fabiano Rosas <farosas@suse.de> > > The 'max' cpu is not expected to be stable in terms of features across > QEMU versions, so it should not be expected to migrate. > > While the tests currently all pass with -cpu max, that is only because > we're not testing across QEMU versions, which is the more common > use-case for migration. > > We've recently introduced compatibility tests that use two different > QEMU versions and the tests are now failing for aarch64. The next > patch adds those tests to CI, so we cannot use the 'max' cpu > anymore. Replace it with the 'neoverse-n1', which has a fixed set of > features. > > Suggested-by: Peter Maydell <peter.maydell@linaro.org> > Signed-off-by: Fabiano Rosas <farosas@suse.de> > Link: https://lore.kernel.org/r/20240118164951.30350-2-farosas@suse.de > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > tests/qtest/migration-test.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c > index 7675519cfa..15713f3666 100644 > --- a/tests/qtest/migration-test.c > +++ b/tests/qtest/migration-test.c > @@ -820,7 +820,7 @@ static int test_migrate_start(QTestState **from, QTestState **to, > memory_size = "150M"; > machine_alias = "virt"; > machine_opts = "gic-version=max"; > - arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath); > + arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath); > start_address = ARM_TEST_MEM_START; > end_address = ARM_TEST_MEM_END; > } else { This breaks the tests on an arm host with KVM support. We could drop this patch from the PR, it doesn't affect anything else. Or squash in: -->8-- From b8aa5d8a2b33dcc28e4cd4ce2c4f4eacc3a3b845 Mon Sep 17 00:00:00 2001 From: Fabiano Rosas <farosas@suse.de> Date: Fri, 26 Jan 2024 11:33:15 -0300 Subject: [PATCH] fixup! tests/qtest/migration: Don't use -cpu max for aarch64 Signed-off-by: Fabiano Rosas <farosas@suse.de> --- tests/qtest/migration-test.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index 15713f3666..2ba9cab684 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -820,7 +820,9 @@ static int test_migrate_start(QTestState **from, QTestState **to, memory_size = "150M"; machine_alias = "virt"; machine_opts = "gic-version=max"; - arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath); + arch_opts = g_strdup_printf("-cpu %s -kernel %s", + qtest_has_accel("kvm") ? + "host" : "neoverse-n1", bootpath); start_address = ARM_TEST_MEM_START; end_address = ARM_TEST_MEM_END; } else {
On Fri, 26 Jan 2024 at 14:36, Fabiano Rosas <farosas@suse.de> wrote: > > peterx@redhat.com writes: > > > From: Fabiano Rosas <farosas@suse.de> > > > > The 'max' cpu is not expected to be stable in terms of features across > > QEMU versions, so it should not be expected to migrate. > > > > While the tests currently all pass with -cpu max, that is only because > > we're not testing across QEMU versions, which is the more common > > use-case for migration. > > > > We've recently introduced compatibility tests that use two different > > QEMU versions and the tests are now failing for aarch64. The next > > patch adds those tests to CI, so we cannot use the 'max' cpu > > anymore. Replace it with the 'neoverse-n1', which has a fixed set of > > features. > > > > Suggested-by: Peter Maydell <peter.maydell@linaro.org> > > Signed-off-by: Fabiano Rosas <farosas@suse.de> > > Link: https://lore.kernel.org/r/20240118164951.30350-2-farosas@suse.de > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > tests/qtest/migration-test.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c > > index 7675519cfa..15713f3666 100644 > > --- a/tests/qtest/migration-test.c > > +++ b/tests/qtest/migration-test.c > > @@ -820,7 +820,7 @@ static int test_migrate_start(QTestState **from, QTestState **to, > > memory_size = "150M"; > > machine_alias = "virt"; > > machine_opts = "gic-version=max"; > > - arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath); > > + arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath); > > start_address = ARM_TEST_MEM_START; > > end_address = ARM_TEST_MEM_END; > > } else { > > This breaks the tests on an arm host with KVM support. We could drop > this patch from the PR, it doesn't affect anything else. > > Or squash in: > > -->8-- > From b8aa5d8a2b33dcc28e4cd4ce2c4f4eacc3a3b845 Mon Sep 17 00:00:00 2001 > From: Fabiano Rosas <farosas@suse.de> > Date: Fri, 26 Jan 2024 11:33:15 -0300 > Subject: [PATCH] fixup! tests/qtest/migration: Don't use -cpu max for aarch64 > > Signed-off-by: Fabiano Rosas <farosas@suse.de> > --- > tests/qtest/migration-test.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c > index 15713f3666..2ba9cab684 100644 > --- a/tests/qtest/migration-test.c > +++ b/tests/qtest/migration-test.c > @@ -820,7 +820,9 @@ static int test_migrate_start(QTestState **from, QTestState **to, > memory_size = "150M"; > machine_alias = "virt"; > machine_opts = "gic-version=max"; > - arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath); > + arch_opts = g_strdup_printf("-cpu %s -kernel %s", > + qtest_has_accel("kvm") ? > + "host" : "neoverse-n1", bootpath); > start_address = ARM_TEST_MEM_START; > end_address = ARM_TEST_MEM_END; > } else { If you want to do that then a comment explaining why would be helpful for future readers, I think. thanks -- PMM
Peter Maydell <peter.maydell@linaro.org> writes: > On Fri, 26 Jan 2024 at 14:36, Fabiano Rosas <farosas@suse.de> wrote: >> >> peterx@redhat.com writes: >> >> > From: Fabiano Rosas <farosas@suse.de> >> > >> > The 'max' cpu is not expected to be stable in terms of features across >> > QEMU versions, so it should not be expected to migrate. >> > >> > While the tests currently all pass with -cpu max, that is only because >> > we're not testing across QEMU versions, which is the more common >> > use-case for migration. >> > >> > We've recently introduced compatibility tests that use two different >> > QEMU versions and the tests are now failing for aarch64. The next >> > patch adds those tests to CI, so we cannot use the 'max' cpu >> > anymore. Replace it with the 'neoverse-n1', which has a fixed set of >> > features. >> > >> > Suggested-by: Peter Maydell <peter.maydell@linaro.org> >> > Signed-off-by: Fabiano Rosas <farosas@suse.de> >> > Link: https://lore.kernel.org/r/20240118164951.30350-2-farosas@suse.de >> > Signed-off-by: Peter Xu <peterx@redhat.com> >> > --- >> > tests/qtest/migration-test.c | 2 +- >> > 1 file changed, 1 insertion(+), 1 deletion(-) >> > >> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c >> > index 7675519cfa..15713f3666 100644 >> > --- a/tests/qtest/migration-test.c >> > +++ b/tests/qtest/migration-test.c >> > @@ -820,7 +820,7 @@ static int test_migrate_start(QTestState **from, QTestState **to, >> > memory_size = "150M"; >> > machine_alias = "virt"; >> > machine_opts = "gic-version=max"; >> > - arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath); >> > + arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath); >> > start_address = ARM_TEST_MEM_START; >> > end_address = ARM_TEST_MEM_END; >> > } else { >> >> This breaks the tests on an arm host with KVM support. We could drop >> this patch from the PR, it doesn't affect anything else. >> >> Or squash in: >> >> -->8-- >> From b8aa5d8a2b33dcc28e4cd4ce2c4f4eacc3a3b845 Mon Sep 17 00:00:00 2001 >> From: Fabiano Rosas <farosas@suse.de> >> Date: Fri, 26 Jan 2024 11:33:15 -0300 >> Subject: [PATCH] fixup! tests/qtest/migration: Don't use -cpu max for aarch64 >> >> Signed-off-by: Fabiano Rosas <farosas@suse.de> >> --- >> tests/qtest/migration-test.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c >> index 15713f3666..2ba9cab684 100644 >> --- a/tests/qtest/migration-test.c >> +++ b/tests/qtest/migration-test.c >> @@ -820,7 +820,9 @@ static int test_migrate_start(QTestState **from, QTestState **to, >> memory_size = "150M"; >> machine_alias = "virt"; >> machine_opts = "gic-version=max"; >> - arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath); >> + arch_opts = g_strdup_printf("-cpu %s -kernel %s", >> + qtest_has_accel("kvm") ? >> + "host" : "neoverse-n1", bootpath); >> start_address = ARM_TEST_MEM_START; >> end_address = ARM_TEST_MEM_END; >> } else { > > If you want to do that then a comment explaining why would be > helpful for future readers, I think. Ok, let's drop this one then, I'll resend. Thanks
On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote: > Peter Maydell <peter.maydell@linaro.org> writes: > > > On Fri, 26 Jan 2024 at 14:36, Fabiano Rosas <farosas@suse.de> wrote: > >> > >> peterx@redhat.com writes: > >> > >> > From: Fabiano Rosas <farosas@suse.de> > >> > > >> > The 'max' cpu is not expected to be stable in terms of features across > >> > QEMU versions, so it should not be expected to migrate. > >> > > >> > While the tests currently all pass with -cpu max, that is only because > >> > we're not testing across QEMU versions, which is the more common > >> > use-case for migration. > >> > > >> > We've recently introduced compatibility tests that use two different > >> > QEMU versions and the tests are now failing for aarch64. The next > >> > patch adds those tests to CI, so we cannot use the 'max' cpu > >> > anymore. Replace it with the 'neoverse-n1', which has a fixed set of > >> > features. > >> > > >> > Suggested-by: Peter Maydell <peter.maydell@linaro.org> > >> > Signed-off-by: Fabiano Rosas <farosas@suse.de> > >> > Link: https://lore.kernel.org/r/20240118164951.30350-2-farosas@suse.de > >> > Signed-off-by: Peter Xu <peterx@redhat.com> > >> > --- > >> > tests/qtest/migration-test.c | 2 +- > >> > 1 file changed, 1 insertion(+), 1 deletion(-) > >> > > >> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c > >> > index 7675519cfa..15713f3666 100644 > >> > --- a/tests/qtest/migration-test.c > >> > +++ b/tests/qtest/migration-test.c > >> > @@ -820,7 +820,7 @@ static int test_migrate_start(QTestState **from, QTestState **to, > >> > memory_size = "150M"; > >> > machine_alias = "virt"; > >> > machine_opts = "gic-version=max"; > >> > - arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath); > >> > + arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath); > >> > start_address = ARM_TEST_MEM_START; > >> > end_address = ARM_TEST_MEM_END; > >> > } else { > >> > >> This breaks the tests on an arm host with KVM support. We could drop > >> this patch from the PR, it doesn't affect anything else. > >> > >> Or squash in: > >> > >> -->8-- > >> From b8aa5d8a2b33dcc28e4cd4ce2c4f4eacc3a3b845 Mon Sep 17 00:00:00 2001 > >> From: Fabiano Rosas <farosas@suse.de> > >> Date: Fri, 26 Jan 2024 11:33:15 -0300 > >> Subject: [PATCH] fixup! tests/qtest/migration: Don't use -cpu max for aarch64 > >> > >> Signed-off-by: Fabiano Rosas <farosas@suse.de> > >> --- > >> tests/qtest/migration-test.c | 4 +++- > >> 1 file changed, 3 insertions(+), 1 deletion(-) > >> > >> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c > >> index 15713f3666..2ba9cab684 100644 > >> --- a/tests/qtest/migration-test.c > >> +++ b/tests/qtest/migration-test.c > >> @@ -820,7 +820,9 @@ static int test_migrate_start(QTestState **from, QTestState **to, > >> memory_size = "150M"; > >> machine_alias = "virt"; > >> machine_opts = "gic-version=max"; > >> - arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath); > >> + arch_opts = g_strdup_printf("-cpu %s -kernel %s", > >> + qtest_has_accel("kvm") ? > >> + "host" : "neoverse-n1", bootpath); > >> start_address = ARM_TEST_MEM_START; > >> end_address = ARM_TEST_MEM_END; > >> } else { > > > > If you want to do that then a comment explaining why would be > > helpful for future readers, I think. > > Ok, let's drop this one then, I'll resend. I'll drop this one for now then, thanks. Just to double check: Fabiano, you meant that "-cpu host" won't hit the same issue as what "-cpu max" would have for the new "n-1" CI test, right? I can also wait to read your patch if that will contain the explanations.
Peter Xu <peterx@redhat.com> writes: > On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote: >> Peter Maydell <peter.maydell@linaro.org> writes: >> >> > On Fri, 26 Jan 2024 at 14:36, Fabiano Rosas <farosas@suse.de> wrote: >> >> >> >> peterx@redhat.com writes: >> >> >> >> > From: Fabiano Rosas <farosas@suse.de> >> >> > >> >> > The 'max' cpu is not expected to be stable in terms of features across >> >> > QEMU versions, so it should not be expected to migrate. >> >> > >> >> > While the tests currently all pass with -cpu max, that is only because >> >> > we're not testing across QEMU versions, which is the more common >> >> > use-case for migration. >> >> > >> >> > We've recently introduced compatibility tests that use two different >> >> > QEMU versions and the tests are now failing for aarch64. The next >> >> > patch adds those tests to CI, so we cannot use the 'max' cpu >> >> > anymore. Replace it with the 'neoverse-n1', which has a fixed set of >> >> > features. >> >> > >> >> > Suggested-by: Peter Maydell <peter.maydell@linaro.org> >> >> > Signed-off-by: Fabiano Rosas <farosas@suse.de> >> >> > Link: https://lore.kernel.org/r/20240118164951.30350-2-farosas@suse.de >> >> > Signed-off-by: Peter Xu <peterx@redhat.com> >> >> > --- >> >> > tests/qtest/migration-test.c | 2 +- >> >> > 1 file changed, 1 insertion(+), 1 deletion(-) >> >> > >> >> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c >> >> > index 7675519cfa..15713f3666 100644 >> >> > --- a/tests/qtest/migration-test.c >> >> > +++ b/tests/qtest/migration-test.c >> >> > @@ -820,7 +820,7 @@ static int test_migrate_start(QTestState **from, QTestState **to, >> >> > memory_size = "150M"; >> >> > machine_alias = "virt"; >> >> > machine_opts = "gic-version=max"; >> >> > - arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath); >> >> > + arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath); >> >> > start_address = ARM_TEST_MEM_START; >> >> > end_address = ARM_TEST_MEM_END; >> >> > } else { >> >> >> >> This breaks the tests on an arm host with KVM support. We could drop >> >> this patch from the PR, it doesn't affect anything else. >> >> >> >> Or squash in: >> >> >> >> -->8-- >> >> From b8aa5d8a2b33dcc28e4cd4ce2c4f4eacc3a3b845 Mon Sep 17 00:00:00 2001 >> >> From: Fabiano Rosas <farosas@suse.de> >> >> Date: Fri, 26 Jan 2024 11:33:15 -0300 >> >> Subject: [PATCH] fixup! tests/qtest/migration: Don't use -cpu max for aarch64 >> >> >> >> Signed-off-by: Fabiano Rosas <farosas@suse.de> >> >> --- >> >> tests/qtest/migration-test.c | 4 +++- >> >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> >> >> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c >> >> index 15713f3666..2ba9cab684 100644 >> >> --- a/tests/qtest/migration-test.c >> >> +++ b/tests/qtest/migration-test.c >> >> @@ -820,7 +820,9 @@ static int test_migrate_start(QTestState **from, QTestState **to, >> >> memory_size = "150M"; >> >> machine_alias = "virt"; >> >> machine_opts = "gic-version=max"; >> >> - arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath); >> >> + arch_opts = g_strdup_printf("-cpu %s -kernel %s", >> >> + qtest_has_accel("kvm") ? >> >> + "host" : "neoverse-n1", bootpath); >> >> start_address = ARM_TEST_MEM_START; >> >> end_address = ARM_TEST_MEM_END; >> >> } else { >> > >> > If you want to do that then a comment explaining why would be >> > helpful for future readers, I think. >> >> Ok, let's drop this one then, I'll resend. > > I'll drop this one for now then, thanks. > > Just to double check: Fabiano, you meant that "-cpu host" won't hit the > same issue as what "-cpu max" would have for the new "n-1" CI test, right? Well, no. What we need here is a cpu that works with KVM. Currently that's 'host'. If that breaks the n-1 test, then it's a regression. We also need a cpu that works with TCG. Any of them would do. Except max which changes in incompatible ways (that was the original patch's purpose). The issue that occurs to me now is that 'cpu host' will not work with TCG. We might actually need to go poking /dev/kvm for this to work.
Fabiano Rosas <farosas@suse.de> writes: > Peter Xu <peterx@redhat.com> writes: > >> On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote: >>> Peter Maydell <peter.maydell@linaro.org> writes: >>> >>> > On Fri, 26 Jan 2024 at 14:36, Fabiano Rosas <farosas@suse.de> wrote: >>> >> >>> >> peterx@redhat.com writes: >>> >> >>> >> > From: Fabiano Rosas <farosas@suse.de> >>> >> > >>> >> > The 'max' cpu is not expected to be stable in terms of features across >>> >> > QEMU versions, so it should not be expected to migrate. >>> >> > >>> >> > While the tests currently all pass with -cpu max, that is only because >>> >> > we're not testing across QEMU versions, which is the more common >>> >> > use-case for migration. >>> >> > >>> >> > We've recently introduced compatibility tests that use two different >>> >> > QEMU versions and the tests are now failing for aarch64. The next >>> >> > patch adds those tests to CI, so we cannot use the 'max' cpu >>> >> > anymore. Replace it with the 'neoverse-n1', which has a fixed set of >>> >> > features. >>> >> > >>> >> > Suggested-by: Peter Maydell <peter.maydell@linaro.org> >>> >> > Signed-off-by: Fabiano Rosas <farosas@suse.de> >>> >> > Link: https://lore.kernel.org/r/20240118164951.30350-2-farosas@suse.de >>> >> > Signed-off-by: Peter Xu <peterx@redhat.com> >>> >> > --- >>> >> > tests/qtest/migration-test.c | 2 +- >>> >> > 1 file changed, 1 insertion(+), 1 deletion(-) >>> >> > >>> >> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c >>> >> > index 7675519cfa..15713f3666 100644 >>> >> > --- a/tests/qtest/migration-test.c >>> >> > +++ b/tests/qtest/migration-test.c >>> >> > @@ -820,7 +820,7 @@ static int test_migrate_start(QTestState **from, QTestState **to, >>> >> > memory_size = "150M"; >>> >> > machine_alias = "virt"; >>> >> > machine_opts = "gic-version=max"; >>> >> > - arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath); >>> >> > + arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath); >>> >> > start_address = ARM_TEST_MEM_START; >>> >> > end_address = ARM_TEST_MEM_END; >>> >> > } else { >>> >> >>> >> This breaks the tests on an arm host with KVM support. We could drop >>> >> this patch from the PR, it doesn't affect anything else. >>> >> >>> >> Or squash in: >>> >> >>> >> -->8-- >>> >> From b8aa5d8a2b33dcc28e4cd4ce2c4f4eacc3a3b845 Mon Sep 17 00:00:00 2001 >>> >> From: Fabiano Rosas <farosas@suse.de> >>> >> Date: Fri, 26 Jan 2024 11:33:15 -0300 >>> >> Subject: [PATCH] fixup! tests/qtest/migration: Don't use -cpu max for aarch64 >>> >> >>> >> Signed-off-by: Fabiano Rosas <farosas@suse.de> >>> >> --- >>> >> tests/qtest/migration-test.c | 4 +++- >>> >> 1 file changed, 3 insertions(+), 1 deletion(-) >>> >> >>> >> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c >>> >> index 15713f3666..2ba9cab684 100644 >>> >> --- a/tests/qtest/migration-test.c >>> >> +++ b/tests/qtest/migration-test.c >>> >> @@ -820,7 +820,9 @@ static int test_migrate_start(QTestState **from, QTestState **to, >>> >> memory_size = "150M"; >>> >> machine_alias = "virt"; >>> >> machine_opts = "gic-version=max"; >>> >> - arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath); >>> >> + arch_opts = g_strdup_printf("-cpu %s -kernel %s", >>> >> + qtest_has_accel("kvm") ? >>> >> + "host" : "neoverse-n1", bootpath); >>> >> start_address = ARM_TEST_MEM_START; >>> >> end_address = ARM_TEST_MEM_END; >>> >> } else { >>> > >>> > If you want to do that then a comment explaining why would be >>> > helpful for future readers, I think. >>> >>> Ok, let's drop this one then, I'll resend. >> >> I'll drop this one for now then, thanks. >> >> Just to double check: Fabiano, you meant that "-cpu host" won't hit the >> same issue as what "-cpu max" would have for the new "n-1" CI test, right? > > Well, no. What we need here is a cpu that works with KVM. Currently > that's 'host'. If that breaks the n-1 test, then it's a regression. > > We also need a cpu that works with TCG. Any of them would do. Except max > which changes in incompatible ways (that was the original patch's > purpose). > > The issue that occurs to me now is that 'cpu host' will not work with > TCG. We might actually need to go poking /dev/kvm for this to work. Nevermind this last part. There's not going to be a scenario where we build with CONFIG_KVM, but run in an environment that does not support KVM.
On Mon, 29 Jan 2024 at 23:31, Fabiano Rosas <farosas@suse.de> wrote: > > Fabiano Rosas <farosas@suse.de> writes: > > > Peter Xu <peterx@redhat.com> writes: > > > >> On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote: > > The issue that occurs to me now is that 'cpu host' will not work with > > TCG. We might actually need to go poking /dev/kvm for this to work. > > Nevermind this last part. There's not going to be a scenario where we > build with CONFIG_KVM, but run in an environment that does not support > KVM. Yes, there is. We'll build with CONFIG_KVM on any aarch64 host, but that doesn't imply that the user running the build and test has permissions for /dev/kvm. thanks -- PMM
On Tue, Jan 30, 2024 at 10:18:07AM +0000, Peter Maydell wrote: > On Mon, 29 Jan 2024 at 23:31, Fabiano Rosas <farosas@suse.de> wrote: > > > > Fabiano Rosas <farosas@suse.de> writes: > > > > > Peter Xu <peterx@redhat.com> writes: > > > > > >> On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote: > > > The issue that occurs to me now is that 'cpu host' will not work with > > > TCG. We might actually need to go poking /dev/kvm for this to work. > > > > Nevermind this last part. There's not going to be a scenario where we > > build with CONFIG_KVM, but run in an environment that does not support > > KVM. > > Yes, there is. We'll build with CONFIG_KVM on any aarch64 host, > but that doesn't imply that the user running the build and > test has permissions for /dev/kvm. I'm actually pretty confused on why this would be a problem even for neoverse-n1: can we just try to use KVM, if it fails then use TCG? Something like: (construct qemu cmdline) .. #ifdef CONFIG_KVM "-accel kvm " #endif "-accel tcg " .. ? IIUC if we specify two "-accel", we'll try the first, then if failed then the 2nd?
Peter Xu <peterx@redhat.com> writes: > On Tue, Jan 30, 2024 at 10:18:07AM +0000, Peter Maydell wrote: >> On Mon, 29 Jan 2024 at 23:31, Fabiano Rosas <farosas@suse.de> wrote: >> > >> > Fabiano Rosas <farosas@suse.de> writes: >> > >> > > Peter Xu <peterx@redhat.com> writes: >> > > >> > >> On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote: >> > > The issue that occurs to me now is that 'cpu host' will not work with >> > > TCG. We might actually need to go poking /dev/kvm for this to work. >> > >> > Nevermind this last part. There's not going to be a scenario where we >> > build with CONFIG_KVM, but run in an environment that does not support >> > KVM. >> >> Yes, there is. We'll build with CONFIG_KVM on any aarch64 host, >> but that doesn't imply that the user running the build and >> test has permissions for /dev/kvm. > > I'm actually pretty confused on why this would be a problem even for > neoverse-n1: can we just try to use KVM, if it fails then use TCG? > Something like: > > (construct qemu cmdline) > .. > #ifdef CONFIG_KVM > "-accel kvm " > #endif > "-accel tcg " > .. > > ? > IIUC if we specify two "-accel", we'll try the first, then if failed then > the 2nd? Aside from '-cpu max', there's no -accel and -cpu combination that works on all of: x86_64 host - TCG-only aarch64 host - KVM & TCG aarch64 host with --disable-tcg - KVM-only aarch64 host without access to /dev/kvm - TCG-only And the cpus are: host - KVM-only neoverse-n1 - TCG-only We'll need something like: /* covers aarch64 host with --disable-tcg */ if (qtest_has_accel("kvm") && !qtest_has_accel("tcg")) { if (open("/dev/kvm", O_RDONLY) < 0) { g_test_skip() } else { "-accel kvm -cpu host" } } /* covers x86_64 host */ if (!qtest_has_accel("kvm") && qtest_has_accel("tcg")) { "-accel tcg -cpu neoverse-n1" } /* covers aarch64 host */ if (qtest_has_accel("kvm") && qtest_has_accel("tcg")) { if (open("/dev/kvm", O_RDONLY) < 0) { "-accel tcg -cpu neoverse-n1" } else { "-accel kvm -cpu host" } }
On Tue, Jan 30, 2024 at 06:23:10PM -0300, Fabiano Rosas wrote: > Peter Xu <peterx@redhat.com> writes: > > > On Tue, Jan 30, 2024 at 10:18:07AM +0000, Peter Maydell wrote: > >> On Mon, 29 Jan 2024 at 23:31, Fabiano Rosas <farosas@suse.de> wrote: > >> > > >> > Fabiano Rosas <farosas@suse.de> writes: > >> > > >> > > Peter Xu <peterx@redhat.com> writes: > >> > > > >> > >> On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote: > >> > > The issue that occurs to me now is that 'cpu host' will not work with > >> > > TCG. We might actually need to go poking /dev/kvm for this to work. > >> > > >> > Nevermind this last part. There's not going to be a scenario where we > >> > build with CONFIG_KVM, but run in an environment that does not support > >> > KVM. > >> > >> Yes, there is. We'll build with CONFIG_KVM on any aarch64 host, > >> but that doesn't imply that the user running the build and > >> test has permissions for /dev/kvm. > > > > I'm actually pretty confused on why this would be a problem even for > > neoverse-n1: can we just try to use KVM, if it fails then use TCG? > > Something like: > > > > (construct qemu cmdline) > > .. > > #ifdef CONFIG_KVM > > > "-accel kvm " > > #endif > > "-accel tcg " > > .. > > > > ? > > IIUC if we specify two "-accel", we'll try the first, then if failed then > > the 2nd? > > Aside from '-cpu max', there's no -accel and -cpu combination that works > on all of: > > x86_64 host - TCG-only > aarch64 host - KVM & TCG > aarch64 host with --disable-tcg - KVM-only > aarch64 host without access to /dev/kvm - TCG-only > > And the cpus are: > host - KVM-only > neoverse-n1 - TCG-only > > We'll need something like: > > /* covers aarch64 host with --disable-tcg */ > if (qtest_has_accel("kvm") && !qtest_has_accel("tcg")) { > if (open("/dev/kvm", O_RDONLY) < 0) { > g_test_skip() > } else { > "-accel kvm -cpu host" > } > } > > /* covers x86_64 host */ > if (!qtest_has_accel("kvm") && qtest_has_accel("tcg")) { > "-accel tcg -cpu neoverse-n1" > } > > /* covers aarch64 host */ > if (qtest_has_accel("kvm") && qtest_has_accel("tcg")) { > if (open("/dev/kvm", O_RDONLY) < 0) { > "-accel tcg -cpu neoverse-n1" > } else { > "-accel kvm -cpu host" > } > } The open("/dev/kvm") logic more or less duplicates what QEMU already does when init accelerators: if (!qemu_opts_foreach(qemu_find_opts("accel"), do_configure_accelerator, &init_failed, &error_fatal)) { if (!init_failed) { error_report("no accelerator found"); } exit(1); } If /dev/kvm not accessible I think it'll already fallback to tcg here, as do_configure_accelerator() for kvm will just silently fail for qtest. I hope we can still rely on that for /dev/kvm access issues. Hmm, I just notice that test_migrate_start() already has this later: "-accel kvm%s -accel tcg " So we're actually good from that regard, AFAIU. Then did I understand it right that in the failure case KVM is properly initialized, however it crashed later in neoverse-n1 asking for TCG? So the logic in the accel code above didn't really work to do a real fallback? A backtrace of such crash would help, maybe; I tried to find it in the pipeline log but I can only see: ----------------------------------- stderr ----------------------------------- Broken pipe ../tests/qtest/libqtest.c:195: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0) Or, is there some aarch64 cpu that will have a stable CPU ABI (not like -max, which is unstable), meanwhile supports both TCG + KVM? Another thing I noticed that we may need to be caution is that currently gic is also using max version: machine_opts = "gic-version=max"; We may want to choose a sane version too, probably altogether with the patch?
Peter Xu <peterx@redhat.com> writes: > On Tue, Jan 30, 2024 at 06:23:10PM -0300, Fabiano Rosas wrote: >> Peter Xu <peterx@redhat.com> writes: >> >> > On Tue, Jan 30, 2024 at 10:18:07AM +0000, Peter Maydell wrote: >> >> On Mon, 29 Jan 2024 at 23:31, Fabiano Rosas <farosas@suse.de> wrote: >> >> > >> >> > Fabiano Rosas <farosas@suse.de> writes: >> >> > >> >> > > Peter Xu <peterx@redhat.com> writes: >> >> > > >> >> > >> On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote: >> >> > > The issue that occurs to me now is that 'cpu host' will not work with >> >> > > TCG. We might actually need to go poking /dev/kvm for this to work. >> >> > >> >> > Nevermind this last part. There's not going to be a scenario where we >> >> > build with CONFIG_KVM, but run in an environment that does not support >> >> > KVM. >> >> >> >> Yes, there is. We'll build with CONFIG_KVM on any aarch64 host, >> >> but that doesn't imply that the user running the build and >> >> test has permissions for /dev/kvm. >> > >> > I'm actually pretty confused on why this would be a problem even for >> > neoverse-n1: can we just try to use KVM, if it fails then use TCG? >> > Something like: >> > >> > (construct qemu cmdline) >> > .. >> > #ifdef CONFIG_KVM >> >> > "-accel kvm " >> > #endif >> > "-accel tcg " >> > .. >> > >> > ? >> > IIUC if we specify two "-accel", we'll try the first, then if failed then >> > the 2nd? >> >> Aside from '-cpu max', there's no -accel and -cpu combination that works >> on all of: >> >> x86_64 host - TCG-only >> aarch64 host - KVM & TCG >> aarch64 host with --disable-tcg - KVM-only >> aarch64 host without access to /dev/kvm - TCG-only >> >> And the cpus are: >> host - KVM-only >> neoverse-n1 - TCG-only >> >> We'll need something like: >> >> /* covers aarch64 host with --disable-tcg */ >> if (qtest_has_accel("kvm") && !qtest_has_accel("tcg")) { >> if (open("/dev/kvm", O_RDONLY) < 0) { >> g_test_skip() >> } else { >> "-accel kvm -cpu host" >> } >> } >> >> /* covers x86_64 host */ >> if (!qtest_has_accel("kvm") && qtest_has_accel("tcg")) { >> "-accel tcg -cpu neoverse-n1" >> } >> >> /* covers aarch64 host */ >> if (qtest_has_accel("kvm") && qtest_has_accel("tcg")) { >> if (open("/dev/kvm", O_RDONLY) < 0) { >> "-accel tcg -cpu neoverse-n1" >> } else { >> "-accel kvm -cpu host" >> } >> } > > The open("/dev/kvm") logic more or less duplicates what QEMU already does > when init accelerators: > > if (!qemu_opts_foreach(qemu_find_opts("accel"), > do_configure_accelerator, &init_failed, &error_fatal)) { > if (!init_failed) { > error_report("no accelerator found"); > } > exit(1); > } > > If /dev/kvm not accessible I think it'll already fallback to tcg here, as > do_configure_accelerator() for kvm will just silently fail for qtest. I > hope we can still rely on that for /dev/kvm access issues. If we ask for KVM and it falls back to TCG, we need a cpu that supports both. We don't have that. I've put some command-line combinations at the end of the email[1], take a look. If we ask for KVM only and /dev/kvm is not accessible, the test will fail and we can prevent that by checking beforehand. It's much simpler to check first and do the right thing than to run the QEMU binary and somehow work around the test failure in migration-test. > > Hmm, I just notice that test_migrate_start() already has this later: > > "-accel kvm%s -accel tcg " > > So we're actually good from that regard, AFAIU. > > Then did I understand it right that in the failure case KVM is properly > initialized, however it crashed later in neoverse-n1 asking for TCG? So It didn't crash. It simply does not accept the neoverse-n1 with KVM because it's unsupported: qemu-system-aarch64: KVM is not supported for this guest CPU type qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (0): Invalid argument > the logic in the accel code above didn't really work to do a real fallback? Yep, it didn't. > A backtrace of such crash would help, maybe; I tried to find it in the > pipeline log but I can only see: > > ----------------------------------- stderr ----------------------------------- > Broken pipe > ../tests/qtest/libqtest.c:195: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0) We need to fix the QTEST_LOG logic someday. It currently hides QEMU stderr. But when we enable logging then it logs every single serial read and write and query-migrate in the face of the earth and it floods the logs. > > Or, is there some aarch64 cpu that will have a stable CPU ABI (not like > -max, which is unstable), meanwhile supports both TCG + KVM? Not as far as I know. > > Another thing I noticed that we may need to be caution is that currently > gic is also using max version: > > machine_opts = "gic-version=max"; > > We may want to choose a sane version too, probably altogether with the > patch? Good point. ==================== [1] On x86_64: ========== -cpu host --------- $ ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm qemu-system-aarch64: -accel kvm: invalid accelerator kvm $ ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg qemu-system-aarch64: unable to find CPU model 'host' $ ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm -accel tcg qemu-system-aarch64: -accel kvm: invalid accelerator kvm qemu-system-aarch64: falling back to tcg qemu-system-aarch64: unable to find CPU model 'host' $ ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg -accel kvm qemu-system-aarch64: unable to find CPU model 'host' -cpu neoverse-n1 ---------------- $ ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel tcg works $ ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel kvm qemu-system-aarch64: -accel kvm: invalid accelerator kvm $ ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel kvm -accel tcg qemu-system-aarch64: -accel kvm: invalid accelerator kvm qemu-system-aarch64: falling back to tcg works $ ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel tcg -accel kvm works On aarch64: =========== -cpu host --------- # ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm works # ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg qemu-system-aarch64: The 'host' CPU type can only be used with KVM or HVF # ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm -accel tcg works # ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg -accel kvm qemu-system-aarch64: The 'host' CPU type can only be used with KVM or HVF -cpu neoverse-n1 ---------------- # ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel kvm qemu-system-aarch64: KVM is not supported for this guest CPU type qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (0): Invalid argument # ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel tcg works # ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel kvm -accel tcg qemu-system-aarch64: KVM is not supported for this guest CPU type qemu-system-aarch64: kvm_init_vcpu: kvm_arch_init_vcpu failed (0): Invalid argument # ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel tcg -accel kvm works On aarch64 --disable-tcg: ========================= -cpu host --------- # ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm works # ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg qemu-system-aarch64: -accel tcg: invalid accelerator tcg # ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm -accel tcg works # ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg -accel kvm qemu-system-aarch64: -accel tcg: invalid accelerator tcg qemu-system-aarch64: falling back to KVM works -cpu neoverse-n1 ---------------- # ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel kvm qemu-system-aarch64: unable to find CPU model 'neoverse-n1' # ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel tcg qemu-system-aarch64: -accel tcg: invalid accelerator tcg # ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel kvm -accel tcg qemu-system-aarch64: unable to find CPU model 'neoverse-n1' # ./qemu-system-aarch64 -nographic -machine virt -cpu neoverse-n1 -accel tcg -accel kvm qemu-system-aarch64: -accel tcg: invalid accelerator tcg qemu-system-aarch64: falling back to KVM qemu-system-aarch64: unable to find CPU model 'neoverse-n1' On aarch64 without access to /dev/kvm: ====================================== -cpu host --------- # ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm Could not access KVM kernel module: No such file or directory qemu-system-aarch64: -accel kvm: failed to initialize kvm: No such file or directory # ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg qemu-system-aarch64: The 'host' CPU type can only be used with KVM or HVF # ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel kvm -accel tcg Could not access KVM kernel module: No such file or directory qemu-system-aarch64: -accel kvm: failed to initialize kvm: No such file or directory qemu-system-aarch64: falling back to tcg qemu-system-aarch64: The 'host' CPU type can only be used with KVM or HVF # ./qemu-system-aarch64 -nographic -machine virt -cpu host -accel tcg -accel kvm qemu-system-aarch64: The 'host' CPU type can only be used with KVM or HVF
On Wed, Jan 31, 2024 at 10:09:16AM -0300, Fabiano Rosas wrote: > If we ask for KVM and it falls back to TCG, we need a cpu that supports > both. We don't have that. I've put some command-line combinations at the > end of the email[1], take a look. Thanks a lot, Fabiano. I think I have a better picture now. Now the question is whether it'll be worthwhile we (migration) explicitly provide code to workaround such issue in qtest, or we wait for ARM side until we have a processor that can be both stable and support KVM+TCG. I actually personally prefer to wait - it's not too bad after all, because it only affects the new "n-1" migration test. Most of the migration functionality will still be covered there in CI for ARM. Meanwhile, AFAIU we do have a plan upstream to have a stable aarch64 cpu model sooner or later that at least support KVM. If that will also be able to support TCG then goal achieved. Or vice versa, if we would be able to add KVM support to some stable TCG-only cores (like neoverse-n1). Do we have a plan in this area? Copy both Peter & Eric. If we can have that in 9.0 then that'll be perfect; we can already start to switch migration tests to use the cpu model. As of now, maybe we can (1) fix the gic-version in migration-test.c to be stable; this seems a separate issue just to get prepared when a new model comes, then (2) document above decision in migration-compat-aarch64 test in .gitlab-ci.d/, if we can reach consensus. Then we only rely on x86 for "n-1" migration tests until later.
Fabiano, I think you forgot to reply-to-all.. adding back the list and people in the loop. On Thu, Feb 01, 2024 at 10:12:44AM -0300, Fabiano Rosas wrote: > Peter Xu <peterx@redhat.com> writes: > > > On Wed, Jan 31, 2024 at 10:09:16AM -0300, Fabiano Rosas wrote: > >> If we ask for KVM and it falls back to TCG, we need a cpu that supports > >> both. We don't have that. I've put some command-line combinations at the > >> end of the email[1], take a look. > > > > Thanks a lot, Fabiano. I think I have a better picture now. > > > > Now the question is whether it'll be worthwhile we (migration) explicitly > > provide code to workaround such issue in qtest, or we wait for ARM side > > until we have a processor that can be both stable and support KVM+TCG. > > > > I actually personally prefer to wait - it's not too bad after all, because > > it only affects the new "n-1" migration test. Most of the migration > > functionality will still be covered there in CI for ARM. > > That's fine with me. We just need to do something about the arm CI job > which is currently disabled waiting for a fix. We could remove it or add > some words somewhere explaining the situation. I can do that once we > reach an agreement here. Yes. IMHO we can keep the test (with SKIPPED=1) but amend the message, which will start to state inaccurately: # This job is disabled until we release 9.0. The existing # migration-test in 8.2 is broken on aarch64. The fix was already # commited, but it will only take effect once 9.0 is out. IMHO then it won't mean 9.0 will have it fixed, but we'll simply wait for a cpu model that is ready for both kvm+tcg, then we replace "max". For gic_version, I knew 3 was there for quite some time so maybe we can start from that? Or we need suggestions from ARM people.
On Thu, 1 Feb 2024 at 23:50, Peter Xu <peterx@redhat.com> wrote: > > Fabiano, I think you forgot to reply-to-all.. adding back the list and > people in the loop. > > On Thu, Feb 01, 2024 at 10:12:44AM -0300, Fabiano Rosas wrote: > > Peter Xu <peterx@redhat.com> writes: > > > > > On Wed, Jan 31, 2024 at 10:09:16AM -0300, Fabiano Rosas wrote: > > >> If we ask for KVM and it falls back to TCG, we need a cpu that supports > > >> both. We don't have that. I've put some command-line combinations at the > > >> end of the email[1], take a look. > > > > > > Thanks a lot, Fabiano. I think I have a better picture now. > > > > > > Now the question is whether it'll be worthwhile we (migration) explicitly > > > provide code to workaround such issue in qtest, or we wait for ARM side > > > until we have a processor that can be both stable and support KVM+TCG. > > > > > > I actually personally prefer to wait - it's not too bad after all, because > > > it only affects the new "n-1" migration test. Most of the migration > > > functionality will still be covered there in CI for ARM. > > > > That's fine with me. We just need to do something about the arm CI job > > which is currently disabled waiting for a fix. We could remove it or add > > some words somewhere explaining the situation. I can do that once we > > reach an agreement here. > > Yes. IMHO we can keep the test (with SKIPPED=1) but amend the message, > which will start to state inaccurately: > > # This job is disabled until we release 9.0. The existing > # migration-test in 8.2 is broken on aarch64. The fix was already > # commited, but it will only take effect once 9.0 is out. > > IMHO then it won't mean 9.0 will have it fixed, but we'll simply wait for a > cpu model that is ready for both kvm+tcg, then we replace "max". We already have a CPU model that works for both KVM and TCG: that is "max". We're not going to add another one. The difference is just that we provide different cross-version migration compatibility support levels for the two cases. (Strictly speaking, I'm not sure we strongly support migration compat for 'max' on KVM either -- for instance you probably need to be doing a migration on the same host CPU type and the same host kernel version. It's just that the definition of "max" on KVM is less QEMU-dependent and more host-kernel-dependent, so in your particular situation running the test cases you're less likely to see any possible breakage.) -- PMM
On Fri, Feb 02, 2024 at 10:51:36AM +0000, Peter Maydell wrote: > On Thu, 1 Feb 2024 at 23:50, Peter Xu <peterx@redhat.com> wrote: > > > > Fabiano, I think you forgot to reply-to-all.. adding back the list and > > people in the loop. > > > > On Thu, Feb 01, 2024 at 10:12:44AM -0300, Fabiano Rosas wrote: > > > Peter Xu <peterx@redhat.com> writes: > > > > > > > On Wed, Jan 31, 2024 at 10:09:16AM -0300, Fabiano Rosas wrote: > > > >> If we ask for KVM and it falls back to TCG, we need a cpu that supports > > > >> both. We don't have that. I've put some command-line combinations at the > > > >> end of the email[1], take a look. > > > > > > > > Thanks a lot, Fabiano. I think I have a better picture now. > > > > > > > > Now the question is whether it'll be worthwhile we (migration) explicitly > > > > provide code to workaround such issue in qtest, or we wait for ARM side > > > > until we have a processor that can be both stable and support KVM+TCG. > > > > > > > > I actually personally prefer to wait - it's not too bad after all, because > > > > it only affects the new "n-1" migration test. Most of the migration > > > > functionality will still be covered there in CI for ARM. > > > > > > That's fine with me. We just need to do something about the arm CI job > > > which is currently disabled waiting for a fix. We could remove it or add > > > some words somewhere explaining the situation. I can do that once we > > > reach an agreement here. > > > > Yes. IMHO we can keep the test (with SKIPPED=1) but amend the message, > > which will start to state inaccurately: > > > > # This job is disabled until we release 9.0. The existing > > # migration-test in 8.2 is broken on aarch64. The fix was already > > # commited, but it will only take effect once 9.0 is out. > > > > IMHO then it won't mean 9.0 will have it fixed, but we'll simply wait for a > > cpu model that is ready for both kvm+tcg, then we replace "max". > > We already have a CPU model that works for both KVM and TCG: that > is "max". We're not going to add another one. Thanks, but then this is pretty sad. I'm surprised aarch64 doesn't have such requirement to allow some VM config to run across all kinds of hosts. > The difference is just that we provide different cross-version migration > compatibility support levels for the two cases. (Strictly speaking, I'm > not sure we strongly support migration compat for 'max' on KVM either -- > for instance you probably need to be doing a migration on the same host > CPU type and the same host kernel version. It's just that the definition > of "max" on KVM is less QEMU-dependent and more host-kernel-dependent, so > in your particular situation running the test cases you're less likely to > see any possible breakage.) Yes we don't have issue for the current CI on KVM compatibilities, but QEMU does matter for sure. Then we can either (1) add code as Fabiano suggested to choose different cpu model by adding hack code in qtest, or (2) we simply not support aarch64 on cross binary test like most of the rest of the arch, but only support x86, until any arch can provide a stable CPU that support all config of hosts (we can document it in the CI file). I'd vote for (2). Fabiano, do you have any preference?
Hi, On 2/2/24 11:51, Peter Maydell wrote: > On Thu, 1 Feb 2024 at 23:50, Peter Xu <peterx@redhat.com> wrote: >> Fabiano, I think you forgot to reply-to-all.. adding back the list and >> people in the loop. >> >> On Thu, Feb 01, 2024 at 10:12:44AM -0300, Fabiano Rosas wrote: >>> Peter Xu <peterx@redhat.com> writes: >>> >>>> On Wed, Jan 31, 2024 at 10:09:16AM -0300, Fabiano Rosas wrote: >>>>> If we ask for KVM and it falls back to TCG, we need a cpu that supports >>>>> both. We don't have that. I've put some command-line combinations at the >>>>> end of the email[1], take a look. >>>> Thanks a lot, Fabiano. I think I have a better picture now. >>>> >>>> Now the question is whether it'll be worthwhile we (migration) explicitly >>>> provide code to workaround such issue in qtest, or we wait for ARM side >>>> until we have a processor that can be both stable and support KVM+TCG. >>>> >>>> I actually personally prefer to wait - it's not too bad after all, because >>>> it only affects the new "n-1" migration test. Most of the migration >>>> functionality will still be covered there in CI for ARM. >>> That's fine with me. We just need to do something about the arm CI job >>> which is currently disabled waiting for a fix. We could remove it or add >>> some words somewhere explaining the situation. I can do that once we >>> reach an agreement here. >> Yes. IMHO we can keep the test (with SKIPPED=1) but amend the message, >> which will start to state inaccurately: >> >> # This job is disabled until we release 9.0. The existing >> # migration-test in 8.2 is broken on aarch64. The fix was already >> # commited, but it will only take effect once 9.0 is out. >> >> IMHO then it won't mean 9.0 will have it fixed, but we'll simply wait for a >> cpu model that is ready for both kvm+tcg, then we replace "max". > We already have a CPU model that works for both KVM and TCG: that > is "max". We're not going to add another one. The difference is > just that we provide different cross-version migration compatibility > support levels for the two cases. (Strictly speaking, I'm not sure we > strongly support migration compat for 'max' on KVM either -- > for instance you probably need to be doing a migration on the > same host CPU type and the same host kernel version. It's just Yes in general migrating to different kernels will fail. Same for different pCPU types. We need CPU models to work around some of those limitations. Adding Sebastian in copy. He is currently working on this. Thanks Eric > that the definition of "max" on KVM is less QEMU-dependent and > more host-kernel-dependent, so in your particular situation running > the test cases you're less likely to see any possible breakage.) > > -- PMM >
On Mon, 5 Feb 2024 at 02:56, Peter Xu <peterx@redhat.com> wrote: > Thanks, but then this is pretty sad. I'm surprised aarch64 doesn't have > such requirement to allow some VM config to run across all kinds of hosts. It just hasn't been anything that anybody so far has wanted enough to put the necessary kernel-side work into. (There are also some tricky issues surrounding errata workarounds that the guest needs to do: you need to have some way of telling the guest "the vCPU looks like it's type X but you need to do errata workarounds ABC for CPU type Y, not the ones for X".) > > The difference is just that we provide different cross-version migration > > compatibility support levels for the two cases. (Strictly speaking, I'm > > not sure we strongly support migration compat for 'max' on KVM either -- > > for instance you probably need to be doing a migration on the same host > > CPU type and the same host kernel version. It's just that the definition > > of "max" on KVM is less QEMU-dependent and more host-kernel-dependent, so > > in your particular situation running the test cases you're less likely to > > see any possible breakage.) > > Yes we don't have issue for the current CI on KVM compatibilities, but QEMU > does matter for sure. > > Then we can either (1) add code as Fabiano suggested to choose different > cpu model by adding hack code in qtest, or (2) we simply not support > aarch64 on cross binary test like most of the rest of the arch, but only > support x86, until any arch can provide a stable CPU that support all > config of hosts (we can document it in the CI file). That seems a bit pessimistic. How about "always only test with TCG" ? That will defend the migration compat on all the device models etc, which is the bit we're most likely to break by accident. thanks -- PMM
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index 7675519cfa..15713f3666 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -820,7 +820,7 @@ static int test_migrate_start(QTestState **from, QTestState **to, memory_size = "150M"; machine_alias = "virt"; machine_opts = "gic-version=max"; - arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath); + arch_opts = g_strdup_printf("-cpu neoverse-n1 -kernel %s", bootpath); start_address = ARM_TEST_MEM_START; end_address = ARM_TEST_MEM_END; } else {