diff mbox series

[for-4.17] automation: Do not use null scheduler for boot cpupools test

Message ID 20221021165341.7905-1-michal.orzel@amd.com (mailing list archive)
State Superseded
Headers show
Series [for-4.17] automation: Do not use null scheduler for boot cpupools test | expand

Commit Message

Michal Orzel Oct. 21, 2022, 4:53 p.m. UTC
Null scheduler is not enabled on non-debug Xen builds so the current
test can lead to a failure on such jobs. We still want to test that we
can assign the cpupool to a domU with a different scheduler than default
one (credit2). Switch to credit as it is enabled by default.

Fixes: 36e3f4158778 ("automation: Add a new job for testing boot time cpupools on arm64")
Signed-off-by: Michal Orzel <michal.orzel@amd.com>
---
This patch acts as a prerequisite before merging the following patch:
https://lore.kernel.org/xen-devel/20221021132238.16056-1-michal.orzel@amd.com/
(to which Henry already gave RAB), that helped to find the issue described
in the comment.
---
 automation/scripts/qemu-smoke-arm64.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Andrew Cooper Oct. 21, 2022, 5:21 p.m. UTC | #1
On 21/10/2022 17:53, Michal Orzel wrote:
> Null scheduler is not enabled on non-debug Xen builds so the current
> test can lead to a failure on such jobs. We still want to test that we
> can assign the cpupool to a domU with a different scheduler than default
> one (credit2). Switch to credit as it is enabled by default.
>
> Fixes: 36e3f4158778 ("automation: Add a new job for testing boot time cpupools on arm64")
> Signed-off-by: Michal Orzel <michal.orzel@amd.com>

/sigh - I'm sure I nacked that stupidity to begin with.  apparently not...

It is totally bogus for CONFIG_DEBUG to influence logical chunks of
functionality like this.  The CI script is good in its current form.

RTDS and ARINC should be default n.

NULL is more tricky.  PV_SHIM is explicitly security supported, and has
been for years, so the "UNSUPPORTED" is bogus, whatever the default is.

As NULL is explicitly tested in CI, it's clearly supported, and probably
ought to be on default.


Please instead fix Kconfig to not be broken.  That will be a far better
fix overall for people.

As a more general note, tests which are using non-default pieces of
logic ought to activate them explicitly.

~Andrew
Stefano Stabellini Oct. 21, 2022, 7:36 p.m. UTC | #2
On Fri, 21 Oct 2022, Andrew Cooper wrote:
> On 21/10/2022 17:53, Michal Orzel wrote:
> > Null scheduler is not enabled on non-debug Xen builds so the current
> > test can lead to a failure on such jobs. We still want to test that we
> > can assign the cpupool to a domU with a different scheduler than default
> > one (credit2). Switch to credit as it is enabled by default.
> >
> > Fixes: 36e3f4158778 ("automation: Add a new job for testing boot time cpupools on arm64")
> > Signed-off-by: Michal Orzel <michal.orzel@amd.com>
> 
> /sigh - I'm sure I nacked that stupidity to begin with.  apparently not...
> 
> It is totally bogus for CONFIG_DEBUG to influence logical chunks of
> functionality like this.  The CI script is good in its current form.
> 
> RTDS and ARINC should be default n.
> 
> NULL is more tricky.  PV_SHIM is explicitly security supported, and has
> been for years, so the "UNSUPPORTED" is bogus, whatever the default is.
> 
> As NULL is explicitly tested in CI, it's clearly supported, and probably
> ought to be on default.
> 
> 
> Please instead fix Kconfig to not be broken.  That will be a far better
> fix overall for people.
> 
> As a more general note, tests which are using non-default pieces of
> logic ought to activate them explicitly.


I agree with you, but first let me clarify the word "supported".


In Xen Project "supported" implies extra efforts to follow the security
process and of course the security team should be on board with it. If
we say "supported, non security supported" we don't need to follow the
security process but still we sign up for backporting fixes to the
stable tree. It is less extra effort but still some extra effort is
involved.

So, this specific issue aside, I think that as we expand the testing
capabilities of gitlab-ci, we'll have tests for things that are not
necessarily neither "supported" nor "supported, non security supported".


For the NULL scheduler, it is clearly important to many users so it
would be valuable to move it to "supported, non security supported" and
enabling it by default in the build. I don't recall if we still have any
known outstanding issues with it. I think we need a separate email
thread for that discussion and I would understand if the decision is not
to change NULL support status for the 4.17 release (maybe for the 4.18
release?).


In any case, we don't need CONFIG_DEBUG to enable CONFIG_UNSUPPORTED. It
is just that UNSUPPORTED and NULL don't get enabled by default in the
non-DEBUG build. So to fix gitlab-ci, we can simply enable
CONFIG_UNSUPPORTED explicitly for the builds where we need it
(alpine-3.12-gcc-arm64-boot-cpupools).
Julien Grall Oct. 21, 2022, 10:02 p.m. UTC | #3
Hi Stefano,

On 21/10/2022 20:36, Stefano Stabellini wrote:
> For the NULL scheduler, it is clearly important to many users so it
> would be valuable to move it to "supported, non security supported" and
> enabling it by default in the build. I don't recall if we still have any
> known outstanding issues with it. I think we need a separate email
> thread for that discussion and I would understand if the decision is not
> to change NULL support status for the 4.17 release (maybe for the 4.18
> release?).

At the moment, I am tracking two major issues for NULL scheduler:
  - ED25BE5E-D695-4763-B97A-78D6040E2341@amazon.com
  - alpine.DEB.2.22.394.2201051615060.2060010@ubuntu-linux-20-04-desktop 
(reported by you)

Have they been fixed? If not, then I don't think can be moved to 
"supported, not security supported" because it would fall over basic setup.

Cheers,
Jan Beulich Oct. 24, 2022, 6:29 a.m. UTC | #4
On 21.10.2022 19:21, Andrew Cooper wrote:
> On 21/10/2022 17:53, Michal Orzel wrote:
>> Null scheduler is not enabled on non-debug Xen builds so the current
>> test can lead to a failure on such jobs. We still want to test that we
>> can assign the cpupool to a domU with a different scheduler than default
>> one (credit2). Switch to credit as it is enabled by default.
>>
>> Fixes: 36e3f4158778 ("automation: Add a new job for testing boot time cpupools on arm64")
>> Signed-off-by: Michal Orzel <michal.orzel@amd.com>
> 
> /sigh - I'm sure I nacked that stupidity to begin with.  apparently not...
> 
> It is totally bogus for CONFIG_DEBUG to influence logical chunks of
> functionality like this.  The CI script is good in its current form.

Assuming you mean defaults of settings, I'm afraid I see nothing bogus
there at all. What's wrong with enabling more functionality by default
in debug builds, for people to easily use/test them? Yet keeping
unsupported stuff off by default in release builds? That said, ...

> RTDS and ARINC should be default n.
> 
> NULL is more tricky.  PV_SHIM is explicitly security supported, and has
> been for years, so the "UNSUPPORTED" is bogus, whatever the default is.
> 
> As NULL is explicitly tested in CI, it's clearly supported, and probably
> ought to be on default.

... the state of the NULL scheduler wrt its use by the shim has been
puzzling me before.

> Please instead fix Kconfig to not be broken.  That will be a far better
> fix overall for people.
> 
> As a more general note, tests which are using non-default pieces of
> logic ought to activate them explicitly.

Imo _this_ is the immediate course of action to take. What the appropriate
settings are in Kconfig may be less straightforward to determine (see also
Stefano's and Julien's replies).

Jan
Michal Orzel Oct. 24, 2022, 7:15 a.m. UTC | #5
Replying to all,

On 21/10/2022 21:36, Stefano Stabellini wrote:
> 
> 
> On Fri, 21 Oct 2022, Andrew Cooper wrote:
>> On 21/10/2022 17:53, Michal Orzel wrote:
>>> Null scheduler is not enabled on non-debug Xen builds so the current
>>> test can lead to a failure on such jobs. We still want to test that we
>>> can assign the cpupool to a domU with a different scheduler than default
>>> one (credit2). Switch to credit as it is enabled by default.
>>>
>>> Fixes: 36e3f4158778 ("automation: Add a new job for testing boot time cpupools on arm64")
>>> Signed-off-by: Michal Orzel <michal.orzel@amd.com>
>>
>> /sigh - I'm sure I nacked that stupidity to begin with.  apparently not...
>>
>> It is totally bogus for CONFIG_DEBUG to influence logical chunks of
>> functionality like this.  The CI script is good in its current form.
>>
>> RTDS and ARINC should be default n.
>>
>> NULL is more tricky.  PV_SHIM is explicitly security supported, and has
>> been for years, so the "UNSUPPORTED" is bogus, whatever the default is.
>>
>> As NULL is explicitly tested in CI, it's clearly supported, and probably
>> ought to be on default.
>>
>>
>> Please instead fix Kconfig to not be broken.  That will be a far better
>> fix overall for people.
>>
>> As a more general note, tests which are using non-default pieces of
>> logic ought to activate them explicitly.
> 
> 
> I agree with you, but first let me clarify the word "supported".
> 
> 
> In Xen Project "supported" implies extra efforts to follow the security
> process and of course the security team should be on board with it. If
> we say "supported, non security supported" we don't need to follow the
> security process but still we sign up for backporting fixes to the
> stable tree. It is less extra effort but still some extra effort is
> involved.
> 
> So, this specific issue aside, I think that as we expand the testing
> capabilities of gitlab-ci, we'll have tests for things that are not
> necessarily neither "supported" nor "supported, non security supported".
> 
> 
> For the NULL scheduler, it is clearly important to many users so it
> would be valuable to move it to "supported, non security supported" and
> enabling it by default in the build. I don't recall if we still have any
> known outstanding issues with it. I think we need a separate email
> thread for that discussion and I would understand if the decision is not
> to change NULL support status for the 4.17 release (maybe for the 4.18
> release?).
> 
> 
> In any case, we don't need CONFIG_DEBUG to enable CONFIG_UNSUPPORTED. It
> is just that UNSUPPORTED and NULL don't get enabled by default in the
> non-DEBUG build. So to fix gitlab-ci, we can simply enable
> CONFIG_UNSUPPORTED explicitly for the builds where we need it
> (alpine-3.12-gcc-arm64-boot-cpupools).

Given that there are still diverging opinions \wrt making use of DEBUG
to influence enabling/disabling some functionalities in the code, I would
opt for modifying the CI job to explicitly specify the required config options,
just like I did for static-mem test. The necessary options to enable NULL are:
CONFIG_EXPERT=y
CONFIG_UNSUPPORTED=y
CONFIG_SCHED_NULL=y

This will fix the issue and allow us to continue with 4.17 release.
Given the outstanding issues reported by Julien, it would be challenging to
try to mark the NULL scheduler as supported, not security supported for this release.

Besides that, I think that Andrew still has a valid point. We seem to use DEBUG
only in Kconfig.debug (obvious choice) and sched/Kconfig. So this is not something
common to rely on DEBUG to enable logical functionalities (why did we make this exception for schedulers?).
Having said that, I think the discussion on whether to switch to default n
instead of default DEBUG or not is still valid and requires more people to give feedback.

~Michal
Andrew Cooper Oct. 24, 2022, 10:09 a.m. UTC | #6
On 24/10/2022 07:29, Jan Beulich wrote:
> On 21.10.2022 19:21, Andrew Cooper wrote:
>> On 21/10/2022 17:53, Michal Orzel wrote:
>>> Null scheduler is not enabled on non-debug Xen builds so the current
>>> test can lead to a failure on such jobs. We still want to test that we
>>> can assign the cpupool to a domU with a different scheduler than default
>>> one (credit2). Switch to credit as it is enabled by default.
>>>
>>> Fixes: 36e3f4158778 ("automation: Add a new job for testing boot time cpupools on arm64")
>>> Signed-off-by: Michal Orzel <michal.orzel@amd.com>
>> /sigh - I'm sure I nacked that stupidity to begin with.  apparently not...
>>
>> It is totally bogus for CONFIG_DEBUG to influence logical chunks of
>> functionality like this.  The CI script is good in its current form.
> Assuming you mean defaults of settings, I'm afraid I see nothing bogus
> there at all.

It's a complete violation of any reasonable interpretation of DEBUG.

Apart from creating the bug at the centre of this thread, one does not
turn on DEBUG to get at this functionality in the first place, so the
options should not be interlinked.

>> RTDS and ARINC should be default n.
>>
>> NULL is more tricky.  PV_SHIM is explicitly security supported, and has
>> been for years, so the "UNSUPPORTED" is bogus, whatever the default is.
>>
>> As NULL is explicitly tested in CI, it's clearly supported, and probably
>> ought to be on default.
> ... the state of the NULL scheduler wrt its use by the shim has been
> puzzling me before.

NULL is exactly what the shim wants, and it gets thorough testing.

The only remaining problem is the paperwork.  NULL is already "security
supported in pv shim", hence the unsupported tag is bogus.

~Andrew
Stefano Stabellini Oct. 24, 2022, 9:55 p.m. UTC | #7
On Mon, 24 Oct 2022, Michal Orzel wrote:
> Replying to all,
> 
> On 21/10/2022 21:36, Stefano Stabellini wrote:
> > 
> > 
> > On Fri, 21 Oct 2022, Andrew Cooper wrote:
> >> On 21/10/2022 17:53, Michal Orzel wrote:
> >>> Null scheduler is not enabled on non-debug Xen builds so the current
> >>> test can lead to a failure on such jobs. We still want to test that we
> >>> can assign the cpupool to a domU with a different scheduler than default
> >>> one (credit2). Switch to credit as it is enabled by default.
> >>>
> >>> Fixes: 36e3f4158778 ("automation: Add a new job for testing boot time cpupools on arm64")
> >>> Signed-off-by: Michal Orzel <michal.orzel@amd.com>
> >>
> >> /sigh - I'm sure I nacked that stupidity to begin with.  apparently not...
> >>
> >> It is totally bogus for CONFIG_DEBUG to influence logical chunks of
> >> functionality like this.  The CI script is good in its current form.
> >>
> >> RTDS and ARINC should be default n.
> >>
> >> NULL is more tricky.  PV_SHIM is explicitly security supported, and has
> >> been for years, so the "UNSUPPORTED" is bogus, whatever the default is.
> >>
> >> As NULL is explicitly tested in CI, it's clearly supported, and probably
> >> ought to be on default.
> >>
> >>
> >> Please instead fix Kconfig to not be broken.  That will be a far better
> >> fix overall for people.
> >>
> >> As a more general note, tests which are using non-default pieces of
> >> logic ought to activate them explicitly.
> > 
> > 
> > I agree with you, but first let me clarify the word "supported".
> > 
> > 
> > In Xen Project "supported" implies extra efforts to follow the security
> > process and of course the security team should be on board with it. If
> > we say "supported, non security supported" we don't need to follow the
> > security process but still we sign up for backporting fixes to the
> > stable tree. It is less extra effort but still some extra effort is
> > involved.
> > 
> > So, this specific issue aside, I think that as we expand the testing
> > capabilities of gitlab-ci, we'll have tests for things that are not
> > necessarily neither "supported" nor "supported, non security supported".
> > 
> > 
> > For the NULL scheduler, it is clearly important to many users so it
> > would be valuable to move it to "supported, non security supported" and
> > enabling it by default in the build. I don't recall if we still have any
> > known outstanding issues with it. I think we need a separate email
> > thread for that discussion and I would understand if the decision is not
> > to change NULL support status for the 4.17 release (maybe for the 4.18
> > release?).
> > 
> > 
> > In any case, we don't need CONFIG_DEBUG to enable CONFIG_UNSUPPORTED. It
> > is just that UNSUPPORTED and NULL don't get enabled by default in the
> > non-DEBUG build. So to fix gitlab-ci, we can simply enable
> > CONFIG_UNSUPPORTED explicitly for the builds where we need it
> > (alpine-3.12-gcc-arm64-boot-cpupools).
> 
> Given that there are still diverging opinions \wrt making use of DEBUG
> to influence enabling/disabling some functionalities in the code, I would
> opt for modifying the CI job to explicitly specify the required config options,
> just like I did for static-mem test. The necessary options to enable NULL are:
> CONFIG_EXPERT=y
> CONFIG_UNSUPPORTED=y
> CONFIG_SCHED_NULL=y
> 
> This will fix the issue and allow us to continue with 4.17 release.
> Given the outstanding issues reported by Julien, it would be challenging to
> try to mark the NULL scheduler as supported, not security supported for this release.

Yes, sounds good


> Besides that, I think that Andrew still has a valid point. We seem to use DEBUG
> only in Kconfig.debug (obvious choice) and sched/Kconfig. So this is not something
> common to rely on DEBUG to enable logical functionalities (why did we make this exception for schedulers?).
> Having said that, I think the discussion on whether to switch to default n
> instead of default DEBUG or not is still valid and requires more people to give feedback.

Yeah
diff mbox series

Patch

diff --git a/automation/scripts/qemu-smoke-arm64.sh b/automation/scripts/qemu-smoke-arm64.sh
index 5b566072f72a..a5d8d135b659 100755
--- a/automation/scripts/qemu-smoke-arm64.sh
+++ b/automation/scripts/qemu-smoke-arm64.sh
@@ -29,10 +29,10 @@  fi
 fi
 
 if [[ "${test_variant}" == "boot-cpupools" ]]; then
-    # Check if domU0 (id=1) is assigned to Pool-1 with null scheduler
+    # Check if domU0 (id=1) is assigned to Pool-1 with credit scheduler
     passed="${test_variant} test passed"
     dom0_check="
-if xl list -c 1 | grep -q Pool-1 && xl cpupool-list Pool-1 | grep -q Pool-1; then
+if xl list -c 1 | grep -q Pool-1 && xl cpupool-list Pool-1 | grep -q credit; then
     echo ${passed}
 fi
 "
@@ -140,7 +140,7 @@  fi
 
 if [[ "${test_variant}" == "boot-cpupools" ]]; then
     echo '
-CPUPOOL[0]="cpu@1 null"
+CPUPOOL[0]="cpu@1 credit"
 DOMU_CPUPOOL[0]=0
 NUM_CPUPOOLS=1' >> binaries/config
 fi