diff mbox series

[isar-cip-core,1/1] fix(PREEMPT_RT): disable CONFIG_RT_GROUP_SCHED

Message ID 20240805091722.6871-1-felix.moessbauer@siemens.com (mailing list archive)
State New
Headers show
Series [isar-cip-core,1/1] fix(PREEMPT_RT): disable CONFIG_RT_GROUP_SCHED | expand

Commit Message

Felix Moessbauer Aug. 5, 2024, 9:17 a.m. UTC
This feature is simply broken on RT kernels that use cgroup2, as the
cpu controller features rt_period_us and rt_runtime_us are not
exposed, which are needed to configure things in a way that allows to
run tasks under RT/DL scheduling policies. It further seems to be
incompatible with the nohz_full option.

This patch just disables the feature, similar to upstream Debian.

Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
---
I'm wondering a bit how that was not discovered earlier in our RT tests.
Maybe Quirin can have a look if the cyclictest also shows errors that
the scheduling policy cannot be set. But maybe it simply did not show up
there, as the setup was to simplistic and did not enable the cgroup2 cpu
controller.

Note: This was discovered while tracing down kernel bugs around
timer slack on RT/DL tasks.

Best regards,
Felix Moessbauer
Siemens AG

 recipes-kernel/linux/files/preempt-rt.cfg | 1 +
 1 file changed, 1 insertion(+)

Comments

Jan Kiszka Aug. 5, 2024, 2:24 p.m. UTC | #1
On 05.08.24 11:17, Felix Moessbauer wrote:
> This feature is simply broken on RT kernels that use cgroup2, as the
> cpu controller features rt_period_us and rt_runtime_us are not
> exposed, which are needed to configure things in a way that allows to
> run tasks under RT/DL scheduling policies. It further seems to be
> incompatible with the nohz_full option.
> 
> This patch just disables the feature, similar to upstream Debian.
> 

Makes sense. Didn't trigger yet with the standard RT test images built
from isar-cip-core directly, but out config should think beyond.

> Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
> ---
> I'm wondering a bit how that was not discovered earlier in our RT tests.
> Maybe Quirin can have a look if the cyclictest also shows errors that
> the scheduling policy cannot be set. But maybe it simply did not show up
> there, as the setup was to simplistic and did not enable the cgroup2 cpu
> controller.

The latter is what I assume as well. I tested it with a qemu-x86
bookworm 6.1-rt setup, and cyclictest happily worked.

> 
> Note: This was discovered while tracing down kernel bugs around
> timer slack on RT/DL tasks.
> 
> Best regards,
> Felix Moessbauer
> Siemens AG
> 
>  recipes-kernel/linux/files/preempt-rt.cfg | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/recipes-kernel/linux/files/preempt-rt.cfg b/recipes-kernel/linux/files/preempt-rt.cfg
> index 4afe1bf..ea37556 100644
> --- a/recipes-kernel/linux/files/preempt-rt.cfg
> +++ b/recipes-kernel/linux/files/preempt-rt.cfg
> @@ -1,6 +1,7 @@
>  # >= 5.10
>  CONFIG_EXPERT=y
>  CONFIG_PREEMPT_RT=y
> +CONFIG_RT_GROUP_SCHED=n
>  # <= 4.19
>  CONFIG_PREEMPT_RT_FULL=y
>  

Thanks, applied.

Jan
Felix Moessbauer Aug. 7, 2024, 1:57 p.m. UTC | #2
On Mon, 2024-08-05 at 16:24 +0200, Jan Kiszka wrote:
> On 05.08.24 11:17, Felix Moessbauer wrote:
> > This feature is simply broken on RT kernels that use cgroup2, as
> > the
> > cpu controller features rt_period_us and rt_runtime_us are not
> > exposed, which are needed to configure things in a way that allows
> > to
> > run tasks under RT/DL scheduling policies. It further seems to be
> > incompatible with the nohz_full option.
> > 
> > This patch just disables the feature, similar to upstream Debian.
> > 
> 
> Makes sense. Didn't trigger yet with the standard RT test images
> built
> from isar-cip-core directly, but out config should think beyond.
> 
> > Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
> > ---
> > I'm wondering a bit how that was not discovered earlier in our RT
> > tests.
> > Maybe Quirin can have a look if the cyclictest also shows errors
> > that
> > the scheduling policy cannot be set. But maybe it simply did not
> > show up
> > there, as the setup was to simplistic and did not enable the
> > cgroup2 cpu
> > controller.
> 
> The latter is what I assume as well. I tested it with a qemu-x86
> bookworm 6.1-rt setup, and cyclictest happily worked.

I just noticed, that the CONFIG_CPU_ISOLATION is unset in the CIP RT
kernel config. While this makes sense in pure cgroup2 deployments, I'm
pretty sure many deployments still rely on the isolcpus= kernel
parameter (which only works with CONFIG_CPU_ISOLATION=y).

This makes me wonder what exactly we are testing there.

Felix

> 
> > 
> > Note: This was discovered while tracing down kernel bugs around
> > timer slack on RT/DL tasks.
> > 
> > Best regards,
> > Felix Moessbauer
> > Siemens AG
> > 
> >  recipes-kernel/linux/files/preempt-rt.cfg | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/recipes-kernel/linux/files/preempt-rt.cfg b/recipes-
> > kernel/linux/files/preempt-rt.cfg
> > index 4afe1bf..ea37556 100644
> > --- a/recipes-kernel/linux/files/preempt-rt.cfg
> > +++ b/recipes-kernel/linux/files/preempt-rt.cfg
> > @@ -1,6 +1,7 @@
> >  # >= 5.10
> >  CONFIG_EXPERT=y
> >  CONFIG_PREEMPT_RT=y
> > +CONFIG_RT_GROUP_SCHED=n
> >  # <= 4.19
> >  CONFIG_PREEMPT_RT_FULL=y
> >  
> 
> Thanks, applied.
> 
> Jan
>
Jan Kiszka Aug. 7, 2024, 2:37 p.m. UTC | #3
On 07.08.24 15:57, Moessbauer, Felix (T CED OES-DE) wrote:
> On Mon, 2024-08-05 at 16:24 +0200, Jan Kiszka wrote:
>> On 05.08.24 11:17, Felix Moessbauer wrote:
>>> This feature is simply broken on RT kernels that use cgroup2, as
>>> the
>>> cpu controller features rt_period_us and rt_runtime_us are not
>>> exposed, which are needed to configure things in a way that allows
>>> to
>>> run tasks under RT/DL scheduling policies. It further seems to be
>>> incompatible with the nohz_full option.
>>>
>>> This patch just disables the feature, similar to upstream Debian.
>>>
>>
>> Makes sense. Didn't trigger yet with the standard RT test images
>> built
>> from isar-cip-core directly, but out config should think beyond.
>>
>>> Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
>>> ---
>>> I'm wondering a bit how that was not discovered earlier in our RT
>>> tests.
>>> Maybe Quirin can have a look if the cyclictest also shows errors
>>> that
>>> the scheduling policy cannot be set. But maybe it simply did not
>>> show up
>>> there, as the setup was to simplistic and did not enable the
>>> cgroup2 cpu
>>> controller.
>>
>> The latter is what I assume as well. I tested it with a qemu-x86
>> bookworm 6.1-rt setup, and cyclictest happily worked.
> 
> I just noticed, that the CONFIG_CPU_ISOLATION is unset in the CIP RT
> kernel config. While this makes sense in pure cgroup2 deployments, I'm
> pretty sure many deployments still rely on the isolcpus= kernel
> parameter (which only works with CONFIG_CPU_ISOLATION=y).
> 
> This makes me wonder what exactly we are testing there.
> 

I suppose you are referring to cip-kernel-config, and there
x86/cip_merged_defconfig: We need input in form of project/device
configs in order to cover what people use, to maintain and test features
properly.

Jan
diff mbox series

Patch

diff --git a/recipes-kernel/linux/files/preempt-rt.cfg b/recipes-kernel/linux/files/preempt-rt.cfg
index 4afe1bf..ea37556 100644
--- a/recipes-kernel/linux/files/preempt-rt.cfg
+++ b/recipes-kernel/linux/files/preempt-rt.cfg
@@ -1,6 +1,7 @@ 
 # >= 5.10
 CONFIG_EXPERT=y
 CONFIG_PREEMPT_RT=y
+CONFIG_RT_GROUP_SCHED=n
 # <= 4.19
 CONFIG_PREEMPT_RT_FULL=y