Message ID | 20250206080952.98478-1-gmonaco@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | rv: Add scheduler specification monitors | expand |
Hi Gabriele, On 06/02/25 09:09, Gabriele Monaco wrote: > This patchset starts including adapted scheduler specifications from > Daniel's task model [1]. Thanks a lot for working on this. Apart from being cool stuff per-se, it means a lot personally to see Daniel's work continuing to be developed. > As the model is fairly complicated, it is split in several generators > and specifications. The tool used to create the model can output a > unified model, but that would be hardly readable (9k states). > > RV allows monitors to run and react concurrently. Running the cumulative > model is equivalent to running single components using the same > reactors, with the advantage that it's easier to point out which > specification failed in case of error. > > We allow this by introducing nested monitors, in short, the sysfs > monitor folder will contain a monitor named sched, which is nothing but > an empty container for other monitors. Controlling the sched monitor > (enable, disable, set reactors) controls all nested monitors. > > The task model proposed by Daniel includes 12 generators and 33 > specifications. The generators are good for documentation but are > usually implied in some specifications. > Not all monitors work out of the box, mainly because of those reasons: > * need to distinguish if preempt disable leads to schedule > * need to distinguish if irq disable comes from an actual irq > * assumptions not always true on SMP > > The original task model was designed for PREEMPT_RT and this patchset is > only tested on an upstream kernel with full preemption enabled. I played with your additions a bit and I was able to enable/disable monitors, switch reactors, etc., w/o noticing any issue. I wonder if you also had ways to test that the monitors actually react properly in case of erroneous conditions (so that we can see a reactor actually react :). Thanks, Juri
On Fri, 2025-02-07 at 11:55 +0100, Juri Lelli wrote: > Hi Gabriele, > > On 06/02/25 09:09, Gabriele Monaco wrote: > > This patchset starts including adapted scheduler specifications > > from > > Daniel's task model [1]. > > Thanks a lot for working on this. Apart from being cool stuff per-se, > it > means a lot personally to see Daniel's work continuing to be > developed. > > > As the model is fairly complicated, it is split in several > > generators > > and specifications. The tool used to create the model can output a > > unified model, but that would be hardly readable (9k states). > > > > RV allows monitors to run and react concurrently. Running the > > cumulative > > model is equivalent to running single components using the same > > reactors, with the advantage that it's easier to point out which > > specification failed in case of error. > > > > We allow this by introducing nested monitors, in short, the sysfs > > monitor folder will contain a monitor named sched, which is nothing > > but > > an empty container for other monitors. Controlling the sched > > monitor > > (enable, disable, set reactors) controls all nested monitors. > > > > The task model proposed by Daniel includes 12 generators and 33 > > specifications. The generators are good for documentation but are > > usually implied in some specifications. > > Not all monitors work out of the box, mainly because of those > > reasons: > > * need to distinguish if preempt disable leads to schedule > > * need to distinguish if irq disable comes from an actual irq > > * assumptions not always true on SMP > > > > The original task model was designed for PREEMPT_RT and this > > patchset is > > only tested on an upstream kernel with full preemption enabled. > > I played with your additions a bit and I was able to enable/disable > monitors, switch reactors, etc., w/o noticing any issue. > Thanks for trying it out! > I wonder if you also had ways to test that the monitors actually > react > properly in case of erroneous conditions (so that we can see a > reactor > actually react :). > Well, in my understanding, reactors should fire if there is a problem either in the kernel or in the model logic. While trying things out, I had more than a few models failing and I excluded them from this patch because they are not stable. Ideally you shouldn't be seeing errors using those monitors, unless you (un)intentionally break something in the kernel. That said, the monitor task switch while scheduling (tss) imposes context switches whenever we reach the scheduler. Daniel modified the sched_switch tracepoint to fire also if prev==next (in fact no switch is happening), I'm assuming the tss specification is partly why that was necessary. During my tests, I didn't apply that change, yet I've never seen the monitor failing. If you manage to call __schedule while the next picked task is the same as the currently running one, you should see an error and a reactor firing. Since I couldn't reproduce the above case, I ignored it for the current RFC, however if that's possible in practice, we should perhaps add another event describing this fake switch to prevent the monitor from failing. Thanks, Gabriele
On 07/02/25 12:36, Gabriele Monaco wrote: > > > On Fri, 2025-02-07 at 11:55 +0100, Juri Lelli wrote: > > Hi Gabriele, > > > > On 06/02/25 09:09, Gabriele Monaco wrote: > > > This patchset starts including adapted scheduler specifications > > > from > > > Daniel's task model [1]. > > > > Thanks a lot for working on this. Apart from being cool stuff per-se, > > it > > means a lot personally to see Daniel's work continuing to be > > developed. > > > > > As the model is fairly complicated, it is split in several > > > generators > > > and specifications. The tool used to create the model can output a > > > unified model, but that would be hardly readable (9k states). > > > > > > RV allows monitors to run and react concurrently. Running the > > > cumulative > > > model is equivalent to running single components using the same > > > reactors, with the advantage that it's easier to point out which > > > specification failed in case of error. > > > > > > We allow this by introducing nested monitors, in short, the sysfs > > > monitor folder will contain a monitor named sched, which is nothing > > > but > > > an empty container for other monitors. Controlling the sched > > > monitor > > > (enable, disable, set reactors) controls all nested monitors. > > > > > > The task model proposed by Daniel includes 12 generators and 33 > > > specifications. The generators are good for documentation but are > > > usually implied in some specifications. > > > Not all monitors work out of the box, mainly because of those > > > reasons: > > > * need to distinguish if preempt disable leads to schedule > > > * need to distinguish if irq disable comes from an actual irq > > > * assumptions not always true on SMP > > > > > > The original task model was designed for PREEMPT_RT and this > > > patchset is > > > only tested on an upstream kernel with full preemption enabled. > > > > I played with your additions a bit and I was able to enable/disable > > monitors, switch reactors, etc., w/o noticing any issue. > > > > Thanks for trying it out! > > > I wonder if you also had ways to test that the monitors actually > > react > > properly in case of erroneous conditions (so that we can see a > > reactor > > actually react :). > > > > Well, in my understanding, reactors should fire if there is a problem > either in the kernel or in the model logic. Right. I guess I wonder if we can find a way to inject kernel problems somehow, so that model(s) can be further tested explicitly thus making us confident that they will be able to identify real problems when they occur.
On Fri, 2025-02-07 at 15:27 +0100, Juri Lelli wrote: > On 07/02/25 12:36, Gabriele Monaco wrote: > > > > > > On Fri, 2025-02-07 at 11:55 +0100, Juri Lelli wrote: > > > Hi Gabriele, > > > > > > On 06/02/25 09:09, Gabriele Monaco wrote: > > > > This patchset starts including adapted scheduler specifications > > > > from > > > > Daniel's task model [1]. > > > > > > Thanks a lot for working on this. Apart from being cool stuff > > > per-se, > > > it > > > means a lot personally to see Daniel's work continuing to be > > > developed. > > > > > > > As the model is fairly complicated, it is split in several > > > > generators > > > > and specifications. The tool used to create the model can > > > > output a > > > > unified model, but that would be hardly readable (9k states). > > > > > > > > RV allows monitors to run and react concurrently. Running the > > > > cumulative > > > > model is equivalent to running single components using the same > > > > reactors, with the advantage that it's easier to point out > > > > which > > > > specification failed in case of error. > > > > > > > > We allow this by introducing nested monitors, in short, the > > > > sysfs > > > > monitor folder will contain a monitor named sched, which is > > > > nothing > > > > but > > > > an empty container for other monitors. Controlling the sched > > > > monitor > > > > (enable, disable, set reactors) controls all nested monitors. > > > > > > > > The task model proposed by Daniel includes 12 generators and 33 > > > > specifications. The generators are good for documentation but > > > > are > > > > usually implied in some specifications. > > > > Not all monitors work out of the box, mainly because of those > > > > reasons: > > > > * need to distinguish if preempt disable leads to schedule > > > > * need to distinguish if irq disable comes from an actual irq > > > > * assumptions not always true on SMP > > > > > > > > The original task model was designed for PREEMPT_RT and this > > > > patchset is > > > > only tested on an upstream kernel with full preemption enabled. > > > > > > I played with your additions a bit and I was able to > > > enable/disable > > > monitors, switch reactors, etc., w/o noticing any issue. > > > > > > > Thanks for trying it out! > > > > > I wonder if you also had ways to test that the monitors actually > > > react > > > properly in case of erroneous conditions (so that we can see a > > > reactor > > > actually react :). > > > > > > > Well, in my understanding, reactors should fire if there is a > > problem > > either in the kernel or in the model logic. > > Right. I guess I wonder if we can find a way to inject kernel > problems > somehow, so that model(s) can be further tested explicitly thus > making > us confident that they will be able to identify real problems when > they > occur. > Just for the sake of testing reactors there is already the wwnr monitor which is intentionally broken exactly for that reason, Daniel described a bit what scenario can trigger the error (some IRQ, so it may be harder to see in a VM). If we want any monitor to react, yeah that could be a bit trickier. We could perhaps do something like livepatching/kprobes. I'd assume we'd need some care though, since some of those monitors are pretty basic and making them fail may cause pretty bad errors in the kernel. Another approach could be to just inject the tracepoint/handler call those are all static functions but we may have some luck there and wouldn't break the system, just trick the monitor. But good point, I can have a look.
On Fri, 2025-02-07 at 15:27 +0100, Juri Lelli wrote: > > Right. I guess I wonder if we can find a way to inject kernel > problems > somehow, so that model(s) can be further tested explicitly thus > making > us confident that they will be able to identify real problems when > they > occur. > I sketched it quickly and I wouldn't include it in this series not to make it heavier, but if you want to play with it I wrote a patch exporting a function to inject events and building a kernel module calling the function periodically. You'd need to build the kernel with CONFIG_RV_DEBUG_TRIGGER and do something like # modprobe monitor=snroc event=1 # ./tools/verification/rv/rv mon snroc -r printk # dmesg [ 88.327892] rv: monitor snroc does not allow event sched_switch_in on state own_context You can omit event (in which case it will select the first) and select different monitors. In case of per-task monitors (as the one above), some events may never trigger errors, but if you choose carefully while modprobe-ing you should be fine. The events numbers are defined as enum in the monitor header (e.g. snroc.h). Subject: [PATCH] rv: Add infrastructure to trigger debug events RV monitors are supposed to test some core functionality and, in ideal scenarios, should never fire errors. This implies that we cannot test if reactors work as expected on a properly set monitor, which may add false positives. A common example is a monitor where no da_handle_start_event_ function is called, in such a case, the monitor would never start, so it never produces events nor errors. It is easy to understand this monitor is wrongly configured by checking the events tracepoints, but if a test setup expects the monitor to trigger reactors in case of failure, it will erroneously flag such a monitor as correct. Enable creation for each monitor of an exported function to simulate triggering an event. This function can be used from any other kernel code, including modules, to inject events into the monitor for debugging purposes. This can effectively used to test reactors on monitors which, otherwise, would never fire any error. The function has the following prototype, where name is the monitor name: bool da_trigger_event_name(int event) event can be any integer, but real events will be triggered only if the supplied number is a valid event for the monitor, otherwise nothing happens and the function returns false. If this configuration is disabled, no function is defined. Also add a module that relies on this function to periodically trigger events to the selected monitor. Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> --- include/rv/da_monitor.h | 59 +++++++++++++++- kernel/trace/rv/Kconfig | 27 +++++++ kernel/trace/rv/Makefile | 1 + kernel/trace/rv/rv_debug_trigger.c | 110 +++++++++++++++++++++++++++++ 4 files changed, 195 insertions(+), 2 deletions(-) create mode 100644 kernel/trace/rv/rv_debug_trigger.c diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h index 510c88bfabd43..ed9f66c53ed7b 100644 --- a/include/rv/da_monitor.h +++ b/include/rv/da_monitor.h @@ -514,6 +514,59 @@ da_handle_start_event_##name(struct task_struct *tsk, enum events_##name event) return 1; \ } +#ifdef CONFIG_RV_DEBUG_TRIGGER + +/* + * Handle event for implicit monitors + */ +#define DECLARE_DA_MON_TRIGGER_IMPLICIT(name) \ +/* \ + * da_trigger_event_##name - trigger an event from outside the monitor \ + * \ + * This function is used only for debug purposes, it calls the function to \ + * handle events to simulate the occurrence of the event. This may can be \ + * useful to make the monitor fail and test reactors but may have unintended \ + * consequences. \ + * For simplicity, accept the event as int but validate its value and return \ + * true if the event was valid, false if not and we did not trigger it. \ + */ \ +bool da_trigger_event_##name(int event); \ +bool da_trigger_event_##name(int event) \ +{ \ + if (event < 0 || event >= event_max_##name) \ + return false; \ + da_handle_event_##name(event); \ + return true; \ +} \ +EXPORT_SYMBOL(da_trigger_event_##name); + +/* + * Handle event for per-task monitors + */ +#define DECLARE_DA_MON_TRIGGER_PER_TASK(name) \ +/* \ + * da_trigger_event_##name - trigger an event from outside the monitor \ + * \ + * This function is used only for debug purposes, it calls the function to \ + * handle events to simulate the occurrence of the event. This may can be \ + * useful to make the monitor fail and test reactors but may have unintended \ + * consequences. \ + * For simplicity, accept the event as int but validate its value and return \ + * true if the event was valid, false if not and we did not trigger it. \ + * Also to keep it simple for the caller, fill the task with current. \ + */ \ +bool da_trigger_event_##name(int event); \ +bool da_trigger_event_##name(int event) \ +{ \ + if (event < 0 || event >= event_max_##name) \ + return false; \ + da_handle_event_##name(current, event); \ + return true; \ +} \ +EXPORT_SYMBOL(da_trigger_event_##name); + +#endif /* CONFIG_RV_DEBUG_TRIGGER */ + /* * Entry point for the global monitor. */ @@ -534,7 +587,8 @@ DECLARE_AUTOMATA_HELPERS(name, type) \ DECLARE_DA_MON_GENERIC_HELPERS(name, type) \ DECLARE_DA_MON_MODEL_HANDLER_IMPLICIT(name, type) \ DECLARE_DA_MON_INIT_PER_CPU(name, type) \ -DECLARE_DA_MON_MONITOR_HANDLER_IMPLICIT(name, type) +DECLARE_DA_MON_MONITOR_HANDLER_IMPLICIT(name, type) \ +DECLARE_DA_MON_TRIGGER_IMPLICIT(name) /* * Entry point for the per-task monitor. @@ -545,4 +599,5 @@ DECLARE_AUTOMATA_HELPERS(name, type) \ DECLARE_DA_MON_GENERIC_HELPERS(name, type) \ DECLARE_DA_MON_MODEL_HANDLER_PER_TASK(name, type) \ DECLARE_DA_MON_INIT_PER_TASK(name, type) \ -DECLARE_DA_MON_MONITOR_HANDLER_PER_TASK(name, type) +DECLARE_DA_MON_MONITOR_HANDLER_PER_TASK(name, type) \ +DECLARE_DA_MON_TRIGGER_PER_TASK(name) diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig index b39f36013ef23..6aa927db99ee8 100644 --- a/kernel/trace/rv/Kconfig +++ b/kernel/trace/rv/Kconfig @@ -62,3 +62,30 @@ config RV_REACT_PANIC help Enables the panic reactor. The panic reactor emits a printk() message if an exception is found and panic()s the system. + +config RV_DEBUG_TRIGGER + bool "Runtime verification debug trigger event" + default n + depends on RV + help + Enables creation for each monitor of an exported function to simulate + triggering an event. This function can be used from any other kernel + code, including modules, to inject events into the monitor for debugging + purposes. A common use-case is to test reactors on monitors which, + otherwise, would never fire any error. + Use the function with care as it might have unintended consequences. + The function has the following prototype, where name is the monitor name: + bool da_trigger_event_name(int event) + event can be any integer, but real events will be triggered only if the + supplied number is a valid event for the monitor, otherwise nothing + happens and the function returns false. + + This configuration enables compilation of the rv_debug_trigger kernel + module which relies on this function to periodically trigger events. + If the configuration is disabled, no function is defined. + +config RV_DEBUG_TRIGGER_MODULE + tristate + default m if RV_DEBUG_TRIGGER + depends on RV + depends on RV_DEBUG_TRIGGER diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile index f9b2cd0483c3c..47207c81260ee 100644 --- a/kernel/trace/rv/Makefile +++ b/kernel/trace/rv/Makefile @@ -16,3 +16,4 @@ obj-$(CONFIG_RV_MON_SNCID) += monitors/sncid/sncid.o obj-$(CONFIG_RV_REACTORS) += rv_reactors.o obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o obj-$(CONFIG_RV_REACT_PANIC) += reactor_panic.o +obj-$(CONFIG_RV_DEBUG_TRIGGER_MODULE) += rv_debug_trigger.o diff --git a/kernel/trace/rv/rv_debug_trigger.c b/kernel/trace/rv/rv_debug_trigger.c new file mode 100644 index 0000000000000..fa5bbdf14ca5f --- /dev/null +++ b/kernel/trace/rv/rv_debug_trigger.c @@ -0,0 +1,110 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2025-2028 Red Hat, Inc. Gabriele Monaco <gmonaco@redhat.com> + * + * RV debug trigger module: + * Insert this module to periodically trigger a fake event to the monitor + * provided as parameter. The numerical value of the event can be set as + * parameter as well. + */ +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/rv.h> + +bool da_trigger_event_wip(int event); +bool da_trigger_event_wwnr(int event); +bool da_trigger_event_tss(int event); +bool da_trigger_event_sco(int event); +bool da_trigger_event_snroc(int event); +bool da_trigger_event_scpd(int event); +bool da_trigger_event_snep(int event); +bool da_trigger_event_sncid(int event); + +/* do not allow periodicity lower than 1 us */ +#define MIN_PERIOD 1000 + +static char monitor[MAX_DA_NAME_LEN] = "wwnr"; +module_param_string(monitor, monitor, sizeof(monitor), 0644); + +static int event = 0; +module_param(event, int, 0644); + +static int period = 1000000; +module_param(period, int, 0644); + +static bool (*trigger)(int event); +static struct hrtimer periodic_timer; + +static enum hrtimer_restart trigger_worker(struct hrtimer *timer) +{ + hrtimer_forward_now(timer, period); + trigger(event); + return HRTIMER_RESTART; +} + +static int __init rv_debug_trigger_init(void) +{ +#ifdef CONFIG_RV_MON_WIP + if (!strcmp(monitor, "wip")) + trigger = da_trigger_event_wip; +#endif +#ifdef CONFIG_RV_MON_WWNR + if (!strcmp(monitor, "wwnr")) + trigger = da_trigger_event_wwnr; +#endif +#ifdef CONFIG_RV_MON_TSS + if (!strcmp(monitor, "tss")) + trigger = da_trigger_event_tss; +#endif +#ifdef CONFIG_RV_MON_SCO + if (!strcmp(monitor, "sco")) + trigger = da_trigger_event_sco; +#endif +#ifdef CONFIG_RV_MON_SNROC + if (!strcmp(monitor, "snroc")) + trigger = da_trigger_event_snroc; +#endif +#ifdef CONFIG_RV_MON_SCPD + if (!strcmp(monitor, "scpd")) + trigger = da_trigger_event_scpd; +#endif +#ifdef CONFIG_RV_MON_SNEP + if (!strcmp(monitor, "snep")) + trigger = da_trigger_event_snep; +#endif +#ifdef CONFIG_RV_MON_SNCID + if (!strcmp(monitor, "sncid")) + trigger = da_trigger_event_sncid; +#endif + + if (!trigger) { + pr_warn("Invalid monitor %s\n", monitor); + return -EINVAL; + } + if (!trigger(event)) { + pr_warn("Invalid event %d for monitor %s\n", event, + monitor); + return -EINVAL; + } + if (period < MIN_PERIOD) { + pr_warn("Use at least %d us as period, %d provided\n", + MIN_PERIOD, period); + return -EINVAL; + } + hrtimer_init(&periodic_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); + periodic_timer.function = trigger_worker; + hrtimer_start(&periodic_timer, period, HRTIMER_MODE_REL); + return 0; +} + +static void __exit rv_debug_trigger_exit(void) +{ + hrtimer_cancel(&periodic_timer); +} + +module_init(rv_debug_trigger_init); +module_exit(rv_debug_trigger_exit); + +MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>"); +MODULE_DESCRIPTION("RV debug trigger: periodically trigger a fake event."); +MODULE_LICENSE("GPL"); base-commit: df5b7771dc64df426b4bd52c7d591a316b839570
On 10/02/25 13:56, Gabriele Monaco wrote: > On Fri, 2025-02-07 at 15:27 +0100, Juri Lelli wrote: > > > > Right. I guess I wonder if we can find a way to inject kernel > > problems > > somehow, so that model(s) can be further tested explicitly thus > > making > > us confident that they will be able to identify real problems when > > they > > occur. > > > > I sketched it quickly and I wouldn't include it in this series not to > make it heavier, but if you want to play with it I wrote a patch > exporting a function to inject events and building a kernel module > calling the function periodically. Ah, this is cool. Thanks for working on it! Yeah, agree, we can leave it for later. Thanks!