diff mbox series

[RFC,1/1] kselftests: Add test to detect boot event slowdowns

Message ID 20240725110622.96301-2-laura.nao@collabora.com (mailing list archive)
State New
Headers show
Series Add kselftest to detect boot event slowdowns | expand

Commit Message

Laura Nao July 25, 2024, 11:06 a.m. UTC
Introduce a new kselftest to identify slowdowns in key boot events.
The test uses ftrace to track timings for specific boot events.
The kprobe_timestamps_to_yaml.py script can be run once to generate a
YAML file with the initial reference timestamps for these events.
The test_boot_time.py script can then be run on subsequent boots to
compare current timings against the reference values and check for
significant slowdowns over time.
The test ships with a bootconfig file for ftrace setup and a config
fragment for the necessary kernel configurations.

Signed-off-by: Laura Nao <laura.nao@collabora.com>
---
 tools/testing/selftests/Makefile              |  1 +
 tools/testing/selftests/boot-time/Makefile    | 17 ++++
 tools/testing/selftests/boot-time/bootconfig  |  8 ++
 tools/testing/selftests/boot-time/config      |  4 +
 .../boot-time/kprobe_timestamps_to_yaml.py    | 55 +++++++++++
 .../selftests/boot-time/test_boot_time.py     | 94 +++++++++++++++++++
 .../selftests/boot-time/trace_utils.py        | 63 +++++++++++++
 7 files changed, 242 insertions(+)
 create mode 100644 tools/testing/selftests/boot-time/Makefile
 create mode 100644 tools/testing/selftests/boot-time/bootconfig
 create mode 100644 tools/testing/selftests/boot-time/config
 create mode 100755 tools/testing/selftests/boot-time/kprobe_timestamps_to_yaml.py
 create mode 100755 tools/testing/selftests/boot-time/test_boot_time.py
 create mode 100644 tools/testing/selftests/boot-time/trace_utils.py

Comments

Shuah Khan July 25, 2024, 3:50 p.m. UTC | #1
On 7/25/24 05:06, Laura Nao wrote:
> Introduce a new kselftest to identify slowdowns in key boot events.
> The test uses ftrace to track timings for specific boot events.
> The kprobe_timestamps_to_yaml.py script can be run once to generate a
> YAML file with the initial reference timestamps for these events.
> The test_boot_time.py script can then be run on subsequent boots to
> compare current timings against the reference values and check for
> significant slowdowns over time.
> The test ships with a bootconfig file for ftrace setup and a config
> fragment for the necessary kernel configurations.
> 
> Signed-off-by: Laura Nao <laura.nao@collabora.com>

I am repeating the same comments I made on the cover letter here as
well.

What are the dependencies if any for this new test to work?
Please do remember that tests in default run needs to have
minimal dependencies so they can run on systems that have
minimal support.

As mentioned earlier take a look at the tools/power/pm-graph
bootgraph.py and sleepgraph.py to see if you can leverage
them - bootgraph detects slowdowns during boot.

We don't want to add duplicate scripts if the other one
serves the needs. Those can be moved to selftests if it
make sense.

I will review this once we figure out if bootgraph serves
the needs and I understand the dependencies for this test
to work.

thanks,
-- Shuah
Laura Nao July 31, 2024, 9:25 a.m. UTC | #2
Hi Shuah,

On 7/25/24 17:50, Shuah Khan wrote:
> On 7/25/24 05:06, Laura Nao wrote:
>> Introduce a new kselftest to identify slowdowns in key boot events.
>> The test uses ftrace to track timings for specific boot events.
>> The kprobe_timestamps_to_yaml.py script can be run once to generate a
>> YAML file with the initial reference timestamps for these events.
>> The test_boot_time.py script can then be run on subsequent boots to
>> compare current timings against the reference values and check for
>> significant slowdowns over time.
>> The test ships with a bootconfig file for ftrace setup and a config
>> fragment for the necessary kernel configurations.
>>
>> Signed-off-by: Laura Nao <laura.nao@collabora.com>
> 
> I am repeating the same comments I made on the cover letter here as
> well.
> 
> What are the dependencies if any for this new test to work?
> Please do remember that tests in default run needs to have
> minimal dependencies so they can run on systems that have
> minimal support.
>

In order to run this test the kernel needs to be compiled with the
provided config fragment, which enables tracing and embeds the provided
bootconfig file in the kernel. Additionally, a YAML file with reference 
timestamps must be supplied as input to the test.
 
> As mentioned earlier take a look at the tools/power/pm-graph
> bootgraph.py and sleepgraph.py to see if you can leverage
> them - bootgraph detects slowdowns during boot.
> 
> We don't want to add duplicate scripts if the other one
> serves the needs. Those can be moved to selftests if it
> make sense.
> 
> I will review this once we figure out if bootgraph serves
> the needs and I understand the dependencies for this test
> to work.
> 

Thanks for the pointers! 

It looks like sleepgraph.py is more focused on analyzing suspend/resume 
timings, while bootgraph.py measures boot time using the kernel log and 
ftrace. The latter might indeed come in handy.
As far as I can see, the script doesn't support automatic detection of 
boot slowdowns, and the output is in HTML format, which is meant for 
human analysis. However, I can look into adding support for a more 
machine-readable output format too. The test proposed in this patch could 
then use bootgraph.py to generate the reference file and measure current 
boot timings.

I'll look into this and report back.

Thanks,

Laura
Laura Nao Aug. 8, 2024, 10:45 a.m. UTC | #3
On 7/31/24 11:25, Laura Nao wrote:
> 
> It looks like sleepgraph.py is more focused on analyzing suspend/resume
> timings, while bootgraph.py measures boot time using the kernel log and
> ftrace. The latter might indeed come in handy.
> As far as I can see, the script doesn't support automatic detection of
> boot slowdowns, and the output is in HTML format, which is meant for
> human analysis. However, I can look into adding support for a more
> machine-readable output format too. The test proposed in this patch could
> then use bootgraph.py to generate the reference file and measure current
> boot timings.
> 
> I'll look into this and report back.
> 

After examining the bootgraph.py script, it seems feasible to add
support for generating the output in a machine-readable format 
(e.g., JSON) for automated analysis. Todd, I've CC'd you on this 
discussion in case you have feedback on possibly using bootgraph.py in 
an automated test to detect slowdowns.

Some points to consider:

- The bootgraph.py script supports ftrace through the -fstat and -ftrace
  options, and it parses the kernel log to get initcall timings. To use
  this in an automated test, we need a way to provide the necessary 
  command line options. One approach is to include these options in a 
  bootconfig file embedded in the kernel image (as per proposal in this 
  RFC). Shuah, do you think this is acceptable? I haven't seen other 
  tests doing this, so I'm unsure if this is a proper way to handle 
  required command line options in a selftest.

- The bootgraph.py script tracks timings for all init calls, which might
  be excessive and generate too much output when integrated in an 
  automated test. We might need to limit the test output to report only 
  significant slowdowns to make it manageable.

- I'd like to get some feedback on which key boot process events are
  more relevant to track; depending on this, we could use the 
  bootgraph.py script to monitor initcalls and possibly other events 
  tracked via ftrace. The script currently uses the function_graph 
  tracer, and its parser is designed for this tracer's output. If we need 
  to track other events (e.g., kprobe events), the parser might need some 
  adjustments.

I'll be discussing this at LPC in September
(https://lpc.events/event/18/contributions/1700/) and look forward to
exploring more details and alternative approaches for an automated boot
time test.

Best,

Laura
diff mbox series

Patch

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index bc8fe9e8f7f2..1e675779322f 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -2,6 +2,7 @@ 
 TARGETS += alsa
 TARGETS += amd-pstate
 TARGETS += arm64
+TARGETS += boot-time
 TARGETS += bpf
 TARGETS += breakpoints
 TARGETS += cachestat
diff --git a/tools/testing/selftests/boot-time/Makefile b/tools/testing/selftests/boot-time/Makefile
new file mode 100644
index 000000000000..43ccd0a3b62e
--- /dev/null
+++ b/tools/testing/selftests/boot-time/Makefile
@@ -0,0 +1,17 @@ 
+PY3 = $(shell which python3 2>/dev/null)
+
+ifneq ($(PY3),)
+
+TEST_PROGS := test_boot_time.py
+TEST_FILES := trace_utils.py kprobe_timestamps_to_yaml.py
+
+include ../lib.mk
+
+else
+
+all: no_py3_warning
+
+no_py3_warning:
+	@echo "Missing python3. This test will be skipped."
+
+endif
\ No newline at end of file
diff --git a/tools/testing/selftests/boot-time/bootconfig b/tools/testing/selftests/boot-time/bootconfig
new file mode 100644
index 000000000000..2883f03e0766
--- /dev/null
+++ b/tools/testing/selftests/boot-time/bootconfig
@@ -0,0 +1,8 @@ 
+ftrace {
+    event.kprobes {
+        populate_rootfs_begin.probes = "populate_rootfs"
+        unpack_to_rootfs_begin.probes = "unpack_to_rootfs"
+        run_init_process_begin.probes = "run_init_process"
+        run_init_process_end.probes = "run_init_process%return"
+    }
+}
\ No newline at end of file
diff --git a/tools/testing/selftests/boot-time/config b/tools/testing/selftests/boot-time/config
new file mode 100644
index 000000000000..957b40b45bc0
--- /dev/null
+++ b/tools/testing/selftests/boot-time/config
@@ -0,0 +1,4 @@ 
+CONFIG_TRACING=y
+CONFIG_BOOTTIME_TRACING=y
+CONFIG_BOOT_CONFIG_EMBED=y
+CONFIG_BOOT_CONFIG_EMBED_FILE="tools/testing/selftests/boot-time/bootconfig"
\ No newline at end of file
diff --git a/tools/testing/selftests/boot-time/kprobe_timestamps_to_yaml.py b/tools/testing/selftests/boot-time/kprobe_timestamps_to_yaml.py
new file mode 100755
index 000000000000..43e742e0759d
--- /dev/null
+++ b/tools/testing/selftests/boot-time/kprobe_timestamps_to_yaml.py
@@ -0,0 +1,55 @@ 
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2024 Collabora Ltd
+#
+#
+# This script reads the /sys/kernel/debug/tracing/trace file, extracts kprobe
+# event names and timestamps, and saves them to a YAML file.
+#
+# Example of the YAML content generated:
+#
+# - event: populate_rootfs_begin
+#   timestamp: 0.525161
+# - event: unpack_to_rootfs_begin
+#   timestamp: 0.525169
+#
+# Usage:
+#     ./kprobe_timestamps_to_yaml.py [output_file]
+#
+
+import sys
+import argparse
+import yaml
+
+from trace_utils import mount_debugfs, parse_kprobe_event_trace
+
+
+def write_trace_timestamps_to_yaml(output_file_path):
+    """
+    Parse /sys/kernel/debug/tracing/trace file to extract kprobe event
+    names and timestamps, and write them to a YAML file.
+    @output_file_path: the path for the output YAML file, defaults to
+    'kprobe-timestamps.yaml'.
+    """
+    trace_entries = parse_kprobe_event_trace("/sys/kernel/debug/tracing/trace")
+    filtered_entries = {entry['event']: entry['timestamp'] for entry in trace_entries}
+
+    with open(output_file_path, 'w', encoding="utf-8") as yaml_file:
+        yaml.dump(filtered_entries, yaml_file)
+
+    print(f"Generated {output_file_path}")
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(
+        description="Extract kprobe event names and timestamps from the \
+        /sys/kernel/debug/tracing/trace file and save them to a YAML file.")
+    parser.add_argument('output_file_path', nargs='?', default='kprobe-timestamps.yaml',
+                        help="path for the output YAML file")
+    args = parser.parse_args()
+
+    if not mount_debugfs("/sys/kernel/debug"):
+        sys.exit(1)
+
+    write_trace_timestamps_to_yaml(args.output_file_path)
diff --git a/tools/testing/selftests/boot-time/test_boot_time.py b/tools/testing/selftests/boot-time/test_boot_time.py
new file mode 100755
index 000000000000..8a8de28c485c
--- /dev/null
+++ b/tools/testing/selftests/boot-time/test_boot_time.py
@@ -0,0 +1,94 @@ 
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2024 Collabora Ltd
+#
+# This script reads the /sys/kernel/debug/tracing/trace file, extracts kprobe
+# event names and timestamps, and compares them against reference timestamps
+# provided in an input YAML file to identify significant slowdowns in
+# the reference kprobe events.
+#
+# Usage:
+#     ./test_boot_time.py <kprobe_timestamps_file> <delta(seconds)>
+
+import os
+import sys
+import argparse
+import yaml
+from trace_utils import mount_debugfs, parse_kprobe_event_trace
+
+this_dir = os.path.dirname(os.path.realpath(__file__))
+sys.path.append(os.path.join(this_dir, "../kselftest/"))
+
+import ksft
+
+def load_kprobe_timestamps_from_yaml(file_path):
+    """
+    Load YAML file containing kprobe event timestamps. Return a dictionary
+    with event names as keys and the corresponding timestamps as values.
+    For example:
+        {'populate_rootfs_begin': 0.525161, 'unpack_to_rootfs_begin': 0.525169}
+    @file_path: path to a YAML file containing the kprobe event timestamps.
+    """
+    try:
+        with open(file_path, 'r', encoding="utf-8") as file:
+            return yaml.safe_load(file)
+    except FileNotFoundError:
+        ksft.print_msg(f"Error: File {file_path} not found.")
+        ksft.exit_fail()
+    except yaml.YAMLError as e:
+        ksft.print_msg(f"Error: Failed to parse YAML file {file_path}. Error: {e}")
+        ksft.exit_fail()
+
+
+def evaluate_kprobe_event_delays(ref_entries, delta):
+    """
+    Read the /sys/kernel/debug/tracing/trace file to extract kprobe event
+    names and timestamps, then compare these timestamps with the reference
+    values provided. Report a failure for each timestamp that deviates from
+    the reference by more than the specified delta.
+    @ref_entries: dictionary containing event names and their timestamps
+    @delta: maximum allowed difference in seconds between the current and
+    reference timestamps
+    """
+    ksft.set_plan(len(ref_entries))
+
+    trace_entries = {entry['event']: entry['timestamp']
+                     for entry in parse_kprobe_event_trace("/sys/kernel/debug/tracing/trace")}
+
+    for ref_event, ref_timestamp in ref_entries.items():
+        if ref_event in trace_entries:
+            timestamp = trace_entries[ref_event]
+            if timestamp >= ref_timestamp and (timestamp - ref_timestamp) >= delta:
+                ksft.print_msg(f"'{ref_event}' differs by "
+                               f"{(timestamp - ref_timestamp):.6f} seconds.")
+                ksft.test_result_fail(ref_event)
+            else:
+                ksft.test_result_pass(ref_event)
+        else:
+            ksft.print_msg(f"Reference event '{ref_event}' not found in trace.")
+            ksft.test_result_skip(ref_event)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(
+        description="Read the /sys/kernel/debug/tracing/trace file, extract kprobe event names \
+        and timestamps, and compares them against reference timestamps provided in an input YAML \
+        file to report significant slowdowns in the reference kprobe events.")
+    parser.add_argument('kprobe_timestamps_file', help="path to YAML file containing \
+        kprobe timestamps")
+    parser.add_argument('delta', type=float, help="maximum allowed difference in seconds \
+        (float) between the current and reference timestamps")
+    args = parser.parse_args()
+
+    # Ensure debugfs is mounted
+    if not mount_debugfs("/sys/kernel/debug"):
+        ksft.exit_fail()
+
+    ksft.print_header()
+
+    ref_entries = load_kprobe_timestamps_from_yaml(args.kprobe_timestamps_file)
+
+    evaluate_kprobe_event_delays(ref_entries, args.delta)
+
+    ksft.finished()
diff --git a/tools/testing/selftests/boot-time/trace_utils.py b/tools/testing/selftests/boot-time/trace_utils.py
new file mode 100644
index 000000000000..c40cef2bf584
--- /dev/null
+++ b/tools/testing/selftests/boot-time/trace_utils.py
@@ -0,0 +1,63 @@ 
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2024 Collabora Ltd
+#
+# Utility functions for parsing trace files and mounting debugfs.
+#
+
+import re
+import subprocess
+
+
+def mount_debugfs(path):
+    """
+    Mount debugfs at the specified path if it is not already mounted.
+    @path: path where debugfs should be mounted
+    """
+    # Check if debugfs is already mounted
+    with open('/proc/mounts', 'r', encoding="utf-8") as mounts:
+        for line in mounts:
+            if 'debugfs' in line and path in line:
+                print(f"debugfs is already mounted at {path}")
+                return True
+
+    # Mount debugfs
+    try:
+        subprocess.run(['mount', '-t', 'debugfs', 'none', path], check=True)
+        return True
+    except subprocess.CalledProcessError as e:
+        print(f"Failed to mount debugfs: {e.stderr}")
+        return False
+
+
+def parse_kprobe_event_trace(trace_file_path):
+    """
+    Parse a trace file containing kprobe events and return a list of
+    dictionaries, each representing a trace entry.
+
+    Example of trace entry:
+        {'task_pid': 'sh-1', 'timestamp': 0.256527, 'event': \\
+        'populate_rootfs_begin', 'function': 'populate_rootfs+0x4/0x50'}
+
+    @trace_file_path: path to the trace file
+    """
+    trace_pattern = r'^\s*(\S+)\s+\[.*\]\s+\S+\s+(\d+\.\d+):\s+(\S+):\s+\((.*)\)$'
+
+    trace_entries = []
+
+    # Read trace file and parse entries
+    with open(trace_file_path, 'r', encoding="utf-8") as trace_file:
+        for line in trace_file:
+            match = re.match(trace_pattern, line)
+            if match:
+                trace_entry = {
+                    'task_pid': match.group(1),
+                    'timestamp': float(match.group(2)),
+                    'event': match.group(3),
+                    'function': match.group(4)
+                }
+
+                trace_entries.append(trace_entry)
+
+    return trace_entries