Message ID | 20250122-modules-error-injection-v1-2-910590a04fd5@samsung.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [1/2] module: allow for module error injection | expand |
Context | Check | Description |
---|---|---|
mcgrof/vmtest-modules-next-VM_Test-0 | success | Logs for Run CI tests |
mcgrof/vmtest-modules-next-VM_Test-1 | success | Logs for Run CI tests |
mcgrof/vmtest-modules-next-VM_Test-5 | fail | Logs for setup / Setup kdevops environment |
mcgrof/vmtest-modules-next-VM_Test-4 | fail | Logs for setup / Setup kdevops environment |
mcgrof/vmtest-modules-next-VM_Test-2 | fail | Logs for cleanup / Archive results and cleanup |
mcgrof/vmtest-modules-next-VM_Test-3 | fail | Logs for cleanup / Archive results and cleanup |
mcgrof/vmtest-modules-next-PR | fail | PR summary |
On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > Add support for a module error injection tool. The tool > can inject errors in the annotated module kernel functions > such as complete_formation(), do_init_module() and > module_enable_rodata_after_init(). Module name and module function are > required parameters to have control over the error injection. > > Example: Inject error -22 to module_enable_rodata_ro_after_init for > brd module: > > sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \ > --error=-22 --trace > Monitoring module error injection... Hit Ctrl-C to end. > MODULE ERROR FUNCTION > brd -22 module_enable_rodata_after_init() > > Kernel messages: > [ 89.463690] brd: module loaded > [ 89.463855] brd: module_enable_rodata_ro_after_init() returned -22, > ro_after_init data might still be writable > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com> > --- > tools/bpf/Makefile | 13 ++- > tools/bpf/moderr/.gitignore | 2 + > tools/bpf/moderr/Makefile | 95 +++++++++++++++++ > tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++ > tools/bpf/moderr/moderr.c | 236 ++++++++++++++++++++++++++++++++++++++++++ > tools/bpf/moderr/moderr.h | 40 +++++++ > 6 files changed, 510 insertions(+), 3 deletions(-) The tool looks useful, but we don't add tools to the kernel repo. It has to stay out of tree. The value of error injection is not clear to me. Other places in the kernel use it to test paths in the kernel that are difficult to do otherwise. These 3 functions don't seem to be in this category.
On Wed, Jan 22, 2025 at 09:02:19AM -0800, Alexei Starovoitov wrote: > On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > Add support for a module error injection tool. The tool > > can inject errors in the annotated module kernel functions > > such as complete_formation(), do_init_module() and > > module_enable_rodata_after_init(). Module name and module function are > > required parameters to have control over the error injection. > > > > Example: Inject error -22 to module_enable_rodata_ro_after_init for > > brd module: > > > > sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \ > > --error=-22 --trace > > Monitoring module error injection... Hit Ctrl-C to end. > > MODULE ERROR FUNCTION > > brd -22 module_enable_rodata_after_init() > > > > Kernel messages: > > [ 89.463690] brd: module loaded > > [ 89.463855] brd: module_enable_rodata_ro_after_init() returned -22, > > ro_after_init data might still be writable > > > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com> > > --- > > tools/bpf/Makefile | 13 ++- > > tools/bpf/moderr/.gitignore | 2 + > > tools/bpf/moderr/Makefile | 95 +++++++++++++++++ > > tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++ > > tools/bpf/moderr/moderr.c | 236 ++++++++++++++++++++++++++++++++++++++++++ > > tools/bpf/moderr/moderr.h | 40 +++++++ > > 6 files changed, 510 insertions(+), 3 deletions(-) > > The tool looks useful, but we don't add tools to the kernel repo. > It has to stay out of tree. For selftests we do add random tools. > The value of error injection is not clear to me. It is of great value, since it deals with corner cases which are otherwise hard to reproduce in places which a real error can be catostrophic. > Other places in the kernel use it to test paths in the kernel > that are difficult to do otherwise. Right. > These 3 functions don't seem to be in this category. That's the key here we should focus on. The problem is when a maintainer *does* agree that adding an error injection entry is useful for testing, and we have a developer willing to do the work to help test / validate it. In this case, this error case is rare but we do want to strive to test this as we ramp up and extend our modules selftests. Then there is the aspect of how to mitigate how instrusive code changes to allow error injection are. In 2021 we evaluated the prospect of error injection in-kernel long ago for other areas like the block layer for add_disk() failures [0] but the minimal interface to enable this from userspace with debugfs was considered just too intrusive. This effort tried to evaluate what this could look like with eBPF to mitigate the required in-kernel code, and I believe the light weight nature of it by just requiring a sprinkle with ALLOW_ERROR_INJECTION() suffices to my taste. So, perhaps the tools aspect can just go in: tools/testing/selftests/module/ [0] https://www.spinics.net/lists/linux-block/msg68159.html Luis
On Wed, Jan 22, 2025 at 09:02:19AM +0100, Alexei Starovoitov wrote: > On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > Add support for a module error injection tool. The tool > > can inject errors in the annotated module kernel functions > > such as complete_formation(), do_init_module() and > > module_enable_rodata_after_init(). Module name and module function are > > required parameters to have control over the error injection. > > > > Example: Inject error -22 to module_enable_rodata_ro_after_init for > > brd module: > > > > sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \ > > --error=-22 --trace > > Monitoring module error injection... Hit Ctrl-C to end. > > MODULE ERROR FUNCTION > > brd -22 module_enable_rodata_after_init() > > > > Kernel messages: > > [ 89.463690] brd: module loaded > > [ 89.463855] brd: module_enable_rodata_ro_after_init() returned -22, > > ro_after_init data might still be writable > > > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com> > > --- > > tools/bpf/Makefile | 13 ++- > > tools/bpf/moderr/.gitignore | 2 + > > tools/bpf/moderr/Makefile | 95 +++++++++++++++++ > > tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++ > > tools/bpf/moderr/moderr.c | 236 ++++++++++++++++++++++++++++++++++++++++++ > > tools/bpf/moderr/moderr.h | 40 +++++++ > > 6 files changed, 510 insertions(+), 3 deletions(-) > > The tool looks useful, but we don't add tools to the kernel repo. > It has to stay out of tree. Can you clarify what do you mean? There are other tools under tools/ and tools/ bpf [1]. [1] https://lore.kernel.org/bpf/20200114184230.GA204154@krava/ I will anyway move the tool to the suggested location in the other thread. > > The value of error injection is not clear to me. > Other places in the kernel use it to test paths in the kernel > that are difficult to do otherwise. By any chance do you know which tool are they using? > These 3 functions don't seem to be in this category. Luis already answered this.
On Tue, Feb 4, 2025 at 1:30 PM Daniel Gomez <da.gomez@kernel.org> wrote: > > On Wed, Jan 22, 2025 at 09:02:19AM +0100, Alexei Starovoitov wrote: > > On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > > > Add support for a module error injection tool. The tool > > > can inject errors in the annotated module kernel functions > > > such as complete_formation(), do_init_module() and > > > module_enable_rodata_after_init(). Module name and module function are > > > required parameters to have control over the error injection. > > > > > > Example: Inject error -22 to module_enable_rodata_ro_after_init for > > > brd module: > > > > > > sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \ > > > --error=-22 --trace > > > Monitoring module error injection... Hit Ctrl-C to end. > > > MODULE ERROR FUNCTION > > > brd -22 module_enable_rodata_after_init() > > > > > > Kernel messages: > > > [ 89.463690] brd: module loaded > > > [ 89.463855] brd: module_enable_rodata_ro_after_init() returned -22, > > > ro_after_init data might still be writable > > > > > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com> > > > --- > > > tools/bpf/Makefile | 13 ++- > > > tools/bpf/moderr/.gitignore | 2 + > > > tools/bpf/moderr/Makefile | 95 +++++++++++++++++ > > > tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++ > > > tools/bpf/moderr/moderr.c | 236 ++++++++++++++++++++++++++++++++++++++++++ > > > tools/bpf/moderr/moderr.h | 40 +++++++ > > > 6 files changed, 510 insertions(+), 3 deletions(-) > > > > The tool looks useful, but we don't add tools to the kernel repo. > > It has to stay out of tree. > > Can you clarify what do you mean? There are other tools under tools/ and tools/ > bpf [1]. > > [1] https://lore.kernel.org/bpf/20200114184230.GA204154@krava/ As you noticed we added only one tool out of many and recently discussed removing it, since the value of keeping it in the tree is minimal. > I will anyway move the tool to the suggested location in the other thread. I don't think it's a good idea. Keep it outside. There is no reason for it to live in the tree.
On Tue, Jan 28, 2025 at 12:57:05PM -0800, Luis Chamberlain wrote: >On Wed, Jan 22, 2025 at 09:02:19AM -0800, Alexei Starovoitov wrote: >> On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@samsung.com> wrote: >> > >> > Add support for a module error injection tool. The tool >> > can inject errors in the annotated module kernel functions >> > such as complete_formation(), do_init_module() and >> > module_enable_rodata_after_init(). Module name and module function are >> > required parameters to have control over the error injection. >> > >> > Example: Inject error -22 to module_enable_rodata_ro_after_init for >> > brd module: >> > >> > sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \ >> > --error=-22 --trace >> > Monitoring module error injection... Hit Ctrl-C to end. >> > MODULE ERROR FUNCTION >> > brd -22 module_enable_rodata_after_init() >> > >> > Kernel messages: >> > [ 89.463690] brd: module loaded >> > [ 89.463855] brd: module_enable_rodata_ro_after_init() returned -22, >> > ro_after_init data might still be writable >> > >> > Signed-off-by: Daniel Gomez <da.gomez@samsung.com> >> > --- >> > tools/bpf/Makefile | 13 ++- >> > tools/bpf/moderr/.gitignore | 2 + >> > tools/bpf/moderr/Makefile | 95 +++++++++++++++++ >> > tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++ >> > tools/bpf/moderr/moderr.c | 236 ++++++++++++++++++++++++++++++++++++++++++ >> > tools/bpf/moderr/moderr.h | 40 +++++++ >> > 6 files changed, 510 insertions(+), 3 deletions(-) >> >> The tool looks useful, but we don't add tools to the kernel repo. >> It has to stay out of tree. > >For selftests we do add random tools. > >> The value of error injection is not clear to me. > >It is of great value, since it deals with corner cases which are >otherwise hard to reproduce in places which a real error can be >catostrophic. > >> Other places in the kernel use it to test paths in the kernel >> that are difficult to do otherwise. > >Right. > >> These 3 functions don't seem to be in this category. > >That's the key here we should focus on. The problem is when a maintainer >*does* agree that adding an error injection entry is useful for testing, >and we have a developer willing to do the work to help test / validate >it. In this case, this error case is rare but we do want to strive to >test this as we ramp up and extend our modules selftests. > >Then there is the aspect of how to mitigate how instrusive code changes >to allow error injection are. In 2021 we evaluated the prospect of error >injection in-kernel long ago for other areas like the block layer for >add_disk() failures [0] but the minimal interface to enable this from >userspace with debugfs was considered just too intrusive. > >This effort tried to evaluate what this could look like with eBPF to >mitigate the required in-kernel code, and I believe the light weight >nature of it by just requiring a sprinkle with ALLOW_ERROR_INJECTION() >suffices to my taste. > >So, perhaps the tools aspect can just go in: > >tools/testing/selftests/module/ but why would it be module-specific? Based on its current implementation and discussion about inject.py it seems to be generic enough to be useful to test any function annotated with ALLOW_ERROR_INJECTION(). As xe driver maintainer, it may be interesting to use such a tool: $ git grep ALLOW_ERROR_INJECT -- drivers/gpu/drm/xe | wc -l 23 How does this approach compare to writing the function name on debugfs (the current approach in xe's testsuite)? fail_function @ https://docs.kernel.org/fault-injection/fault-injection.html#fault-injection-capabilities-infrastructure https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/intel/xe_fault_injection.c?ref_type=heads#L108 If you decide to have the tool to live somewhere else, then kmod repo could be a candidate. Although I think having it in kernel tree is simpler maintenance-wise. Lucas De Marchi > >[0] https://www.spinics.net/lists/linux-block/msg68159.html > > Luis
On Wed, Feb 19, 2025 at 02:17:48PM -0600, Lucas De Marchi wrote: > On Tue, Jan 28, 2025 at 12:57:05PM -0800, Luis Chamberlain wrote: > > On Wed, Jan 22, 2025 at 09:02:19AM -0800, Alexei Starovoitov wrote: > > > On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > > > > > Add support for a module error injection tool. The tool > > > > can inject errors in the annotated module kernel functions > > > > such as complete_formation(), do_init_module() and > > > > module_enable_rodata_after_init(). Module name and module function are > > > > required parameters to have control over the error injection. > > > > > > > > Example: Inject error -22 to module_enable_rodata_ro_after_init for > > > > brd module: > > > > > > > > sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \ > > > > --error=-22 --trace > > > > Monitoring module error injection... Hit Ctrl-C to end. > > > > MODULE ERROR FUNCTION > > > > brd -22 module_enable_rodata_after_init() > > > > > > > > Kernel messages: > > > > [ 89.463690] brd: module loaded > > > > [ 89.463855] brd: module_enable_rodata_ro_after_init() returned -22, > > > > ro_after_init data might still be writable > > > > > > > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com> > > > > --- > > > > tools/bpf/Makefile | 13 ++- > > > > tools/bpf/moderr/.gitignore | 2 + > > > > tools/bpf/moderr/Makefile | 95 +++++++++++++++++ > > > > tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++ > > > > tools/bpf/moderr/moderr.c | 236 ++++++++++++++++++++++++++++++++++++++++++ > > > > tools/bpf/moderr/moderr.h | 40 +++++++ > > > > 6 files changed, 510 insertions(+), 3 deletions(-) > > > > > > The tool looks useful, but we don't add tools to the kernel repo. > > > It has to stay out of tree. > > > > For selftests we do add random tools. > > > > > The value of error injection is not clear to me. > > > > It is of great value, since it deals with corner cases which are > > otherwise hard to reproduce in places which a real error can be > > catostrophic. > > > > > Other places in the kernel use it to test paths in the kernel > > > that are difficult to do otherwise. > > > > Right. > > > > > These 3 functions don't seem to be in this category. > > > > That's the key here we should focus on. The problem is when a maintainer > > *does* agree that adding an error injection entry is useful for testing, > > and we have a developer willing to do the work to help test / validate > > it. In this case, this error case is rare but we do want to strive to > > test this as we ramp up and extend our modules selftests. > > > > Then there is the aspect of how to mitigate how instrusive code changes > > to allow error injection are. In 2021 we evaluated the prospect of error > > injection in-kernel long ago for other areas like the block layer for > > add_disk() failures [0] but the minimal interface to enable this from > > userspace with debugfs was considered just too intrusive. > > > > This effort tried to evaluate what this could look like with eBPF to > > mitigate the required in-kernel code, and I believe the light weight > > nature of it by just requiring a sprinkle with ALLOW_ERROR_INJECTION() > > suffices to my taste. > > > > So, perhaps the tools aspect can just go in: > > > > tools/testing/selftests/module/ > > but why would it be module-specific? Gotta start somewhere. > Based on its current implementation > and discussion about inject.py it seems to be generic enough to be > useful to test any function annotated with ALLOW_ERROR_INJECTION(). > > As xe driver maintainer, it may be interesting to use such a tool: > > $ git grep ALLOW_ERROR_INJECT -- drivers/gpu/drm/xe | wc -l 23 > > How does this approach compare to writing the function name on debugfs > (the current approach in xe's testsuite)? > > fail_function @ https://docs.kernel.org/fault-injection/fault-injection.html#fault-injection-capabilities-infrastructure > https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/intel/xe_fault_injection.c?ref_type=heads#L108 > > If you decide to have the tool to live somewhere else, then kmod repo > could be a candidate. Would we install this upon install target? Danny can decide on this :) > Although I think having it in kernel tree is > simpler maintenance-wise. I think we have at least two users upstream who can make use of it. If we end up going through tools/testing/selftests/module/ first, can't you make use of it later? Luis
On Wed, Feb 19, 2025 at 02:17:48PM +0100, Lucas De Marchi wrote: > On Tue, Jan 28, 2025 at 12:57:05PM -0800, Luis Chamberlain wrote: > > On Wed, Jan 22, 2025 at 09:02:19AM -0800, Alexei Starovoitov wrote: > > > On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > > > > > Add support for a module error injection tool. The tool > > > > can inject errors in the annotated module kernel functions > > > > such as complete_formation(), do_init_module() and > > > > module_enable_rodata_after_init(). Module name and module function are > > > > required parameters to have control over the error injection. > > > > > > > > Example: Inject error -22 to module_enable_rodata_ro_after_init for > > > > brd module: > > > > > > > > sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \ > > > > --error=-22 --trace > > > > Monitoring module error injection... Hit Ctrl-C to end. > > > > MODULE ERROR FUNCTION > > > > brd -22 module_enable_rodata_after_init() > > > > > > > > Kernel messages: > > > > [ 89.463690] brd: module loaded > > > > [ 89.463855] brd: module_enable_rodata_ro_after_init() returned -22, > > > > ro_after_init data might still be writable > > > > > > > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com> > > > > --- > > > > tools/bpf/Makefile | 13 ++- > > > > tools/bpf/moderr/.gitignore | 2 + > > > > tools/bpf/moderr/Makefile | 95 +++++++++++++++++ > > > > tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++ > > > > tools/bpf/moderr/moderr.c | 236 ++++++++++++++++++++++++++++++++++++++++++ > > > > tools/bpf/moderr/moderr.h | 40 +++++++ > > > > 6 files changed, 510 insertions(+), 3 deletions(-) > > > > > > The tool looks useful, but we don't add tools to the kernel repo. > > > It has to stay out of tree. > > > > For selftests we do add random tools. > > > > > The value of error injection is not clear to me. > > > > It is of great value, since it deals with corner cases which are > > otherwise hard to reproduce in places which a real error can be > > catostrophic. > > > > > Other places in the kernel use it to test paths in the kernel > > > that are difficult to do otherwise. > > > > Right. > > > > > These 3 functions don't seem to be in this category. > > > > That's the key here we should focus on. The problem is when a maintainer > > *does* agree that adding an error injection entry is useful for testing, > > and we have a developer willing to do the work to help test / validate > > it. In this case, this error case is rare but we do want to strive to > > test this as we ramp up and extend our modules selftests. > > > > Then there is the aspect of how to mitigate how instrusive code changes > > to allow error injection are. In 2021 we evaluated the prospect of error > > injection in-kernel long ago for other areas like the block layer for > > add_disk() failures [0] but the minimal interface to enable this from > > userspace with debugfs was considered just too intrusive. > > > > This effort tried to evaluate what this could look like with eBPF to > > mitigate the required in-kernel code, and I believe the light weight > > nature of it by just requiring a sprinkle with ALLOW_ERROR_INJECTION() > > suffices to my taste. > > > > So, perhaps the tools aspect can just go in: > > > > tools/testing/selftests/module/ > > but why would it be module-specific? Based on its current implementation > and discussion about inject.py it seems to be generic enough to be > useful to test any function annotated with ALLOW_ERROR_INJECTION(). Right, but inject.py is based on the old/deprecated Python eBPF/BCC infrastructure (although I think it's still a working and maintained tool based on commit history). However, as I noted in the cover letter, I think it would be useful to port it to use libbpf instead. > > As xe driver maintainer, it may be interesting to use such a tool: > > $ git grep ALLOW_ERROR_INJECT -- drivers/gpu/drm/xe | wc -l 23 I was wondering if users of ALLOW_ERROR_INJECTION() still depend on inject.py tool, or other tools were used. Or perhaps all are using debugfs? > > How does this approach compare to writing the function name on debugfs > (the current approach in xe's testsuite)? > > fail_function @ https://docs.kernel.org/fault-injection/fault-injection.html#fault-injection-capabilities-infrastructure > https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/intel/xe_fault_injection.c?ref_type=heads#L108 IMHO and looking at the references above, looks "simpler" to handle these cases in eBPF. Specially because the error injection logic does not have to live along with the code but in a separate tool. Link from Luis [0] (also [1]) is a good example of when debugfs may be seen a bit too intrusive and how eBPF may resolve the problems maintainers have. [1] https://lore.kernel.org/all/20210512064629.13899-1-mcgrof@kernel.org/ In summary, a function annotated with the error injection tag can modify its return value in eBPF code by using the bpf_override_return() helper. The logic for deciding when to inject the error is likely similar to the current implementation in debugfs. This can be seen in patch 2, file tools/bpf/ moderr/moderr.bpf.c where error is injected based on module name and target function. > > If you decide to have the tool to live somewhere else, then kmod repo > could be a candidate. Although I think having it in kernel tree is Thanks Lucas. > simpler maintenance-wise. I agree with you that having the tool (or maybe tools if we decide to split) in tree it may be easier and may be a way to showcase its usage more effectively. > > Lucas De Marchi > > > > > [0] https://www.spinics.net/lists/linux-block/msg68159.html > > > > Luis
On Fri, Feb 21, 2025 at 12:15:40PM +0100, Luis Chamberlain wrote: > On Wed, Feb 19, 2025 at 02:17:48PM -0600, Lucas De Marchi wrote: > > On Tue, Jan 28, 2025 at 12:57:05PM -0800, Luis Chamberlain wrote: > > > On Wed, Jan 22, 2025 at 09:02:19AM -0800, Alexei Starovoitov wrote: > > > > On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > > > > > > > Add support for a module error injection tool. The tool > > > > > can inject errors in the annotated module kernel functions > > > > > such as complete_formation(), do_init_module() and > > > > > module_enable_rodata_after_init(). Module name and module function are > > > > > required parameters to have control over the error injection. > > > > > > > > > > Example: Inject error -22 to module_enable_rodata_ro_after_init for > > > > > brd module: > > > > > > > > > > sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \ > > > > > --error=-22 --trace > > > > > Monitoring module error injection... Hit Ctrl-C to end. > > > > > MODULE ERROR FUNCTION > > > > > brd -22 module_enable_rodata_after_init() > > > > > > > > > > Kernel messages: > > > > > [ 89.463690] brd: module loaded > > > > > [ 89.463855] brd: module_enable_rodata_ro_after_init() returned -22, > > > > > ro_after_init data might still be writable > > > > > > > > > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com> > > > > > --- > > > > > tools/bpf/Makefile | 13 ++- > > > > > tools/bpf/moderr/.gitignore | 2 + > > > > > tools/bpf/moderr/Makefile | 95 +++++++++++++++++ > > > > > tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++ > > > > > tools/bpf/moderr/moderr.c | 236 ++++++++++++++++++++++++++++++++++++++++++ > > > > > tools/bpf/moderr/moderr.h | 40 +++++++ > > > > > 6 files changed, 510 insertions(+), 3 deletions(-) > > > > > > > > The tool looks useful, but we don't add tools to the kernel repo. > > > > It has to stay out of tree. > > > > > > For selftests we do add random tools. > > > > > > > The value of error injection is not clear to me. > > > > > > It is of great value, since it deals with corner cases which are > > > otherwise hard to reproduce in places which a real error can be > > > catostrophic. > > > > > > > Other places in the kernel use it to test paths in the kernel > > > > that are difficult to do otherwise. > > > > > > Right. > > > > > > > These 3 functions don't seem to be in this category. > > > > > > That's the key here we should focus on. The problem is when a maintainer > > > *does* agree that adding an error injection entry is useful for testing, > > > and we have a developer willing to do the work to help test / validate > > > it. In this case, this error case is rare but we do want to strive to > > > test this as we ramp up and extend our modules selftests. > > > > > > Then there is the aspect of how to mitigate how instrusive code changes > > > to allow error injection are. In 2021 we evaluated the prospect of error > > > injection in-kernel long ago for other areas like the block layer for > > > add_disk() failures [0] but the minimal interface to enable this from > > > userspace with debugfs was considered just too intrusive. > > > > > > This effort tried to evaluate what this could look like with eBPF to > > > mitigate the required in-kernel code, and I believe the light weight > > > nature of it by just requiring a sprinkle with ALLOW_ERROR_INJECTION() > > > suffices to my taste. > > > > > > So, perhaps the tools aspect can just go in: > > > > > > tools/testing/selftests/module/ > > > > but why would it be module-specific? > > Gotta start somewhere. > > > Based on its current implementation > > and discussion about inject.py it seems to be generic enough to be > > useful to test any function annotated with ALLOW_ERROR_INJECTION(). > > > > As xe driver maintainer, it may be interesting to use such a tool: > > > > $ git grep ALLOW_ERROR_INJECT -- drivers/gpu/drm/xe | wc -l 23 > > > > How does this approach compare to writing the function name on debugfs > > (the current approach in xe's testsuite)? > > > > fail_function @ https://docs.kernel.org/fault-injection/fault-injection.html#fault-injection-capabilities-infrastructure > > https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/intel/xe_fault_injection.c?ref_type=heads#L108 > > > > If you decide to have the tool to live somewhere else, then kmod repo > > could be a candidate. > > Would we install this upon install target? > > Danny can decide on this :) > > > Although I think having it in kernel tree is > > simpler maintenance-wise. > > I think we have at least two users upstream who can make use of it. If > we end up going through tools/testing/selftests/module/ first, can't > you make use of it later? What are the features in debugfs required to be useful for xe that we can port to an eBPF version? I see from the link provided the use of probability, interval, times and space but these are configured to allways trigger the error. Is that right? > > Luis
On Sat, Feb 22, 2025 at 10:35:07PM +0100, Daniel Gomez wrote: >On Fri, Feb 21, 2025 at 12:15:40PM +0100, Luis Chamberlain wrote: >> On Wed, Feb 19, 2025 at 02:17:48PM -0600, Lucas De Marchi wrote: >> > On Tue, Jan 28, 2025 at 12:57:05PM -0800, Luis Chamberlain wrote: >> > > On Wed, Jan 22, 2025 at 09:02:19AM -0800, Alexei Starovoitov wrote: >> > > > On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@samsung.com> wrote: >> > > > > >> > > > > Add support for a module error injection tool. The tool >> > > > > can inject errors in the annotated module kernel functions >> > > > > such as complete_formation(), do_init_module() and >> > > > > module_enable_rodata_after_init(). Module name and module function are >> > > > > required parameters to have control over the error injection. >> > > > > >> > > > > Example: Inject error -22 to module_enable_rodata_ro_after_init for >> > > > > brd module: >> > > > > >> > > > > sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \ >> > > > > --error=-22 --trace >> > > > > Monitoring module error injection... Hit Ctrl-C to end. >> > > > > MODULE ERROR FUNCTION >> > > > > brd -22 module_enable_rodata_after_init() >> > > > > >> > > > > Kernel messages: >> > > > > [ 89.463690] brd: module loaded >> > > > > [ 89.463855] brd: module_enable_rodata_ro_after_init() returned -22, >> > > > > ro_after_init data might still be writable >> > > > > >> > > > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com> >> > > > > --- >> > > > > tools/bpf/Makefile | 13 ++- >> > > > > tools/bpf/moderr/.gitignore | 2 + >> > > > > tools/bpf/moderr/Makefile | 95 +++++++++++++++++ >> > > > > tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++ >> > > > > tools/bpf/moderr/moderr.c | 236 ++++++++++++++++++++++++++++++++++++++++++ >> > > > > tools/bpf/moderr/moderr.h | 40 +++++++ >> > > > > 6 files changed, 510 insertions(+), 3 deletions(-) >> > > > >> > > > The tool looks useful, but we don't add tools to the kernel repo. >> > > > It has to stay out of tree. >> > > >> > > For selftests we do add random tools. >> > > >> > > > The value of error injection is not clear to me. >> > > >> > > It is of great value, since it deals with corner cases which are >> > > otherwise hard to reproduce in places which a real error can be >> > > catostrophic. >> > > >> > > > Other places in the kernel use it to test paths in the kernel >> > > > that are difficult to do otherwise. >> > > >> > > Right. >> > > >> > > > These 3 functions don't seem to be in this category. >> > > >> > > That's the key here we should focus on. The problem is when a maintainer >> > > *does* agree that adding an error injection entry is useful for testing, >> > > and we have a developer willing to do the work to help test / validate >> > > it. In this case, this error case is rare but we do want to strive to >> > > test this as we ramp up and extend our modules selftests. >> > > >> > > Then there is the aspect of how to mitigate how instrusive code changes >> > > to allow error injection are. In 2021 we evaluated the prospect of error >> > > injection in-kernel long ago for other areas like the block layer for >> > > add_disk() failures [0] but the minimal interface to enable this from >> > > userspace with debugfs was considered just too intrusive. >> > > >> > > This effort tried to evaluate what this could look like with eBPF to >> > > mitigate the required in-kernel code, and I believe the light weight >> > > nature of it by just requiring a sprinkle with ALLOW_ERROR_INJECTION() >> > > suffices to my taste. >> > > >> > > So, perhaps the tools aspect can just go in: >> > > >> > > tools/testing/selftests/module/ >> > >> > but why would it be module-specific? >> >> Gotta start somewhere. >> >> > Based on its current implementation >> > and discussion about inject.py it seems to be generic enough to be >> > useful to test any function annotated with ALLOW_ERROR_INJECTION(). >> > >> > As xe driver maintainer, it may be interesting to use such a tool: >> > >> > $ git grep ALLOW_ERROR_INJECT -- drivers/gpu/drm/xe | wc -l 23 >> > >> > How does this approach compare to writing the function name on debugfs >> > (the current approach in xe's testsuite)? >> > >> > fail_function @ https://docs.kernel.org/fault-injection/fault-injection.html#fault-injection-capabilities-infrastructure >> > https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/intel/xe_fault_injection.c?ref_type=heads#L108 >> > >> > If you decide to have the tool to live somewhere else, then kmod repo >> > could be a candidate. >> >> Would we install this upon install target? >> >> Danny can decide on this :) >> >> > Although I think having it in kernel tree is >> > simpler maintenance-wise. >> >> I think we have at least two users upstream who can make use of it. If >> we end up going through tools/testing/selftests/module/ first, can't >> you make use of it later? > >What are the features in debugfs required to be useful for xe that we can >port to an eBPF version? I see from the link provided the use of probability, >interval, times and space but these are configured to allways trigger the error. >Is that right? I don't think we use them... we just set them to "always trigger" and then create the conditions for that to happen. But my question still remains: what is the benefit of using the bpf approach over writing these files? Lucas De Marchi > >> >> Luis
On Mon, Feb 24, 2025 at 08:43:45AM +0100, Lucas De Marchi wrote: > On Sat, Feb 22, 2025 at 10:35:07PM +0100, Daniel Gomez wrote: > > On Fri, Feb 21, 2025 at 12:15:40PM +0100, Luis Chamberlain wrote: > > > On Wed, Feb 19, 2025 at 02:17:48PM -0600, Lucas De Marchi wrote: > > > > On Tue, Jan 28, 2025 at 12:57:05PM -0800, Luis Chamberlain wrote: > > > > > On Wed, Jan 22, 2025 at 09:02:19AM -0800, Alexei Starovoitov wrote: > > > > > > On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > > > > > > > > > > > Add support for a module error injection tool. The tool > > > > > > > can inject errors in the annotated module kernel functions > > > > > > > such as complete_formation(), do_init_module() and > > > > > > > module_enable_rodata_after_init(). Module name and module function are > > > > > > > required parameters to have control over the error injection. > > > > > > > > > > > > > > Example: Inject error -22 to module_enable_rodata_ro_after_init for > > > > > > > brd module: > > > > > > > > > > > > > > sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \ > > > > > > > --error=-22 --trace > > > > > > > Monitoring module error injection... Hit Ctrl-C to end. > > > > > > > MODULE ERROR FUNCTION > > > > > > > brd -22 module_enable_rodata_after_init() > > > > > > > > > > > > > > Kernel messages: > > > > > > > [ 89.463690] brd: module loaded > > > > > > > [ 89.463855] brd: module_enable_rodata_ro_after_init() returned -22, > > > > > > > ro_after_init data might still be writable > > > > > > > > > > > > > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com> > > > > > > > --- > > > > > > > tools/bpf/Makefile | 13 ++- > > > > > > > tools/bpf/moderr/.gitignore | 2 + > > > > > > > tools/bpf/moderr/Makefile | 95 +++++++++++++++++ > > > > > > > tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++ > > > > > > > tools/bpf/moderr/moderr.c | 236 ++++++++++++++++++++++++++++++++++++++++++ > > > > > > > tools/bpf/moderr/moderr.h | 40 +++++++ > > > > > > > 6 files changed, 510 insertions(+), 3 deletions(-) > > > > > > > > > > > > The tool looks useful, but we don't add tools to the kernel repo. > > > > > > It has to stay out of tree. > > > > > > > > > > For selftests we do add random tools. > > > > > > > > > > > The value of error injection is not clear to me. > > > > > > > > > > It is of great value, since it deals with corner cases which are > > > > > otherwise hard to reproduce in places which a real error can be > > > > > catostrophic. > > > > > > > > > > > Other places in the kernel use it to test paths in the kernel > > > > > > that are difficult to do otherwise. > > > > > > > > > > Right. > > > > > > > > > > > These 3 functions don't seem to be in this category. > > > > > > > > > > That's the key here we should focus on. The problem is when a maintainer > > > > > *does* agree that adding an error injection entry is useful for testing, > > > > > and we have a developer willing to do the work to help test / validate > > > > > it. In this case, this error case is rare but we do want to strive to > > > > > test this as we ramp up and extend our modules selftests. > > > > > > > > > > Then there is the aspect of how to mitigate how instrusive code changes > > > > > to allow error injection are. In 2021 we evaluated the prospect of error > > > > > injection in-kernel long ago for other areas like the block layer for > > > > > add_disk() failures [0] but the minimal interface to enable this from > > > > > userspace with debugfs was considered just too intrusive. > > > > > > > > > > This effort tried to evaluate what this could look like with eBPF to > > > > > mitigate the required in-kernel code, and I believe the light weight > > > > > nature of it by just requiring a sprinkle with ALLOW_ERROR_INJECTION() > > > > > suffices to my taste. > > > > > > > > > > So, perhaps the tools aspect can just go in: > > > > > > > > > > tools/testing/selftests/module/ > > > > > > > > but why would it be module-specific? > > > > > > Gotta start somewhere. > > > > > > > Based on its current implementation > > > > and discussion about inject.py it seems to be generic enough to be > > > > useful to test any function annotated with ALLOW_ERROR_INJECTION(). > > > > > > > > As xe driver maintainer, it may be interesting to use such a tool: > > > > > > > > $ git grep ALLOW_ERROR_INJECT -- drivers/gpu/drm/xe | wc -l 23 > > > > > > > > How does this approach compare to writing the function name on debugfs > > > > (the current approach in xe's testsuite)? > > > > > > > > fail_function @ https://docs.kernel.org/fault-injection/fault-injection.html#fault-injection-capabilities-infrastructure > > > > https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/intel/xe_fault_injection.c?ref_type=heads#L108 > > > > > > > > If you decide to have the tool to live somewhere else, then kmod repo > > > > could be a candidate. > > > > > > Would we install this upon install target? > > > > > > Danny can decide on this :) > > > > > > > Although I think having it in kernel tree is > > > > simpler maintenance-wise. > > > > > > I think we have at least two users upstream who can make use of it. If > > > we end up going through tools/testing/selftests/module/ first, can't > > > you make use of it later? > > > > What are the features in debugfs required to be useful for xe that we can > > port to an eBPF version? I see from the link provided the use of probability, > > interval, times and space but these are configured to allways trigger the error. > > Is that right? > > I don't think we use them... we just set them to "always trigger" and > then create the conditions for that to happen. But my question still > remains: what is the benefit of using the bpf approach over writing > these files? The code to trigger the error injection still needs to exist with both approaches. My understanding from debugfs and the comment from Luis earlier in the thread is that with eBPF you can mitigate how intrusive in-kernel code changes are to allow error injection. Luis added the example here [1] for debugfs. [1] https://lore.kernel.org/all/20210512064629.13899-9-mcgrof@kernel.org/ To compare patch 8 in [1] with eBPF approach: the patch describes all the necessary changes required to allow error injection on the add_disk() path. With eBPF one would simply annotate the function(s) with ALLOW_ERROR_INJECTION(), e.g. device_add() and replace the return value in eBPF code with bpf_override_return() as implemented in moderr tool for module_enable_rdata_after_init() for example. New error injection users such modules or block/disk would benefit of the eBPF approach with having a more simpler in-kernel code for the same purpose. Current users of debugfs/errorinj would have to remove all the upstream intrusive error injection related code if they want to adopt the eBPF approach. Daniel > > Lucas De Marchi > > > > > > > > > Luis
On Fri, Feb 28, 2025 at 10:27:17AM +0100, Daniel Gomez wrote: >On Mon, Feb 24, 2025 at 08:43:45AM +0100, Lucas De Marchi wrote: >> On Sat, Feb 22, 2025 at 10:35:07PM +0100, Daniel Gomez wrote: >> > On Fri, Feb 21, 2025 at 12:15:40PM +0100, Luis Chamberlain wrote: >> > > On Wed, Feb 19, 2025 at 02:17:48PM -0600, Lucas De Marchi wrote: >> > > > On Tue, Jan 28, 2025 at 12:57:05PM -0800, Luis Chamberlain wrote: >> > > > > On Wed, Jan 22, 2025 at 09:02:19AM -0800, Alexei Starovoitov wrote: >> > > > > > On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@samsung.com> wrote: >> > > > > > > >> > > > > > > Add support for a module error injection tool. The tool >> > > > > > > can inject errors in the annotated module kernel functions >> > > > > > > such as complete_formation(), do_init_module() and >> > > > > > > module_enable_rodata_after_init(). Module name and module function are >> > > > > > > required parameters to have control over the error injection. >> > > > > > > >> > > > > > > Example: Inject error -22 to module_enable_rodata_ro_after_init for >> > > > > > > brd module: >> > > > > > > >> > > > > > > sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \ >> > > > > > > --error=-22 --trace >> > > > > > > Monitoring module error injection... Hit Ctrl-C to end. >> > > > > > > MODULE ERROR FUNCTION >> > > > > > > brd -22 module_enable_rodata_after_init() >> > > > > > > >> > > > > > > Kernel messages: >> > > > > > > [ 89.463690] brd: module loaded >> > > > > > > [ 89.463855] brd: module_enable_rodata_ro_after_init() returned -22, >> > > > > > > ro_after_init data might still be writable >> > > > > > > >> > > > > > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com> >> > > > > > > --- >> > > > > > > tools/bpf/Makefile | 13 ++- >> > > > > > > tools/bpf/moderr/.gitignore | 2 + >> > > > > > > tools/bpf/moderr/Makefile | 95 +++++++++++++++++ >> > > > > > > tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++ >> > > > > > > tools/bpf/moderr/moderr.c | 236 ++++++++++++++++++++++++++++++++++++++++++ >> > > > > > > tools/bpf/moderr/moderr.h | 40 +++++++ >> > > > > > > 6 files changed, 510 insertions(+), 3 deletions(-) >> > > > > > >> > > > > > The tool looks useful, but we don't add tools to the kernel repo. >> > > > > > It has to stay out of tree. >> > > > > >> > > > > For selftests we do add random tools. >> > > > > >> > > > > > The value of error injection is not clear to me. >> > > > > >> > > > > It is of great value, since it deals with corner cases which are >> > > > > otherwise hard to reproduce in places which a real error can be >> > > > > catostrophic. >> > > > > >> > > > > > Other places in the kernel use it to test paths in the kernel >> > > > > > that are difficult to do otherwise. >> > > > > >> > > > > Right. >> > > > > >> > > > > > These 3 functions don't seem to be in this category. >> > > > > >> > > > > That's the key here we should focus on. The problem is when a maintainer >> > > > > *does* agree that adding an error injection entry is useful for testing, >> > > > > and we have a developer willing to do the work to help test / validate >> > > > > it. In this case, this error case is rare but we do want to strive to >> > > > > test this as we ramp up and extend our modules selftests. >> > > > > >> > > > > Then there is the aspect of how to mitigate how instrusive code changes >> > > > > to allow error injection are. In 2021 we evaluated the prospect of error >> > > > > injection in-kernel long ago for other areas like the block layer for >> > > > > add_disk() failures [0] but the minimal interface to enable this from >> > > > > userspace with debugfs was considered just too intrusive. >> > > > > >> > > > > This effort tried to evaluate what this could look like with eBPF to >> > > > > mitigate the required in-kernel code, and I believe the light weight >> > > > > nature of it by just requiring a sprinkle with ALLOW_ERROR_INJECTION() >> > > > > suffices to my taste. >> > > > > >> > > > > So, perhaps the tools aspect can just go in: >> > > > > >> > > > > tools/testing/selftests/module/ >> > > > >> > > > but why would it be module-specific? >> > > >> > > Gotta start somewhere. >> > > >> > > > Based on its current implementation >> > > > and discussion about inject.py it seems to be generic enough to be >> > > > useful to test any function annotated with ALLOW_ERROR_INJECTION(). >> > > > >> > > > As xe driver maintainer, it may be interesting to use such a tool: >> > > > >> > > > $ git grep ALLOW_ERROR_INJECT -- drivers/gpu/drm/xe | wc -l 23 >> > > > >> > > > How does this approach compare to writing the function name on debugfs >> > > > (the current approach in xe's testsuite)? >> > > > >> > > > fail_function @ https://docs.kernel.org/fault-injection/fault-injection.html#fault-injection-capabilities-infrastructure >> > > > https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/intel/xe_fault_injection.c?ref_type=heads#L108 >> > > > >> > > > If you decide to have the tool to live somewhere else, then kmod repo >> > > > could be a candidate. >> > > >> > > Would we install this upon install target? >> > > >> > > Danny can decide on this :) >> > > >> > > > Although I think having it in kernel tree is >> > > > simpler maintenance-wise. >> > > >> > > I think we have at least two users upstream who can make use of it. If >> > > we end up going through tools/testing/selftests/module/ first, can't >> > > you make use of it later? >> > >> > What are the features in debugfs required to be useful for xe that we can >> > port to an eBPF version? I see from the link provided the use of probability, >> > interval, times and space but these are configured to allways trigger the error. >> > Is that right? >> >> I don't think we use them... we just set them to "always trigger" and >> then create the conditions for that to happen. But my question still >> remains: what is the benefit of using the bpf approach over writing >> these files? > >The code to trigger the error injection still needs to exist with both >approaches. My understanding from debugfs and the comment from Luis earlier in >the thread is that with eBPF you can mitigate how intrusive in-kernel code >changes are to allow error injection. Luis added the example here [1] for >debugfs. > >[1] https://lore.kernel.org/all/20210512064629.13899-9-mcgrof@kernel.org/ > >To compare patch 8 in [1] with eBPF approach: the patch describes >all the necessary changes required to allow error injection on the >add_disk() path. With eBPF one would simply annotate the function(s) with >ALLOW_ERROR_INJECTION(), e.g. device_add() and replace the return value >in eBPF code with bpf_override_return() as implemented in moderr tool for >module_enable_rdata_after_init() for example. but that is all that we need with the fail_function in debugfs too: $ git grep ALLOW_ERROR_INJECTION -- drivers/gpu/drm/xe drivers/gpu/drm/xe/xe_device.c:ALLOW_ERROR_INJECTION(xe_device_create, ERRNO); /* See xe_pci_probe() */ drivers/gpu/drm/xe/xe_device.c:ALLOW_ERROR_INJECTION(wait_for_lmem_ready, ERRNO); /* See xe_pci_probe() */ drivers/gpu/drm/xe/xe_exec_queue.c:ALLOW_ERROR_INJECTION(xe_exec_queue_create_bind, ERRNO); drivers/gpu/drm/xe/xe_ggtt.c:ALLOW_ERROR_INJECTION(xe_ggtt_init_early, ERRNO); /* See xe_pci_probe() */ drivers/gpu/drm/xe/xe_guc_ads.c:ALLOW_ERROR_INJECTION(xe_guc_ads_init, ERRNO); /* See xe_pci_probe() */ drivers/gpu/drm/xe/xe_guc_ct.c:ALLOW_ERROR_INJECTION(xe_guc_ct_init, ERRNO); /* See xe_pci_probe() */ drivers/gpu/drm/xe/xe_guc_log.c:ALLOW_ERROR_INJECTION(xe_guc_log_init, ERRNO); /* See xe_pci_probe() */ drivers/gpu/drm/xe/xe_guc_relay.c:ALLOW_ERROR_INJECTION(xe_guc_relay_init, ERRNO); /* See xe_pci_probe() */ drivers/gpu/drm/xe/xe_pci.c: * ALLOW_ERROR_INJECTION() is used to conditionally skip function execution drivers/gpu/drm/xe/xe_pci.c: * ALLOW_ERROR_INJECTION() macro but this is acceptable because for those drivers/gpu/drm/xe/xe_pm.c:ALLOW_ERROR_INJECTION(xe_pm_init_early, ERRNO); /* See xe_pci_probe() */ drivers/gpu/drm/xe/xe_pt.c:ALLOW_ERROR_INJECTION(xe_pt_create, ERRNO); drivers/gpu/drm/xe/xe_pt.c:ALLOW_ERROR_INJECTION(xe_pt_update_ops_prepare, ERRNO); drivers/gpu/drm/xe/xe_pt.c:ALLOW_ERROR_INJECTION(xe_pt_update_ops_run, ERRNO); drivers/gpu/drm/xe/xe_sriov.c:ALLOW_ERROR_INJECTION(xe_sriov_init, ERRNO); /* See xe_pci_probe() */ drivers/gpu/drm/xe/xe_sync.c:ALLOW_ERROR_INJECTION(xe_sync_entry_parse, ERRNO); drivers/gpu/drm/xe/xe_tile.c:ALLOW_ERROR_INJECTION(xe_tile_init_early, ERRNO); /* See xe_pci_probe() */ drivers/gpu/drm/xe/xe_uc_fw.c:ALLOW_ERROR_INJECTION(xe_uc_fw_init, ERRNO); /* See xe_pci_probe() */ drivers/gpu/drm/xe/xe_vm.c:ALLOW_ERROR_INJECTION(xe_vma_ops_alloc, ERRNO); drivers/gpu/drm/xe/xe_vm.c:ALLOW_ERROR_INJECTION(xe_vm_create_scratch, ERRNO); drivers/gpu/drm/xe/xe_vm.c:ALLOW_ERROR_INJECTION(vm_bind_ioctl_ops_create, ERRNO); drivers/gpu/drm/xe/xe_vm.c:ALLOW_ERROR_INJECTION(vm_bind_ioctl_ops_execute, ERRNO); drivers/gpu/drm/xe/xe_wa.c:ALLOW_ERROR_INJECTION(xe_wa_init, ERRNO); /* See xe_pci_probe() */ drivers/gpu/drm/xe/xe_wopcm.c:ALLOW_ERROR_INJECTION(xe_wopcm_init, ERRNO); /* See xe_pci_probe() */ That is different from the patch you are pointing to because that patch is trying to add arbitrary/named error injection points throughout the code. However via debugfs it's still possible to add error injection to the beginning of a function by annotating that function with ALLOW_ERROR_INJECTION. If a function is annotated with that, then if you do e.g. echo xe_device_create > /sys/kernel/debug/fail_function/inject it will cause that function to fail. There are some additional files to control _when_ that function should fail, but I'm failing to see a clear benefit. See this example in the docs: Documentation/fault-injection/fault-injection.rst:- Inject open_ctree error while btrfs mount:: Lucas De Marchi > >New error injection users such modules or block/disk would benefit of the eBPF >approach with having a more simpler in-kernel code for the same purpose. Current >users of debugfs/errorinj would have to remove all the upstream intrusive error >injection related code if they want to adopt the eBPF approach. > >Daniel > >> >> Lucas De Marchi >> >> > >> > > >> > > Luis
On Fri, Feb 28, 2025 at 12:48:38PM +0100, Lucas De Marchi wrote: > On Fri, Feb 28, 2025 at 10:27:17AM +0100, Daniel Gomez wrote: > > On Mon, Feb 24, 2025 at 08:43:45AM +0100, Lucas De Marchi wrote: > > > On Sat, Feb 22, 2025 at 10:35:07PM +0100, Daniel Gomez wrote: > > > > On Fri, Feb 21, 2025 at 12:15:40PM +0100, Luis Chamberlain wrote: > > > > > On Wed, Feb 19, 2025 at 02:17:48PM -0600, Lucas De Marchi wrote: > > > > > > On Tue, Jan 28, 2025 at 12:57:05PM -0800, Luis Chamberlain wrote: > > > > > > > On Wed, Jan 22, 2025 at 09:02:19AM -0800, Alexei Starovoitov wrote: > > > > > > > > On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > > > > > > > > > > > > > > > Add support for a module error injection tool. The tool > > > > > > > > > can inject errors in the annotated module kernel functions > > > > > > > > > such as complete_formation(), do_init_module() and > > > > > > > > > module_enable_rodata_after_init(). Module name and module function are > > > > > > > > > required parameters to have control over the error injection. > > > > > > > > > > > > > > > > > > Example: Inject error -22 to module_enable_rodata_ro_after_init for > > > > > > > > > brd module: > > > > > > > > > > > > > > > > > > sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \ > > > > > > > > > --error=-22 --trace > > > > > > > > > Monitoring module error injection... Hit Ctrl-C to end. > > > > > > > > > MODULE ERROR FUNCTION > > > > > > > > > brd -22 module_enable_rodata_after_init() > > > > > > > > > > > > > > > > > > Kernel messages: > > > > > > > > > [ 89.463690] brd: module loaded > > > > > > > > > [ 89.463855] brd: module_enable_rodata_ro_after_init() returned -22, > > > > > > > > > ro_after_init data might still be writable > > > > > > > > > > > > > > > > > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com> > > > > > > > > > --- > > > > > > > > > tools/bpf/Makefile | 13 ++- > > > > > > > > > tools/bpf/moderr/.gitignore | 2 + > > > > > > > > > tools/bpf/moderr/Makefile | 95 +++++++++++++++++ > > > > > > > > > tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++ > > > > > > > > > tools/bpf/moderr/moderr.c | 236 ++++++++++++++++++++++++++++++++++++++++++ > > > > > > > > > tools/bpf/moderr/moderr.h | 40 +++++++ > > > > > > > > > 6 files changed, 510 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > > > The tool looks useful, but we don't add tools to the kernel repo. > > > > > > > > It has to stay out of tree. > > > > > > > > > > > > > > For selftests we do add random tools. > > > > > > > > > > > > > > > The value of error injection is not clear to me. > > > > > > > > > > > > > > It is of great value, since it deals with corner cases which are > > > > > > > otherwise hard to reproduce in places which a real error can be > > > > > > > catostrophic. > > > > > > > > > > > > > > > Other places in the kernel use it to test paths in the kernel > > > > > > > > that are difficult to do otherwise. > > > > > > > > > > > > > > Right. > > > > > > > > > > > > > > > These 3 functions don't seem to be in this category. > > > > > > > > > > > > > > That's the key here we should focus on. The problem is when a maintainer > > > > > > > *does* agree that adding an error injection entry is useful for testing, > > > > > > > and we have a developer willing to do the work to help test / validate > > > > > > > it. In this case, this error case is rare but we do want to strive to > > > > > > > test this as we ramp up and extend our modules selftests. > > > > > > > > > > > > > > Then there is the aspect of how to mitigate how instrusive code changes > > > > > > > to allow error injection are. In 2021 we evaluated the prospect of error > > > > > > > injection in-kernel long ago for other areas like the block layer for > > > > > > > add_disk() failures [0] but the minimal interface to enable this from > > > > > > > userspace with debugfs was considered just too intrusive. > > > > > > > > > > > > > > This effort tried to evaluate what this could look like with eBPF to > > > > > > > mitigate the required in-kernel code, and I believe the light weight > > > > > > > nature of it by just requiring a sprinkle with ALLOW_ERROR_INJECTION() > > > > > > > suffices to my taste. > > > > > > > > > > > > > > So, perhaps the tools aspect can just go in: > > > > > > > > > > > > > > tools/testing/selftests/module/ > > > > > > > > > > > > but why would it be module-specific? > > > > > > > > > > Gotta start somewhere. > > > > > > > > > > > Based on its current implementation > > > > > > and discussion about inject.py it seems to be generic enough to be > > > > > > useful to test any function annotated with ALLOW_ERROR_INJECTION(). > > > > > > > > > > > > As xe driver maintainer, it may be interesting to use such a tool: > > > > > > > > > > > > $ git grep ALLOW_ERROR_INJECT -- drivers/gpu/drm/xe | wc -l 23 > > > > > > > > > > > > How does this approach compare to writing the function name on debugfs > > > > > > (the current approach in xe's testsuite)? > > > > > > > > > > > > fail_function @ https://docs.kernel.org/fault-injection/fault-injection.html#fault-injection-capabilities-infrastructure > > > > > > https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/intel/xe_fault_injection.c?ref_type=heads#L108 > > > > > > > > > > > > If you decide to have the tool to live somewhere else, then kmod repo > > > > > > could be a candidate. > > > > > > > > > > Would we install this upon install target? > > > > > > > > > > Danny can decide on this :) > > > > > > > > > > > Although I think having it in kernel tree is > > > > > > simpler maintenance-wise. > > > > > > > > > > I think we have at least two users upstream who can make use of it. If > > > > > we end up going through tools/testing/selftests/module/ first, can't > > > > > you make use of it later? > > > > > > > > What are the features in debugfs required to be useful for xe that we can > > > > port to an eBPF version? I see from the link provided the use of probability, > > > > interval, times and space but these are configured to allways trigger the error. > > > > Is that right? > > > > > > I don't think we use them... we just set them to "always trigger" and > > > then create the conditions for that to happen. But my question still > > > remains: what is the benefit of using the bpf approach over writing > > > these files? > > > > The code to trigger the error injection still needs to exist with both > > approaches. My understanding from debugfs and the comment from Luis earlier in > > the thread is that with eBPF you can mitigate how intrusive in-kernel code > > changes are to allow error injection. Luis added the example here [1] for > > debugfs. > > > > [1] https://lore.kernel.org/all/20210512064629.13899-9-mcgrof@kernel.org/ > > > > To compare patch 8 in [1] with eBPF approach: the patch describes > > all the necessary changes required to allow error injection on the > > add_disk() path. With eBPF one would simply annotate the function(s) with > > ALLOW_ERROR_INJECTION(), e.g. device_add() and replace the return value > > in eBPF code with bpf_override_return() as implemented in moderr tool for > > module_enable_rdata_after_init() for example. > > but that is all that we need with the fail_function in debugfs too: > > $ git grep ALLOW_ERROR_INJECTION -- drivers/gpu/drm/xe > drivers/gpu/drm/xe/xe_device.c:ALLOW_ERROR_INJECTION(xe_device_create, ERRNO); /* See xe_pci_probe() */ > drivers/gpu/drm/xe/xe_device.c:ALLOW_ERROR_INJECTION(wait_for_lmem_ready, ERRNO); /* See xe_pci_probe() */ > drivers/gpu/drm/xe/xe_exec_queue.c:ALLOW_ERROR_INJECTION(xe_exec_queue_create_bind, ERRNO); > drivers/gpu/drm/xe/xe_ggtt.c:ALLOW_ERROR_INJECTION(xe_ggtt_init_early, ERRNO); /* See xe_pci_probe() */ > drivers/gpu/drm/xe/xe_guc_ads.c:ALLOW_ERROR_INJECTION(xe_guc_ads_init, ERRNO); /* See xe_pci_probe() */ > drivers/gpu/drm/xe/xe_guc_ct.c:ALLOW_ERROR_INJECTION(xe_guc_ct_init, ERRNO); /* See xe_pci_probe() */ > drivers/gpu/drm/xe/xe_guc_log.c:ALLOW_ERROR_INJECTION(xe_guc_log_init, ERRNO); /* See xe_pci_probe() */ > drivers/gpu/drm/xe/xe_guc_relay.c:ALLOW_ERROR_INJECTION(xe_guc_relay_init, ERRNO); /* See xe_pci_probe() */ > drivers/gpu/drm/xe/xe_pci.c: * ALLOW_ERROR_INJECTION() is used to conditionally skip function execution > drivers/gpu/drm/xe/xe_pci.c: * ALLOW_ERROR_INJECTION() macro but this is acceptable because for those > drivers/gpu/drm/xe/xe_pm.c:ALLOW_ERROR_INJECTION(xe_pm_init_early, ERRNO); /* See xe_pci_probe() */ > drivers/gpu/drm/xe/xe_pt.c:ALLOW_ERROR_INJECTION(xe_pt_create, ERRNO); > drivers/gpu/drm/xe/xe_pt.c:ALLOW_ERROR_INJECTION(xe_pt_update_ops_prepare, ERRNO); > drivers/gpu/drm/xe/xe_pt.c:ALLOW_ERROR_INJECTION(xe_pt_update_ops_run, ERRNO); > drivers/gpu/drm/xe/xe_sriov.c:ALLOW_ERROR_INJECTION(xe_sriov_init, ERRNO); /* See xe_pci_probe() */ > drivers/gpu/drm/xe/xe_sync.c:ALLOW_ERROR_INJECTION(xe_sync_entry_parse, ERRNO); > drivers/gpu/drm/xe/xe_tile.c:ALLOW_ERROR_INJECTION(xe_tile_init_early, ERRNO); /* See xe_pci_probe() */ > drivers/gpu/drm/xe/xe_uc_fw.c:ALLOW_ERROR_INJECTION(xe_uc_fw_init, ERRNO); /* See xe_pci_probe() */ > drivers/gpu/drm/xe/xe_vm.c:ALLOW_ERROR_INJECTION(xe_vma_ops_alloc, ERRNO); > drivers/gpu/drm/xe/xe_vm.c:ALLOW_ERROR_INJECTION(xe_vm_create_scratch, ERRNO); > drivers/gpu/drm/xe/xe_vm.c:ALLOW_ERROR_INJECTION(vm_bind_ioctl_ops_create, ERRNO); > drivers/gpu/drm/xe/xe_vm.c:ALLOW_ERROR_INJECTION(vm_bind_ioctl_ops_execute, ERRNO); > drivers/gpu/drm/xe/xe_wa.c:ALLOW_ERROR_INJECTION(xe_wa_init, ERRNO); /* See xe_pci_probe() */ > drivers/gpu/drm/xe/xe_wopcm.c:ALLOW_ERROR_INJECTION(xe_wopcm_init, ERRNO); /* See xe_pci_probe() */ > > That is different from the patch you are pointing to because that patch > is trying to add arbitrary/named error injection points throughout the > code. However via debugfs it's still possible to add error injection to When reading the patch I assumed the block/failure-injection.c was needed for the knobs in sysfs/debugfs. But I see I was wrong and these are only needed for the arbitrary error injection points? I see mm/fail_page_alloc.c has a similar approach with should_fail_alloc_page(). > the beginning of a function by annotating that function with > ALLOW_ERROR_INJECTION. If a function is annotated with that, then if you > do e.g. > > echo xe_device_create > /sys/kernel/debug/fail_function/inject > > it will cause that function to fail. There are some additional files to > control _when_ that function should fail, but I'm failing to see a clear > benefit. See this example in the docs: Can you clarify if _when_ (in debugfs) allows you to access function arguments of a given annotated function with ALLOW_ERROR_INJECTION()? It seems that might be the only part that can be moved out of the kernel and handled in eBPF. Other than that, I don't see either a benefit of using one approach over the other. > > Documentation/fault-injection/fault-injection.rst:- Inject open_ctree error while btrfs mount:: > > Lucas De Marchi
+Francois who added most of the error injection points in xe On Wed, Mar 05, 2025 at 11:06:57AM +0100, Daniel Gomez wrote: >On Fri, Feb 28, 2025 at 12:48:38PM +0100, Lucas De Marchi wrote: >> On Fri, Feb 28, 2025 at 10:27:17AM +0100, Daniel Gomez wrote: >> > On Mon, Feb 24, 2025 at 08:43:45AM +0100, Lucas De Marchi wrote: >> > > On Sat, Feb 22, 2025 at 10:35:07PM +0100, Daniel Gomez wrote: >> > > > On Fri, Feb 21, 2025 at 12:15:40PM +0100, Luis Chamberlain wrote: >> > > > > On Wed, Feb 19, 2025 at 02:17:48PM -0600, Lucas De Marchi wrote: >> > > > > > On Tue, Jan 28, 2025 at 12:57:05PM -0800, Luis Chamberlain wrote: >> > > > > > > On Wed, Jan 22, 2025 at 09:02:19AM -0800, Alexei Starovoitov wrote: >> > > > > > > > On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@samsung.com> wrote: >> > > > > > > > > >> > > > > > > > > Add support for a module error injection tool. The tool >> > > > > > > > > can inject errors in the annotated module kernel functions >> > > > > > > > > such as complete_formation(), do_init_module() and >> > > > > > > > > module_enable_rodata_after_init(). Module name and module function are >> > > > > > > > > required parameters to have control over the error injection. >> > > > > > > > > >> > > > > > > > > Example: Inject error -22 to module_enable_rodata_ro_after_init for >> > > > > > > > > brd module: >> > > > > > > > > >> > > > > > > > > sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \ >> > > > > > > > > --error=-22 --trace >> > > > > > > > > Monitoring module error injection... Hit Ctrl-C to end. >> > > > > > > > > MODULE ERROR FUNCTION >> > > > > > > > > brd -22 module_enable_rodata_after_init() >> > > > > > > > > >> > > > > > > > > Kernel messages: >> > > > > > > > > [ 89.463690] brd: module loaded >> > > > > > > > > [ 89.463855] brd: module_enable_rodata_ro_after_init() returned -22, >> > > > > > > > > ro_after_init data might still be writable >> > > > > > > > > >> > > > > > > > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com> >> > > > > > > > > --- >> > > > > > > > > tools/bpf/Makefile | 13 ++- >> > > > > > > > > tools/bpf/moderr/.gitignore | 2 + >> > > > > > > > > tools/bpf/moderr/Makefile | 95 +++++++++++++++++ >> > > > > > > > > tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++ >> > > > > > > > > tools/bpf/moderr/moderr.c | 236 ++++++++++++++++++++++++++++++++++++++++++ >> > > > > > > > > tools/bpf/moderr/moderr.h | 40 +++++++ >> > > > > > > > > 6 files changed, 510 insertions(+), 3 deletions(-) >> > > > > > > > >> > > > > > > > The tool looks useful, but we don't add tools to the kernel repo. >> > > > > > > > It has to stay out of tree. >> > > > > > > >> > > > > > > For selftests we do add random tools. >> > > > > > > >> > > > > > > > The value of error injection is not clear to me. >> > > > > > > >> > > > > > > It is of great value, since it deals with corner cases which are >> > > > > > > otherwise hard to reproduce in places which a real error can be >> > > > > > > catostrophic. >> > > > > > > >> > > > > > > > Other places in the kernel use it to test paths in the kernel >> > > > > > > > that are difficult to do otherwise. >> > > > > > > >> > > > > > > Right. >> > > > > > > >> > > > > > > > These 3 functions don't seem to be in this category. >> > > > > > > >> > > > > > > That's the key here we should focus on. The problem is when a maintainer >> > > > > > > *does* agree that adding an error injection entry is useful for testing, >> > > > > > > and we have a developer willing to do the work to help test / validate >> > > > > > > it. In this case, this error case is rare but we do want to strive to >> > > > > > > test this as we ramp up and extend our modules selftests. >> > > > > > > >> > > > > > > Then there is the aspect of how to mitigate how instrusive code changes >> > > > > > > to allow error injection are. In 2021 we evaluated the prospect of error >> > > > > > > injection in-kernel long ago for other areas like the block layer for >> > > > > > > add_disk() failures [0] but the minimal interface to enable this from >> > > > > > > userspace with debugfs was considered just too intrusive. >> > > > > > > >> > > > > > > This effort tried to evaluate what this could look like with eBPF to >> > > > > > > mitigate the required in-kernel code, and I believe the light weight >> > > > > > > nature of it by just requiring a sprinkle with ALLOW_ERROR_INJECTION() >> > > > > > > suffices to my taste. >> > > > > > > >> > > > > > > So, perhaps the tools aspect can just go in: >> > > > > > > >> > > > > > > tools/testing/selftests/module/ >> > > > > > >> > > > > > but why would it be module-specific? >> > > > > >> > > > > Gotta start somewhere. >> > > > > >> > > > > > Based on its current implementation >> > > > > > and discussion about inject.py it seems to be generic enough to be >> > > > > > useful to test any function annotated with ALLOW_ERROR_INJECTION(). >> > > > > > >> > > > > > As xe driver maintainer, it may be interesting to use such a tool: >> > > > > > >> > > > > > $ git grep ALLOW_ERROR_INJECT -- drivers/gpu/drm/xe | wc -l 23 >> > > > > > >> > > > > > How does this approach compare to writing the function name on debugfs >> > > > > > (the current approach in xe's testsuite)? >> > > > > > >> > > > > > fail_function @ https://docs.kernel.org/fault-injection/fault-injection.html#fault-injection-capabilities-infrastructure >> > > > > > https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/intel/xe_fault_injection.c?ref_type=heads#L108 >> > > > > > >> > > > > > If you decide to have the tool to live somewhere else, then kmod repo >> > > > > > could be a candidate. >> > > > > >> > > > > Would we install this upon install target? >> > > > > >> > > > > Danny can decide on this :) >> > > > > >> > > > > > Although I think having it in kernel tree is >> > > > > > simpler maintenance-wise. >> > > > > >> > > > > I think we have at least two users upstream who can make use of it. If >> > > > > we end up going through tools/testing/selftests/module/ first, can't >> > > > > you make use of it later? >> > > > >> > > > What are the features in debugfs required to be useful for xe that we can >> > > > port to an eBPF version? I see from the link provided the use of probability, >> > > > interval, times and space but these are configured to allways trigger the error. >> > > > Is that right? >> > > >> > > I don't think we use them... we just set them to "always trigger" and >> > > then create the conditions for that to happen. But my question still >> > > remains: what is the benefit of using the bpf approach over writing >> > > these files? >> > >> > The code to trigger the error injection still needs to exist with both >> > approaches. My understanding from debugfs and the comment from Luis earlier in >> > the thread is that with eBPF you can mitigate how intrusive in-kernel code >> > changes are to allow error injection. Luis added the example here [1] for >> > debugfs. >> > >> > [1] https://lore.kernel.org/all/20210512064629.13899-9-mcgrof@kernel.org/ >> > >> > To compare patch 8 in [1] with eBPF approach: the patch describes >> > all the necessary changes required to allow error injection on the >> > add_disk() path. With eBPF one would simply annotate the function(s) with >> > ALLOW_ERROR_INJECTION(), e.g. device_add() and replace the return value >> > in eBPF code with bpf_override_return() as implemented in moderr tool for >> > module_enable_rdata_after_init() for example. >> >> but that is all that we need with the fail_function in debugfs too: >> >> $ git grep ALLOW_ERROR_INJECTION -- drivers/gpu/drm/xe >> drivers/gpu/drm/xe/xe_device.c:ALLOW_ERROR_INJECTION(xe_device_create, ERRNO); /* See xe_pci_probe() */ >> drivers/gpu/drm/xe/xe_device.c:ALLOW_ERROR_INJECTION(wait_for_lmem_ready, ERRNO); /* See xe_pci_probe() */ >> drivers/gpu/drm/xe/xe_exec_queue.c:ALLOW_ERROR_INJECTION(xe_exec_queue_create_bind, ERRNO); >> drivers/gpu/drm/xe/xe_ggtt.c:ALLOW_ERROR_INJECTION(xe_ggtt_init_early, ERRNO); /* See xe_pci_probe() */ >> drivers/gpu/drm/xe/xe_guc_ads.c:ALLOW_ERROR_INJECTION(xe_guc_ads_init, ERRNO); /* See xe_pci_probe() */ >> drivers/gpu/drm/xe/xe_guc_ct.c:ALLOW_ERROR_INJECTION(xe_guc_ct_init, ERRNO); /* See xe_pci_probe() */ >> drivers/gpu/drm/xe/xe_guc_log.c:ALLOW_ERROR_INJECTION(xe_guc_log_init, ERRNO); /* See xe_pci_probe() */ >> drivers/gpu/drm/xe/xe_guc_relay.c:ALLOW_ERROR_INJECTION(xe_guc_relay_init, ERRNO); /* See xe_pci_probe() */ >> drivers/gpu/drm/xe/xe_pci.c: * ALLOW_ERROR_INJECTION() is used to conditionally skip function execution >> drivers/gpu/drm/xe/xe_pci.c: * ALLOW_ERROR_INJECTION() macro but this is acceptable because for those >> drivers/gpu/drm/xe/xe_pm.c:ALLOW_ERROR_INJECTION(xe_pm_init_early, ERRNO); /* See xe_pci_probe() */ >> drivers/gpu/drm/xe/xe_pt.c:ALLOW_ERROR_INJECTION(xe_pt_create, ERRNO); >> drivers/gpu/drm/xe/xe_pt.c:ALLOW_ERROR_INJECTION(xe_pt_update_ops_prepare, ERRNO); >> drivers/gpu/drm/xe/xe_pt.c:ALLOW_ERROR_INJECTION(xe_pt_update_ops_run, ERRNO); >> drivers/gpu/drm/xe/xe_sriov.c:ALLOW_ERROR_INJECTION(xe_sriov_init, ERRNO); /* See xe_pci_probe() */ >> drivers/gpu/drm/xe/xe_sync.c:ALLOW_ERROR_INJECTION(xe_sync_entry_parse, ERRNO); >> drivers/gpu/drm/xe/xe_tile.c:ALLOW_ERROR_INJECTION(xe_tile_init_early, ERRNO); /* See xe_pci_probe() */ >> drivers/gpu/drm/xe/xe_uc_fw.c:ALLOW_ERROR_INJECTION(xe_uc_fw_init, ERRNO); /* See xe_pci_probe() */ >> drivers/gpu/drm/xe/xe_vm.c:ALLOW_ERROR_INJECTION(xe_vma_ops_alloc, ERRNO); >> drivers/gpu/drm/xe/xe_vm.c:ALLOW_ERROR_INJECTION(xe_vm_create_scratch, ERRNO); >> drivers/gpu/drm/xe/xe_vm.c:ALLOW_ERRORALLOW_ERROR_INJECTION_INJECTION(vm_bind_ioctl_ops_create, ERRNO); >> drivers/gpu/drm/xe/xe_vm.c:ALLOW_ERROR_INJECTION(vm_bind_ioctl_ops_execute, ERRNO); >> drivers/gpu/drm/xe/xe_wa.c:ALLOW_ERROR_INJECTION(xe_wa_init, ERRNO); /* See xe_pci_probe() */ >> drivers/gpu/drm/xe/xe_wopcm.c:ALLOW_ERROR_INJECTION(xe_wopcm_init, ERRNO); /* See xe_pci_probe() */ >> >> That is different from the patch you are pointing to because that patch >> is trying to add arbitrary/named error injection points throughout the >> code. However via debugfs it's still possible to add error injection to > >When reading the patch I assumed the block/failure-injection.c was needed for >the knobs in sysfs/debugfs. But I see I was wrong and these are only needed for >the arbitrary error injection points? yeah. For working with ALLOW_ERROR_INJECTION() we have to refactor the code so the functions follow its requirements. When that is true, then we can simply use the fail_function/inject to trigger it. > >I see mm/fail_page_alloc.c has a similar approach with should_fail_alloc_page(). > >> the beginning of a function by annotating that function with >> ALLOW_ERROR_INJECTION. If a function is annotated with that, then if you >> do e.g. >> >> echo xe_device_create > /sys/kernel/debug/fail_function/inject >> >> it will cause that function to fail. There are some additional files to >> control _when_ that function should fail, but I'm failing to see a clear >> benefit. See this example in the docs: > >Can you clarify if _when_ (in debugfs) allows you to access function arguments >of a given annotated function with ALLOW_ERROR_INJECTION()? It seems that might >be the only part that can be moved out of the kernel and handled in eBPF. Other >than that, I don't see either a benefit of using one approach over the other. afaik we can't change the behavior based on arguments when using the debugfs approach. Lucas De Marchi > >> >> Documentation/fault-injection/fault-injection.rst:- Inject open_ctree error while btrfs mount:: >> >> Lucas De Marchi >
On Wed, Mar 05, 2025 at 03:08:25PM -0600, Lucas De Marchi wrote: > +Francois who added most of the error injection points in xe > > On Wed, Mar 05, 2025 at 11:06:57AM +0100, Daniel Gomez wrote: > > On Fri, Feb 28, 2025 at 12:48:38PM +0100, Lucas De Marchi wrote: > > > On Fri, Feb 28, 2025 at 10:27:17AM +0100, Daniel Gomez wrote: > > > > On Mon, Feb 24, 2025 at 08:43:45AM +0100, Lucas De Marchi wrote: > > > > > On Sat, Feb 22, 2025 at 10:35:07PM +0100, Daniel Gomez wrote: > > > > > > On Fri, Feb 21, 2025 at 12:15:40PM +0100, Luis Chamberlain wrote: > > > > > > > On Wed, Feb 19, 2025 at 02:17:48PM -0600, Lucas De Marchi wrote: > > > > > > > > On Tue, Jan 28, 2025 at 12:57:05PM -0800, Luis Chamberlain wrote: > > > > > > > > > On Wed, Jan 22, 2025 at 09:02:19AM -0800, Alexei Starovoitov wrote: > > > > > > > > > > On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > > > > > > > > > > > > > > > > > > > Add support for a module error injection tool. The tool > > > > > > > > > > > can inject errors in the annotated module kernel functions > > > > > > > > > > > such as complete_formation(), do_init_module() and > > > > > > > > > > > module_enable_rodata_after_init(). Module name and module function are > > > > > > > > > > > required parameters to have control over the error injection. > > > > > > > > > > > > > > > > > > > > > > Example: Inject error -22 to module_enable_rodata_ro_after_init for > > > > > > > > > > > brd module: > > > > > > > > > > > > > > > > > > > > > > sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \ > > > > > > > > > > > --error=-22 --trace > > > > > > > > > > > Monitoring module error injection... Hit Ctrl-C to end. > > > > > > > > > > > MODULE ERROR FUNCTION > > > > > > > > > > > brd -22 module_enable_rodata_after_init() > > > > > > > > > > > > > > > > > > > > > > Kernel messages: > > > > > > > > > > > [ 89.463690] brd: module loaded > > > > > > > > > > > [ 89.463855] brd: module_enable_rodata_ro_after_init() returned -22, > > > > > > > > > > > ro_after_init data might still be writable > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com> > > > > > > > > > > > --- > > > > > > > > > > > tools/bpf/Makefile | 13 ++- > > > > > > > > > > > tools/bpf/moderr/.gitignore | 2 + > > > > > > > > > > > tools/bpf/moderr/Makefile | 95 +++++++++++++++++ > > > > > > > > > > > tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++ > > > > > > > > > > > tools/bpf/moderr/moderr.c | 236 ++++++++++++++++++++++++++++++++++++++++++ > > > > > > > > > > > tools/bpf/moderr/moderr.h | 40 +++++++ > > > > > > > > > > > 6 files changed, 510 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > > > > > > > The tool looks useful, but we don't add tools to the kernel repo. > > > > > > > > > > It has to stay out of tree. > > > > > > > > > > > > > > > > > > For selftests we do add random tools. > > > > > > > > > > > > > > > > > > > The value of error injection is not clear to me. > > > > > > > > > > > > > > > > > > It is of great value, since it deals with corner cases which are > > > > > > > > > otherwise hard to reproduce in places which a real error can be > > > > > > > > > catostrophic. > > > > > > > > > > > > > > > > > > > Other places in the kernel use it to test paths in the kernel > > > > > > > > > > that are difficult to do otherwise. > > > > > > > > > > > > > > > > > > Right. > > > > > > > > > > > > > > > > > > > These 3 functions don't seem to be in this category. > > > > > > > > > > > > > > > > > > That's the key here we should focus on. The problem is when a maintainer > > > > > > > > > *does* agree that adding an error injection entry is useful for testing, > > > > > > > > > and we have a developer willing to do the work to help test / validate > > > > > > > > > it. In this case, this error case is rare but we do want to strive to > > > > > > > > > test this as we ramp up and extend our modules selftests. > > > > > > > > > > > > > > > > > > Then there is the aspect of how to mitigate how instrusive code changes > > > > > > > > > to allow error injection are. In 2021 we evaluated the prospect of error > > > > > > > > > injection in-kernel long ago for other areas like the block layer for > > > > > > > > > add_disk() failures [0] but the minimal interface to enable this from > > > > > > > > > userspace with debugfs was considered just too intrusive. > > > > > > > > > > > > > > > > > > This effort tried to evaluate what this could look like with eBPF to > > > > > > > > > mitigate the required in-kernel code, and I believe the light weight > > > > > > > > > nature of it by just requiring a sprinkle with ALLOW_ERROR_INJECTION() > > > > > > > > > suffices to my taste. > > > > > > > > > > > > > > > > > > So, perhaps the tools aspect can just go in: > > > > > > > > > > > > > > > > > > tools/testing/selftests/module/ > > > > > > > > > > > > > > > > but why would it be module-specific? > > > > > > > > > > > > > > Gotta start somewhere. > > > > > > > > > > > > > > > Based on its current implementation > > > > > > > > and discussion about inject.py it seems to be generic enough to be > > > > > > > > useful to test any function annotated with ALLOW_ERROR_INJECTION(). > > > > > > > > > > > > > > > > As xe driver maintainer, it may be interesting to use such a tool: > > > > > > > > > > > > > > > > $ git grep ALLOW_ERROR_INJECT -- drivers/gpu/drm/xe | wc -l 23 > > > > > > > > > > > > > > > > How does this approach compare to writing the function name on debugfs > > > > > > > > (the current approach in xe's testsuite)? > > > > > > > > > > > > > > > > fail_function @ https://docs.kernel.org/fault-injection/fault-injection.html#fault-injection-capabilities-infrastructure > > > > > > > > https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/intel/xe_fault_injection.c?ref_type=heads#L108 > > > > > > > > > > > > > > > > If you decide to have the tool to live somewhere else, then kmod repo > > > > > > > > could be a candidate. > > > > > > > > > > > > > > Would we install this upon install target? > > > > > > > > > > > > > > Danny can decide on this :) > > > > > > > > > > > > > > > Although I think having it in kernel tree is > > > > > > > > simpler maintenance-wise. > > > > > > > > > > > > > > I think we have at least two users upstream who can make use of it. If > > > > > > > we end up going through tools/testing/selftests/module/ first, can't > > > > > > > you make use of it later? > > > > > > > > > > > > What are the features in debugfs required to be useful for xe that we can > > > > > > port to an eBPF version? I see from the link provided the use of probability, > > > > > > interval, times and space but these are configured to allways trigger the error. > > > > > > Is that right? > > > > > > > > > > I don't think we use them... we just set them to "always trigger" and > > > > > then create the conditions for that to happen. But my question still > > > > > remains: what is the benefit of using the bpf approach over writing > > > > > these files? > > > > > > > > The code to trigger the error injection still needs to exist with both > > > > approaches. My understanding from debugfs and the comment from Luis earlier in > > > > the thread is that with eBPF you can mitigate how intrusive in-kernel code > > > > changes are to allow error injection. Luis added the example here [1] for > > > > debugfs. > > > > > > > > [1] https://lore.kernel.org/all/20210512064629.13899-9-mcgrof@kernel.org/ > > > > > > > > To compare patch 8 in [1] with eBPF approach: the patch describes > > > > all the necessary changes required to allow error injection on the > > > > add_disk() path. With eBPF one would simply annotate the function(s) with > > > > ALLOW_ERROR_INJECTION(), e.g. device_add() and replace the return value > > > > in eBPF code with bpf_override_return() as implemented in moderr tool for > > > > module_enable_rdata_after_init() for example. > > > > > > but that is all that we need with the fail_function in debugfs too: > > > > > > $ git grep ALLOW_ERROR_INJECTION -- drivers/gpu/drm/xe > > > drivers/gpu/drm/xe/xe_device.c:ALLOW_ERROR_INJECTION(xe_device_create, ERRNO); /* See xe_pci_probe() */ > > > drivers/gpu/drm/xe/xe_device.c:ALLOW_ERROR_INJECTION(wait_for_lmem_ready, ERRNO); /* See xe_pci_probe() */ > > > drivers/gpu/drm/xe/xe_exec_queue.c:ALLOW_ERROR_INJECTION(xe_exec_queue_create_bind, ERRNO); > > > drivers/gpu/drm/xe/xe_ggtt.c:ALLOW_ERROR_INJECTION(xe_ggtt_init_early, ERRNO); /* See xe_pci_probe() */ > > > drivers/gpu/drm/xe/xe_guc_ads.c:ALLOW_ERROR_INJECTION(xe_guc_ads_init, ERRNO); /* See xe_pci_probe() */ > > > drivers/gpu/drm/xe/xe_guc_ct.c:ALLOW_ERROR_INJECTION(xe_guc_ct_init, ERRNO); /* See xe_pci_probe() */ > > > drivers/gpu/drm/xe/xe_guc_log.c:ALLOW_ERROR_INJECTION(xe_guc_log_init, ERRNO); /* See xe_pci_probe() */ > > > drivers/gpu/drm/xe/xe_guc_relay.c:ALLOW_ERROR_INJECTION(xe_guc_relay_init, ERRNO); /* See xe_pci_probe() */ > > > drivers/gpu/drm/xe/xe_pci.c: * ALLOW_ERROR_INJECTION() is used to conditionally skip function execution > > > drivers/gpu/drm/xe/xe_pci.c: * ALLOW_ERROR_INJECTION() macro but this is acceptable because for those > > > drivers/gpu/drm/xe/xe_pm.c:ALLOW_ERROR_INJECTION(xe_pm_init_early, ERRNO); /* See xe_pci_probe() */ > > > drivers/gpu/drm/xe/xe_pt.c:ALLOW_ERROR_INJECTION(xe_pt_create, ERRNO); > > > drivers/gpu/drm/xe/xe_pt.c:ALLOW_ERROR_INJECTION(xe_pt_update_ops_prepare, ERRNO); > > > drivers/gpu/drm/xe/xe_pt.c:ALLOW_ERROR_INJECTION(xe_pt_update_ops_run, ERRNO); > > > drivers/gpu/drm/xe/xe_sriov.c:ALLOW_ERROR_INJECTION(xe_sriov_init, ERRNO); /* See xe_pci_probe() */ > > > drivers/gpu/drm/xe/xe_sync.c:ALLOW_ERROR_INJECTION(xe_sync_entry_parse, ERRNO); > > > drivers/gpu/drm/xe/xe_tile.c:ALLOW_ERROR_INJECTION(xe_tile_init_early, ERRNO); /* See xe_pci_probe() */ > > > drivers/gpu/drm/xe/xe_uc_fw.c:ALLOW_ERROR_INJECTION(xe_uc_fw_init, ERRNO); /* See xe_pci_probe() */ > > > drivers/gpu/drm/xe/xe_vm.c:ALLOW_ERROR_INJECTION(xe_vma_ops_alloc, ERRNO); > > > drivers/gpu/drm/xe/xe_vm.c:ALLOW_ERROR_INJECTION(xe_vm_create_scratch, ERRNO); > > > drivers/gpu/drm/xe/xe_vm.c:ALLOW_ERRORALLOW_ERROR_INJECTION_INJECTION(vm_bind_ioctl_ops_create, ERRNO); > > > drivers/gpu/drm/xe/xe_vm.c:ALLOW_ERROR_INJECTION(vm_bind_ioctl_ops_execute, ERRNO); > > > drivers/gpu/drm/xe/xe_wa.c:ALLOW_ERROR_INJECTION(xe_wa_init, ERRNO); /* See xe_pci_probe() */ > > > drivers/gpu/drm/xe/xe_wopcm.c:ALLOW_ERROR_INJECTION(xe_wopcm_init, ERRNO); /* See xe_pci_probe() */ > > > > > > That is different from the patch you are pointing to because that patch > > > is trying to add arbitrary/named error injection points throughout the > > > code. However via debugfs it's still possible to add error injection to > > > > When reading the patch I assumed the block/failure-injection.c was needed for > > the knobs in sysfs/debugfs. But I see I was wrong and these are only needed for > > the arbitrary error injection points? Correct, ALLOW_ERROR_INJECTION() is sufficient to make a function error injectable and controllable with the debugfs knobs. Arbitrary error injection points might also be added using only ALLOW_ERROR_INJECTION() and extra functions: static int should_X_fail(void) { return 0; } ALLOW_ERROR_INJECTION(should_X_fail, ERRNO); int foo(...) { ... if (should_X_fail()) { // error } ... } But this would introduce a lot of dummy code, which the block/failure-injection.c approach prevents. Until now, fault injection in xe has mostly been used to test error handling during probe and during IOCTLs calls. Errors are unconditionnally injected in each function one by one during tests. This is why probability, interval, times and space are configured to always trigger. Besides, function-level error injection points have been a good fit in the driver, hence the absence of arbitrary error injection points. Francois > > yeah. For working with ALLOW_ERROR_INJECTION() we have to refactor the > code so the functions follow its requirements. When that is true, then > we can simply use the fail_function/inject to trigger it. > > > > > I see mm/fail_page_alloc.c has a similar approach with should_fail_alloc_page(). > > > > > the beginning of a function by annotating that function with > > > ALLOW_ERROR_INJECTION. If a function is annotated with that, then if you > > > do e.g. > > > > > > echo xe_device_create > /sys/kernel/debug/fail_function/inject > > > > > > it will cause that function to fail. There are some additional files to > > > control _when_ that function should fail, but I'm failing to see a clear > > > benefit. See this example in the docs: > > > > Can you clarify if _when_ (in debugfs) allows you to access function arguments > > of a given annotated function with ALLOW_ERROR_INJECTION()? It seems that might > > be the only part that can be moved out of the kernel and handled in eBPF. Other > > than that, I don't see either a benefit of using one approach over the other. > > afaik we can't change the behavior based on arguments when using the > debugfs approach. > > Lucas De Marchi > > > > > > > > > Documentation/fault-injection/fault-injection.rst:- Inject open_ctree error while btrfs mount:: > > > > > > Lucas De Marchi > >
diff --git a/tools/bpf/Makefile b/tools/bpf/Makefile index 243b79f2b451e52ca196f79dc46befd1b3dab458..018cab5102e7e42b8b7b2749f4f463bf55c5119b 100644 --- a/tools/bpf/Makefile +++ b/tools/bpf/Makefile @@ -38,7 +38,7 @@ FEATURE_TESTS = libbfd disassembler-four-args disassembler-init-styled FEATURE_DISPLAY = libbfd check_feat := 1 -NON_CHECK_FEAT_TARGETS := clean bpftool_clean runqslower_clean resolve_btfids_clean +NON_CHECK_FEAT_TARGETS := clean bpftool_clean moderr_clean runqslower_clean resolve_btfids_clean ifdef MAKECMDGOALS ifeq ($(filter-out $(NON_CHECK_FEAT_TARGETS),$(MAKECMDGOALS)),) check_feat := 0 @@ -76,7 +76,7 @@ $(OUTPUT)%.lex.o: $(OUTPUT)%.lex.c PROGS = $(OUTPUT)bpf_jit_disasm $(OUTPUT)bpf_dbg $(OUTPUT)bpf_asm -all: $(PROGS) bpftool runqslower +all: $(PROGS) bpftool moderr runqslower $(OUTPUT)bpf_jit_disasm: CFLAGS += -DPACKAGE='bpf_jit_disasm' $(OUTPUT)bpf_jit_disasm: $(OUTPUT)bpf_jit_disasm.o @@ -92,7 +92,7 @@ $(OUTPUT)bpf_exp.lex.c: $(OUTPUT)bpf_exp.yacc.c $(OUTPUT)bpf_exp.yacc.o: $(OUTPUT)bpf_exp.yacc.c $(OUTPUT)bpf_exp.lex.o: $(OUTPUT)bpf_exp.lex.c -clean: bpftool_clean runqslower_clean resolve_btfids_clean +clean: bpftool_clean moderr_clean runqslower_clean resolve_btfids_clean $(call QUIET_CLEAN, bpf-progs) $(Q)$(RM) -r -- $(OUTPUT)*.o $(OUTPUT)bpf_jit_disasm $(OUTPUT)bpf_dbg \ $(OUTPUT)bpf_asm $(OUTPUT)bpf_exp.yacc.* $(OUTPUT)bpf_exp.lex.* @@ -118,6 +118,12 @@ bpftool_install: bpftool_clean: $(call descend,bpftool,clean) +moderr: + $(call descend,moderr) + +moderr_clean: + $(call descend,moderr,clean) + runqslower: $(call descend,runqslower) @@ -131,5 +137,6 @@ resolve_btfids_clean: $(call descend,resolve_btfids,clean) .PHONY: all install clean bpftool bpftool_install bpftool_clean \ + moderr moderr_clean \ runqslower runqslower_clean \ resolve_btfids resolve_btfids_clean diff --git a/tools/bpf/moderr/.gitignore b/tools/bpf/moderr/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..ffdb70230c8bc308efcc8b7d2084856e2225da91 --- /dev/null +++ b/tools/bpf/moderr/.gitignore @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0-only +/.output diff --git a/tools/bpf/moderr/Makefile b/tools/bpf/moderr/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..e6331179f7800e6c1d1945ca713e34f74f7d805d --- /dev/null +++ b/tools/bpf/moderr/Makefile @@ -0,0 +1,95 @@ +# SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) +include ../../scripts/Makefile.include +include ../../scripts/Makefile.arch + +OUTPUT ?= $(abspath .output)/ + +BPFTOOL_OUTPUT := $(OUTPUT)bpftool/ +DEFAULT_BPFTOOL := $(BPFTOOL_OUTPUT)bootstrap/bpftool +BPFTOOL ?= $(DEFAULT_BPFTOOL) +LIBBPF_SRC := $(abspath ../../lib/bpf) +BPFOBJ_OUTPUT := $(OUTPUT)libbpf/ +BPFOBJ := $(BPFOBJ_OUTPUT)libbpf.a +BPF_DESTDIR := $(BPFOBJ_OUTPUT) +BPF_INCLUDE := $(BPF_DESTDIR)include +INCLUDES := -I$(OUTPUT) -I$(BPF_INCLUDE) -I$(abspath ../../include/uapi) +CFLAGS := -g -Wall $(CLANG_CROSS_FLAGS) +CFLAGS += $(EXTRA_CFLAGS) +LDFLAGS += $(EXTRA_LDFLAGS) +LDLIBS += -lelf -lz + +# Try to detect best kernel BTF source +KERNEL_REL := $(shell uname -r) +VMLINUX_BTF_PATHS := $(if $(O),$(O)/vmlinux) \ + $(if $(KBUILD_OUTPUT),$(KBUILD_OUTPUT)/vmlinux) \ + ../../../vmlinux /sys/kernel/btf/vmlinux \ + /boot/vmlinux-$(KERNEL_REL) +VMLINUX_BTF_PATH := $(or $(VMLINUX_BTF),$(firstword \ + $(wildcard $(VMLINUX_BTF_PATHS)))) + +ifeq ($(V),1) +Q = +else +Q = @ +MAKEFLAGS += --no-print-directory +submake_extras := feature_display=0 +endif + +.DELETE_ON_ERROR: + +.PHONY: all clean moderr libbpf_hdrs +all: moderr + +moderr: $(OUTPUT)moderr + +clean: + $(call QUIET_CLEAN, moderr) + $(Q)$(RM) -r $(BPFOBJ_OUTPUT) $(BPFTOOL_OUTPUT) + $(Q)$(RM) $(OUTPUT)*.o $(OUTPUT)*.d + $(Q)$(RM) $(OUTPUT)*.skel.h $(OUTPUT)vmlinux.h + $(Q)$(RM) $(OUTPUT)moderr + $(Q)$(RM) -r .output + +libbpf_hdrs: $(BPFOBJ) + +$(OUTPUT)moderr: $(OUTPUT)moderr.o $(BPFOBJ) + $(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $^ $(LDLIBS) -o $@ + +$(OUTPUT)moderr.o: moderr.h $(OUTPUT)moderr.skel.h \ + $(OUTPUT)moderr.bpf.o | libbpf_hdrs + +$(OUTPUT)moderr.bpf.o: $(OUTPUT)vmlinux.h moderr.h | libbpf_hdrs + +$(OUTPUT)%.skel.h: $(OUTPUT)%.bpf.o | $(BPFTOOL) + $(QUIET_GEN)$(BPFTOOL) gen skeleton $< > $@ + +$(OUTPUT)%.bpf.o: %.bpf.c $(BPFOBJ) | $(OUTPUT) + $(QUIET_GEN)$(CLANG) -g -O2 --target=bpf $(INCLUDES) \ + -D__TARGET_ARCH_$(SRCARCH) \ + -c $(filter %.c,$^) -o $@ && \ + $(LLVM_STRIP) -g $@ + +$(OUTPUT)%.o: %.c | $(OUTPUT) + $(QUIET_CC)$(CC) $(CFLAGS) $(INCLUDES) -c $(filter %.c,$^) -o $@ + +$(OUTPUT) $(BPFOBJ_OUTPUT) $(BPFTOOL_OUTPUT): + $(QUIET_MKDIR)mkdir -p $@ + +$(OUTPUT)vmlinux.h: $(VMLINUX_BTF_PATH) | $(OUTPUT) $(BPFTOOL) +ifeq ($(VMLINUX_H),) + $(Q)if [ ! -e "$(VMLINUX_BTF_PATH)" ] ; then \ + echo "Couldn't find kernel BTF; set VMLINUX_BTF to" \ + "specify its location." >&2; \ + exit 1;\ + fi + $(QUIET_GEN)$(BPFTOOL) btf dump file $(VMLINUX_BTF_PATH) format c > $@ +else + $(Q)cp "$(VMLINUX_H)" $@ +endif + +$(BPFOBJ): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(BPFOBJ_OUTPUT) + $(Q)$(MAKE) $(submake_extras) -C $(LIBBPF_SRC) OUTPUT=$(BPFOBJ_OUTPUT) \ + DESTDIR=$(BPFOBJ_OUTPUT) prefix= $(abspath $@) install_headers + +$(DEFAULT_BPFTOOL): | $(BPFTOOL_OUTPUT) + $(Q)$(MAKE) $(submake_extras) -C ../bpftool OUTPUT=$(BPFTOOL_OUTPUT) bootstrap diff --git a/tools/bpf/moderr/moderr.bpf.c b/tools/bpf/moderr/moderr.bpf.c new file mode 100644 index 0000000000000000000000000000000000000000..1c5d03336dd87a2f065ef6b608f077a8b988e5cf --- /dev/null +++ b/tools/bpf/moderr/moderr.bpf.c @@ -0,0 +1,127 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2025 Samsung */ +#include "vmlinux.h" +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> +#include <bpf/bpf_core_read.h> +#include "moderr.h" + +const volatile bool filter_modname = false; +const volatile char targ_modname[MODULE_NAME_LEN]; +const volatile bool set_errinj = false; +const volatile int targ_errinj = 0; +const volatile bool filter_modfunc = false; +const volatile int targ_modfunc = 0; + +char LICENSE[] SEC("license") = "GPL"; + +struct { + __uint(type, BPF_MAP_TYPE_RINGBUF); + __uint(max_entries, 2097152); +} rb SEC(".maps"); + +static __always_inline bool filter_module_name(struct module *mod) +{ + char modname[MODULE_NAME_LEN]; + + bpf_probe_read_str(modname, sizeof(modname), mod->name); + + if (!filter_modname || + filter_modname && bpf_strncmp(modname, MODULE_NAME_LEN, + (const char *)targ_modname) != 0) + return false; + + return true; +} + +static __always_inline bool filter_module_func(enum modfunc fc) +{ + if (!filter_modfunc || filter_modfunc && targ_modfunc != fc) + return false; + + return true; +} + +static __always_inline bool +generate_errinj_event(struct pt_regs *ctx, struct module *mod, enum modfunc fc) +{ + struct event *e; + + e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0); + if (!e) + return false; + + e->err = 0; + e->func = fc; + bpf_probe_read_str(e->modname, sizeof(e->modname), mod->name); + + if (set_errinj) { + bpf_override_return(ctx, targ_errinj); + e->err = targ_errinj; + } + + bpf_ringbuf_submit(e, 0); + return true; +} + +static __always_inline bool generate_debug_event(struct pt_regs *ctx, + struct module *mod, + enum modfunc fc, + const char *fmt) +{ + struct event *e; + + e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0); + if (!e) + return false; + + e->dbg = BPF_SNPRINTF(e->msg, sizeof(e->msg), "[%s:%s]: %s", mod->name, + modfunc_to_string(fc), fmt); + + bpf_ringbuf_submit(e, 0); + return true; +} + +static __always_inline int +module_error_injection(struct pt_regs *ctx, struct module *mod, enum modfunc fc) +{ + if (!filter_module_name(mod)) { + generate_debug_event(ctx, mod, fc, + "Target module does not match"); + return 0; + } + + if (!filter_module_func(fc)) { + generate_debug_event(ctx, mod, fc, + "Target function does not match"); + return 0; + } + + if (!generate_errinj_event(ctx, mod, fc)) { + generate_debug_event( + ctx, mod, fc, + "Error injection event cannot be generated"); + return 0; + } + + return 0; +} + +SEC("kprobe/complete_formation") +int BPF_KPROBE(complete_formation, struct module *mod, struct load_info *info) +{ + return module_error_injection(ctx, mod, COMPLETE_FORMATION); +} + +SEC("kprobe/do_init_module") +int BPF_KPROBE(do_init_module, struct module *mod, struct load_info *info) +{ + return module_error_injection(ctx, mod, DO_INIT_MODULE); +} + +SEC("kprobe/module_enable_rodata_ro_after_init") +int BPF_KPROBE(module_enable_rodata_ro_after_init, struct module *mod) +{ + return module_error_injection(ctx, mod, + MODULE_ENABLE_RODATA_AFTER_INIT); +} diff --git a/tools/bpf/moderr/moderr.c b/tools/bpf/moderr/moderr.c new file mode 100644 index 0000000000000000000000000000000000000000..dce18b02b55d1ad1f7e304cb49985d570b115aa4 --- /dev/null +++ b/tools/bpf/moderr/moderr.c @@ -0,0 +1,236 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2025 Samsung */ +#include <argp.h> +#include <signal.h> +#include <stdbool.h> +#include <stdio.h> +#include <bpf/libbpf.h> +#include <bpf/bpf.h> +#include <stdlib.h> +#include <string.h> +#include "moderr.h" +#include "moderr.skel.h" + +static struct env { + bool verbose; + char modname[MODULE_NAME_LEN]; + enum modfunc func; + bool trace; + int errinj; +} env; + +const char *argp_program_version = "moderr 0.1"; +const char *argp_program_bug_address = "<da.gomez@samsung.com>"; +const char argp_program_doc[] = +"BPF moderr application.\n" +"\n" +"It injects errors in module initialization\n" +"\nUSAGE: " +"moderr [-m <module_name>] [-f <function_name>] [-e <errno>]\n"; + +static volatile bool exiting = false; + +#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x))) + +static const struct argp_option opts[] = { + { "verbose", 'v', NULL, 0, "Verbose debug output" }, + { "trace", 't', NULL, 0, "Enable trace output", 0 }, + { "modname", 'm', "MODNAME", 0, "Trace this module name only", 0 }, + { "modfunc", 'f', "MODFUNC", 0, "Trace this module function only", 0 }, + { "list", 'l', NULL, 0, "List available module functions", 0 }, + { "error", 'e', "ERROR", 0, "Inject this error", 0 }, + { NULL, 'h', NULL, OPTION_HIDDEN, "Show the full help", 0 }, + {}, +}; + +static void help_modfunc(void) +{ + printf("\nAvailable modfunc options are:\n" + "- complete_formation\n" + "- do_init_module\n" + "- module_enable_rodata_ro_after_init\n\n"); +} + +static enum modfunc string_to_modfunc(char *arg) +{ + if (strncmp(arg, "complete_formation", strlen(arg)) == 0) + return COMPLETE_FORMATION; + + if (strncmp(arg, "do_init_module", strlen(arg)) == 0) + return DO_INIT_MODULE; + + if (strncmp(arg, "module_enable_rodata_ro_after_init", strlen(arg)) == + 0) + return MODULE_ENABLE_RODATA_AFTER_INIT; + + return UNKNOWN; +} + +static error_t parse_arg(int key, char *arg, struct argp_state *state) +{ + switch (key) { + case 'h': + argp_state_help(state, stderr, ARGP_HELP_STD_HELP); + break; + case 'l': + help_modfunc(); + argp_usage(state); + break; + case 'v': + env.verbose = true; + break; + case 'm': + if (strlen(arg) + 1 > MODULE_NAME_LEN) { + fprintf(stderr, "module name error\n"); + argp_usage(state); + } + strncpy(env.modname, arg, sizeof(env.modname) - 1); + break; + case 'f': + if (strlen(arg) + 1 > MODULE_FUNC_LEN) { + fprintf(stderr, "module function too long\n"); + argp_usage(state); + } + env.func = string_to_modfunc(arg); + if (!env.func) { + fprintf(stderr, "invalid module function\n"); + help_modfunc(); + argp_usage(state); + } + break; + case 'e': + env.errinj = atoi(arg); + break; + case 't': + env.trace = true; + break; + case ARGP_KEY_ARG: + argp_usage(state); + break; + default: + return ARGP_ERR_UNKNOWN; + } + return 0; +} + +static const struct argp argp = { + .options = opts, + .parser = parse_arg, + .doc = argp_program_doc, +}; + +static int libbpf_print_fn(enum libbpf_print_level level, const char *format, + va_list args) +{ + if (level == LIBBPF_DEBUG && !env.verbose) + return 0; + return vfprintf(stderr, format, args); +} + +static void sig_handler(int sig) +{ + exiting = true; +} + +static int handle_event(void *ctx, void *data, size_t data_sz) +{ + const struct event *e = data; + + if (!env.trace) + return 0; + + if (e->dbg) { + if (env.verbose) + printf("%s\n", e->msg); + return 0; + } + + printf("%-10s %-5d %-20s\n", e->modname, e->err, + modfunc_to_string(e->func)); + + return 0; +} + +int main(int argc, char **argv) +{ + struct ring_buffer *rb = NULL; + struct moderr_bpf *obj; + int err; + + err = argp_parse(&argp, argc, argv, 0, NULL, NULL); + if (err) + return err; + + if (!strlen(env.modname) || !env.func) { + fprintf(stderr, "missing arguments\n"); + return EXIT_FAILURE; + } + + libbpf_set_print(libbpf_print_fn); + + signal(SIGINT, sig_handler); + signal(SIGTERM, sig_handler); + + obj = moderr_bpf__open(); + if (!obj) { + fprintf(stderr, "failed to open and load BPF object\n"); + return 1; + } + + obj->rodata->filter_modname = true; + strncpy(obj->rodata->targ_modname, env.modname, MODULE_NAME_LEN - 1); + obj->rodata->targ_modname[MODULE_NAME_LEN - 1] = '\0'; + + obj->rodata->filter_modfunc = true; + obj->rodata->targ_modfunc = env.func; + + if (env.errinj) { + obj->rodata->set_errinj = true; + obj->rodata->targ_errinj = env.errinj; + } + + err = moderr_bpf__load(obj); + if (err) { + fprintf(stderr, "failed to load and verify BPF object\n"); + goto cleanup; + } + + err = moderr_bpf__attach(obj); + if (err) { + fprintf(stderr, "failed to attach BPF object\n"); + goto cleanup; + } + + printf("Monitoring module error injection... Hit Ctrl-C to end.\n"); + + rb = ring_buffer__new(bpf_map__fd(obj->maps.rb), handle_event, NULL, + NULL); + if (!rb) { + err = -1; + fprintf(stderr, "failed to create ring buffer\n"); + goto cleanup; + } + + if (env.trace) + printf("%-10s %-5s %-20s\n", "MODULE", "ERROR", "FUNCTION"); + + while (!exiting) { + err = ring_buffer__poll(rb, 100); + if (err == -EINTR) { + err = 0; + break; + } + if (err < 0) { + fprintf(stderr, "error polling ring buffer: %d\n", err); + break; + } + } + + printf("\n"); + +cleanup: + ring_buffer__free(rb); + moderr_bpf__destroy(obj); + + return err < 0 ? -err : 0; +} diff --git a/tools/bpf/moderr/moderr.h b/tools/bpf/moderr/moderr.h new file mode 100644 index 0000000000000000000000000000000000000000..e17440cf4bd5fe09b927cb83807a88f66861bba5 --- /dev/null +++ b/tools/bpf/moderr/moderr.h @@ -0,0 +1,40 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright (c) 2025 Samsung */ +#ifndef __MODERR_H +#define __MODERR_H + +#define MAX_PARAM_PREFIX_LEN (64 - sizeof(unsigned long)) +#define MODULE_NAME_LEN MAX_PARAM_PREFIX_LEN +#define MODULE_FUNC_LEN 128 +#define MESSAGE_LEN 128 + +enum modfunc { + UNKNOWN, + COMPLETE_FORMATION = 1, + DO_INIT_MODULE, + MODULE_ENABLE_RODATA_AFTER_INIT, +}; + +struct event { + char modname[MODULE_NAME_LEN]; + int err; + int func; + char msg[MESSAGE_LEN]; + int dbg; +}; + +static inline const char *modfunc_to_string(enum modfunc fc) +{ + switch (fc) { + case COMPLETE_FORMATION: + return "complete_formation()"; + case DO_INIT_MODULE: + return "do_init_module()"; + case MODULE_ENABLE_RODATA_AFTER_INIT: + return "module_enable_rodata_after_init()"; + default: + return "unknown"; + } +} + +#endif /* __MODERR_H */
Add support for a module error injection tool. The tool can inject errors in the annotated module kernel functions such as complete_formation(), do_init_module() and module_enable_rodata_after_init(). Module name and module function are required parameters to have control over the error injection. Example: Inject error -22 to module_enable_rodata_ro_after_init for brd module: sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \ --error=-22 --trace Monitoring module error injection... Hit Ctrl-C to end. MODULE ERROR FUNCTION brd -22 module_enable_rodata_after_init() Kernel messages: [ 89.463690] brd: module loaded [ 89.463855] brd: module_enable_rodata_ro_after_init() returned -22, ro_after_init data might still be writable Signed-off-by: Daniel Gomez <da.gomez@samsung.com> --- tools/bpf/Makefile | 13 ++- tools/bpf/moderr/.gitignore | 2 + tools/bpf/moderr/Makefile | 95 +++++++++++++++++ tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++ tools/bpf/moderr/moderr.c | 236 ++++++++++++++++++++++++++++++++++++++++++ tools/bpf/moderr/moderr.h | 40 +++++++ 6 files changed, 510 insertions(+), 3 deletions(-)