mbox series

[RFC,0/3] drm/amd/display: Introduce KUnit to Display Mode Library

Message ID 20220608010709.272962-1-maira.canal@usp.br (mailing list archive)
Headers show
Series drm/amd/display: Introduce KUnit to Display Mode Library | expand

Message

Maíra Canal June 8, 2022, 1:07 a.m. UTC
This RFC is a preview of the work being developed by Isabella Basso [1],
Maíra Canal [2], and Tales Lelo [3], as part of their Google Summer of Code
projects [4], and Magali Lemes [5], as part of her capstone project.

Our main goal is to bring unit testing to the AMDPGU driver; in particular,
we'll focus on the Display Mode Library (DML) for DCN2.0 and some of the DCE
functions. The modern AMD Linux kernel graphics driver is the single largest
driver in the mainline Linux codebase [6]. As AMD releases new GPU models,
the size of AMDGPU drivers is only becoming even larger.

Assuring the drivers' quality and reliability becomes a complex task without
systematic testing, especially for graphic drivers - which usually involve
tons of complex calculations. Also, keeping bugs away becomes an increasingly
hard task with the introduction of new code. Moreover, developers might want
to refactor old code without fear of the introduction of new issues.

In that sense, it is possible to argue for the benefits of implementing unit
testing at the AMDGPU drivers. This implementation will help developers to
recognize bugs before they are merged into the mainline and also makes it
possible for future code refactors of the AMDGPU driver.

When analyzing the AMDGPU driver, a particular part of the driver highlights
itself as a good candidate for the implementation of unit tests: the Display
Mode Library (DML), as it is focused on mathematical operations.

For the implementation of the tests, we decided to go with the Kernel Unit
Testing Framework (KUnit). KUnit makes it possible to run test suites on
kernel boot or load the tests as a module. It reports all test case results
through a TAP (Test Anything Protocol) in the kernel log.

Moreover, KUnit unifies the test structure and provides tools to simplify the
testing for developers and CI systems.

That said, we developed a little snippet on what we intend to develop in our
summer. We planned the basic structure on how the tests will be introduced
into the codebase and, on the concern of the CI systems, developed a structure
where the unit tests can be introduced as modules and run on IGT (the IGT patch
will be introduced soon).

The way the modules are implemented might seem a little unusual for KUnit
developers. We need to call the KUnit init function inside the AMDGPU stack,
otherwise, the test won't compile as a module. So, the solution to this
problem was based on the unit tests for the Thunderbolt driver, which uses
KUnit and also tests a physical driver.

As kunit_test_suites() defines itself as an init_module(), it conflicts with
the existing one at amdgpu_drv. So, if we use kunit_test_suites(), we won't
be able to compile the tests as modules and, therefore, won't be able to use
IGT to run the tests. This problem with kunit_test_suites() was already
discussed in the KUnit mailing list, as can be seen in [7].

The first patch configures the basic structure of the KUnit Tests, setting the
proper Makefile, Kconfig, and init function. It also contains a simple test
involving DML logging, which is the pretext for building the testing structure.

The second patch adds KUnit tests to bw_fixed functions. This patch represents
what we intend to do on the rest of the DML modules: systematic testing of the
public functions of the DML, especially mathematically complicated functions.
Also, it shows how simple it is to add new tests to the DML with the structure
we built.

Any feedback or ideas for the project are welcome!

[1] https://crosscat.me
[2] https://mairacanal.github.io
[3] https://tales-aparecida.github.io/
[4] https://summerofcode.withgoogle.com/programs/2022/organizations/xorg-foundation
[5] https://magalilemes.github.io/
[6] https://www.phoronix.com/scan.php?page=news_item&px=AMDGPU-Closing-4-Million
[7] https://groups.google.com/g/kunit-dev/c/hbJbh8L37FU/m/EmszZE9qBAAJ

- Isabella Basso, Magali Lemes, Maíra Canal, and Tales Lelo

Magali Lemes (1):
  drm/amd/display: Introduce KUnit tests to the bw_fixed library

Maíra Canal (2):
  drm/amd/display: Introduce KUnit to DML
  drm/amd/display: Move bw_fixed macros to header file

 drivers/gpu/drm/amd/display/Kconfig           |   1 +
 .../gpu/drm/amd/display/amdgpu_dm/Makefile    |   5 +
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |   3 +
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   3 +
 .../drm/amd/display/amdgpu_dm/tests/Kconfig   |  41 +++
 .../drm/amd/display/amdgpu_dm/tests/Makefile  |  18 +
 .../amdgpu_dm/tests/calcs/bw_fixed_test.c     | 322 ++++++++++++++++++
 .../amdgpu_dm/tests/display_mode_lib_test.c   |  83 +++++
 .../amd/display/amdgpu_dm/tests/dml_test.c    |  26 ++
 .../amd/display/amdgpu_dm/tests/dml_test.h    |  21 ++
 .../drm/amd/display/dc/dml/calcs/bw_fixed.c   |  14 +-
 drivers/gpu/drm/amd/display/dc/inc/bw_fixed.h |  14 +
 12 files changed, 538 insertions(+), 13 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/display/amdgpu_dm/tests/Kconfig
 create mode 100644 drivers/gpu/drm/amd/display/amdgpu_dm/tests/Makefile
 create mode 100644 drivers/gpu/drm/amd/display/amdgpu_dm/tests/calcs/bw_fixed_test.c
 create mode 100644 drivers/gpu/drm/amd/display/amdgpu_dm/tests/display_mode_lib_test.c
 create mode 100644 drivers/gpu/drm/amd/display/amdgpu_dm/tests/dml_test.c
 create mode 100644 drivers/gpu/drm/amd/display/amdgpu_dm/tests/dml_test.h

Comments

David Gow June 16, 2022, 2:39 p.m. UTC | #1
On Wed, Jun 8, 2022 at 9:08 AM Maíra Canal <maira.canal@usp.br> wrote:
>
> This RFC is a preview of the work being developed by Isabella Basso [1],
> Maíra Canal [2], and Tales Lelo [3], as part of their Google Summer of Code
> projects [4], and Magali Lemes [5], as part of her capstone project.
>
> Our main goal is to bring unit testing to the AMDPGU driver; in particular,
> we'll focus on the Display Mode Library (DML) for DCN2.0 and some of the DCE
> functions. The modern AMD Linux kernel graphics driver is the single largest
> driver in the mainline Linux codebase [6]. As AMD releases new GPU models,
> the size of AMDGPU drivers is only becoming even larger.
>
> Assuring the drivers' quality and reliability becomes a complex task without
> systematic testing, especially for graphic drivers - which usually involve
> tons of complex calculations. Also, keeping bugs away becomes an increasingly
> hard task with the introduction of new code. Moreover, developers might want
> to refactor old code without fear of the introduction of new issues.
>
> In that sense, it is possible to argue for the benefits of implementing unit
> testing at the AMDGPU drivers. This implementation will help developers to
> recognize bugs before they are merged into the mainline and also makes it
> possible for future code refactors of the AMDGPU driver.
>
> When analyzing the AMDGPU driver, a particular part of the driver highlights
> itself as a good candidate for the implementation of unit tests: the Display
> Mode Library (DML), as it is focused on mathematical operations.
>
> For the implementation of the tests, we decided to go with the Kernel Unit
> Testing Framework (KUnit). KUnit makes it possible to run test suites on
> kernel boot or load the tests as a module. It reports all test case results
> through a TAP (Test Anything Protocol) in the kernel log.
>
> Moreover, KUnit unifies the test structure and provides tools to simplify the
> testing for developers and CI systems.
>
> That said, we developed a little snippet on what we intend to develop in our
> summer. We planned the basic structure on how the tests will be introduced
> into the codebase and, on the concern of the CI systems, developed a structure
> where the unit tests can be introduced as modules and run on IGT (the IGT patch
> will be introduced soon).

It's awesome to see this! It's definitely one of the more ambitious
KUnit test projects out there, and the DML code does seem particularly
well suited to unit-testings.

> The way the modules are implemented might seem a little unusual for KUnit
> developers. We need to call the KUnit init function inside the AMDGPU stack,
> otherwise, the test won't compile as a module. So, the solution to this
> problem was based on the unit tests for the Thunderbolt driver, which uses
> KUnit and also tests a physical driver.
>
> As kunit_test_suites() defines itself as an init_module(), it conflicts with
> the existing one at amdgpu_drv. So, if we use kunit_test_suites(), we won't
> be able to compile the tests as modules and, therefore, won't be able to use
> IGT to run the tests. This problem with kunit_test_suites() was already
> discussed in the KUnit mailing list, as can be seen in [7].

I'm not sure I fully understand why these tests need to be part of the
amdgpu module, though admittedly I've not played with IGT much. Would
it be possible to compile these tests as separate modules, which could
depend on amdgpu (or maybe include the DML stuff directly), and
therefore not have this conflict? I definitely was able to get these
tests working under kunit_tool (albeit as built-ins) by using
kunit_test_suites(). If each suite were built as a separate module (or
indeed, even if all the tests were in one module, with one list of
suites), then it should be possible to avoid the init_module()
conflict. That'd also make it possible to run these tests without
actually needing the driver to initialise, which seems like it might
require actual hardware(?)

There are two other reasons the 'thunderbolt'-style technique is one
we want to avoid:
1. It makes it much more difficult to run tests using kunit_tool and
KUnit-based CI tools: these tests would not run automatically, and if
they were built-in as-is, they'd need to be
2. We're planning to improve module support to replace the
init_module()-based implementation of kunit_test_suites() with one
which won't have these conflicts, so the need for this should be
short-lived.

If you're curious, an early version of the improved module support can
be found here, though it's out-of-date enough it won't apply or work
as-is:
https://lore.kernel.org/all/101d12fc9250b7a445ff50a9e7a25cd74d0e16eb.camel@codeconstruct.com.au/

Now, that's unlikely to be ready very soon, but I'd be hesitant to
implement too extensive a system for avoiding kunit_test_suites()
given at some point it should work and we'll need to migrate back to
it.

At the very least, having the dependency on KUNIT=m is a very bad
idea: it should be possible to have tests built as modules, even if
KUnit itself isn't, and ideally (even if this sort-of implementation
is required), it _should_ be possible to have these tests be built-in
if all their dependencies (KUnit, amdgpu) are, which would make it
possible to run the tests without a userland.

That being said, I've got basically no knowledge of amdgpu (or even
drm in general), so there could be something I'm missing.

>
> The first patch configures the basic structure of the KUnit Tests, setting the
> proper Makefile, Kconfig, and init function. It also contains a simple test
> involving DML logging, which is the pretext for building the testing structure.
>
> The second patch adds KUnit tests to bw_fixed functions. This patch represents
> what we intend to do on the rest of the DML modules: systematic testing of the
> public functions of the DML, especially mathematically complicated functions.
> Also, it shows how simple it is to add new tests to the DML with the structure
> we built.
>
> Any feedback or ideas for the project are welcome!
>
Looks great to me so far: I'll try to get a more detailed review in soon.

Cheers,
-- David

> [1] https://crosscat.me
> [2] https://mairacanal.github.io
> [3] https://tales-aparecida.github.io/
> [4] https://summerofcode.withgoogle.com/programs/2022/organizations/xorg-foundation
> [5] https://magalilemes.github.io/
> [6] https://www.phoronix.com/scan.php?page=news_item&px=AMDGPU-Closing-4-Million
> [7] https://groups.google.com/g/kunit-dev/c/hbJbh8L37FU/m/EmszZE9qBAAJ
>
> - Isabella Basso, Magali Lemes, Maíra Canal, and Tales Lelo
>
> Magali Lemes (1):
>   drm/amd/display: Introduce KUnit tests to the bw_fixed library
>
> Maíra Canal (2):
>   drm/amd/display: Introduce KUnit to DML
>   drm/amd/display: Move bw_fixed macros to header file
>
>  drivers/gpu/drm/amd/display/Kconfig           |   1 +
>  .../gpu/drm/amd/display/amdgpu_dm/Makefile    |   5 +
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |   3 +
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |   3 +
>  .../drm/amd/display/amdgpu_dm/tests/Kconfig   |  41 +++
>  .../drm/amd/display/amdgpu_dm/tests/Makefile  |  18 +
>  .../amdgpu_dm/tests/calcs/bw_fixed_test.c     | 322 ++++++++++++++++++
>  .../amdgpu_dm/tests/display_mode_lib_test.c   |  83 +++++
>  .../amd/display/amdgpu_dm/tests/dml_test.c    |  26 ++
>  .../amd/display/amdgpu_dm/tests/dml_test.h    |  21 ++
>  .../drm/amd/display/dc/dml/calcs/bw_fixed.c   |  14 +-
>  drivers/gpu/drm/amd/display/dc/inc/bw_fixed.h |  14 +
>  12 files changed, 538 insertions(+), 13 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/display/amdgpu_dm/tests/Kconfig
>  create mode 100644 drivers/gpu/drm/amd/display/amdgpu_dm/tests/Makefile
>  create mode 100644 drivers/gpu/drm/amd/display/amdgpu_dm/tests/calcs/bw_fixed_test.c
>  create mode 100644 drivers/gpu/drm/amd/display/amdgpu_dm/tests/display_mode_lib_test.c
>  create mode 100644 drivers/gpu/drm/amd/display/amdgpu_dm/tests/dml_test.c
>  create mode 100644 drivers/gpu/drm/amd/display/amdgpu_dm/tests/dml_test.h
>
> --
> 2.36.1
>
Maíra Canal June 16, 2022, 10:41 p.m. UTC | #2
Hi David,

Thank you for your feedback!

On 6/16/22 11:39, David Gow wrote:
> On Wed, Jun 8, 2022 at 9:08 AM Maíra Canal <maira.canal@usp.br> wrote:

>>
>> As kunit_test_suites() defines itself as an init_module(), it conflicts with
>> the existing one at amdgpu_drv. So, if we use kunit_test_suites(), we won't
>> be able to compile the tests as modules and, therefore, won't be able to use
>> IGT to run the tests. This problem with kunit_test_suites() was already
>> discussed in the KUnit mailing list, as can be seen in [7].
> 
> I'm not sure I fully understand why these tests need to be part of the
> amdgpu module, though admittedly I've not played with IGT much. Would
> it be possible to compile these tests as separate modules, which could
> depend on amdgpu (or maybe include the DML stuff directly), and
> therefore not have this conflict? I definitely was able to get these
> tests working under kunit_tool (albeit as built-ins) by using
> kunit_test_suites(). If each suite were built as a separate module (or
> indeed, even if all the tests were in one module, with one list of
> suites), then it should be possible to avoid the init_module()
> conflict. That'd also make it possible to run these tests without
> actually needing the driver to initialise, which seems like it might
> require actual hardware(?)

Initially, we tried the kunit_test_suites() approach. And it did work pretty well for the kunit_tool (although we didn't test any hardware-specific unit test). But when compiling the test as a module, we would get a linking error, pointing out multiple definitions of 'init_module'/'cleanup_module' at kunit_test_suites().

At this point, we thought about a couple of options to resolve this problem:
- Add EXPORT_SYMBOL to the functions we would test. But, this doesn't scale pretty well, because it would pollute AMDGPU code as the tests expand.
- Take the Thunderbolt path and add the tests to the driver stack.

We end up taking the Thunderbolt path as it would be more maintainable.

Compiling the tests as a module is essential to make the tests run at IGT, as IGT essentially loads the module, runs it, and parses the output (a very very simplified explanation of what IGT does). IGT is a very known tool for DRI developers, so we believe that IGT support is crucial for this project.

If you have any other options on how to make the module compilation viable without using the 'thunderbolt'-style, we would be glad to hear your suggestions.

> 
> There are two other reasons the 'thunderbolt'-style technique is one
> we want to avoid:
> 1. It makes it much more difficult to run tests using kunit_tool and
> KUnit-based CI tools: these tests would not run automatically, and if
> they were built-in as-is, they'd need to be
> 2. We're planning to improve module support to replace the
> init_module()-based implementation of kunit_test_suites() with one
> which won't have these conflicts, so the need for this should be
> short-lived.
> 
> If you're curious, an early version of the improved module support can
> be found here, though it's out-of-date enough it won't apply or work
> as-is:
> https://lore.kernel.org/all/101d12fc9250b7a445ff50a9e7a25cd74d0e16eb.camel@codeconstruct.com.au/
> 
> Now, that's unlikely to be ready very soon, but I'd be hesitant to
> implement too extensive a system for avoiding kunit_test_suites()
> given at some point it should work and we'll need to migrate back to
> it.

We hope to see in the near future the improved module support from KUnit as it would make the addition of tests much more simple and clean.

Could you explain more about what is missing to make this improved module support come upstream?

> 
> At the very least, having the dependency on KUNIT=m is a very bad
> idea: it should be possible to have tests built as modules, even if
> KUnit itself isn't, and ideally (even if this sort-of implementation
> is required), it _should_ be possible to have these tests be built-in
> if all their dependencies (KUnit, amdgpu) are, which would make it
> possible to run the tests without a userland.
> 

Thank you for the suggestion! We will change the KUNIT dependency.

- Maíra Canal
David Gow June 17, 2022, 7:55 a.m. UTC | #3
On Fri, Jun 17, 2022 at 6:41 AM Maíra Canal <maira.canal@usp.br> wrote:
>
> Hi David,
>
> Thank you for your feedback!
>
> On 6/16/22 11:39, David Gow wrote:
> > On Wed, Jun 8, 2022 at 9:08 AM Maíra Canal <maira.canal@usp.br> wrote:
>
> >>
> >> As kunit_test_suites() defines itself as an init_module(), it conflicts with
> >> the existing one at amdgpu_drv. So, if we use kunit_test_suites(), we won't
> >> be able to compile the tests as modules and, therefore, won't be able to use
> >> IGT to run the tests. This problem with kunit_test_suites() was already
> >> discussed in the KUnit mailing list, as can be seen in [7].
> >
> > I'm not sure I fully understand why these tests need to be part of the
> > amdgpu module, though admittedly I've not played with IGT much. Would
> > it be possible to compile these tests as separate modules, which could
> > depend on amdgpu (or maybe include the DML stuff directly), and
> > therefore not have this conflict? I definitely was able to get these
> > tests working under kunit_tool (albeit as built-ins) by using
> > kunit_test_suites(). If each suite were built as a separate module (or
> > indeed, even if all the tests were in one module, with one list of
> > suites), then it should be possible to avoid the init_module()
> > conflict. That'd also make it possible to run these tests without
> > actually needing the driver to initialise, which seems like it might
> > require actual hardware(?)
>
> Initially, we tried the kunit_test_suites() approach. And it did work pretty well for the kunit_tool (although we didn't test any hardware-specific unit test). But when compiling the test as a module, we would get a linking error, pointing out multiple definitions of 'init_module'/'cleanup_module' at kunit_test_suites().
>
> At this point, we thought about a couple of options to resolve this problem:
> - Add EXPORT_SYMBOL to the functions we would test. But, this doesn't scale pretty well, because it would pollute AMDGPU code as the tests expand.
> - Take the Thunderbolt path and add the tests to the driver stack.
>
> We end up taking the Thunderbolt path as it would be more maintainable.
>
> Compiling the tests as a module is essential to make the tests run at IGT, as IGT essentially loads the module, runs it, and parses the output (a very very simplified explanation of what IGT does). IGT is a very known tool for DRI developers, so we believe that IGT support is crucial for this project.
>
> If you have any other options on how to make the module compilation viable without using the 'thunderbolt'-style, we would be glad to hear your suggestions.

As you point out, there are really two separate problems with
splitting the tests out totally:
- It's ugly and pollutes the symbol namespace to have EXPORT_SYMBOL()
everywhere.
- It's impossible to have multiple init_module() "calls" in the same module.

The first of these is, I think, the harder to solve generally. (There
are some ways to mitigate the namespace pollution part of it by either
hiding the EXPORT_SYMBOL() directives behind #ifdef CONFIG_KUNIT or
similar, or by using symbol namespaces:
https://www.kernel.org/doc/html/latest/core-api/symbol-namespaces.html
-- or both -- but they don't solve the issue entirely.)

That being said, it's as much a matter of taste as anything, so if
keeping things in the amdgpu module works well, don't let me stop you.
Either way should work, and have their own advantages and
disadvantages.

The latter is just a quirk of the current KUnit implementation of
kunit_test_suites(). This multiple-definition issue will go away in
the not-too-distant future.

So my suggestion here would be to make sure any changes you make to
work around the issue with multiple init_module definitions are easy
to remove. I suspect you could probably significantly simplify the
whole dml_test.{c,h} bit to just directly export the kunit_suites and
maybe throw them all in one array to pass to
__kunit_test_suites_init(). Then, when the improved modules work
lands, they could be deleted entirely and replaced with one or more
calls to kunit_test_suite().

> >
> > There are two other reasons the 'thunderbolt'-style technique is one
> > we want to avoid:
> > 1. It makes it much more difficult to run tests using kunit_tool and
> > KUnit-based CI tools: these tests would not run automatically, and if
> > they were built-in as-is, they'd need to be
> > 2. We're planning to improve module support to replace the
> > init_module()-based implementation of kunit_test_suites() with one
> > which won't have these conflicts, so the need for this should be
> > short-lived.
> >
> > If you're curious, an early version of the improved module support can
> > be found here, though it's out-of-date enough it won't apply or work
> > as-is:
> > https://lore.kernel.org/all/101d12fc9250b7a445ff50a9e7a25cd74d0e16eb.camel@codeconstruct.com.au/
> >
> > Now, that's unlikely to be ready very soon, but I'd be hesitant to
> > implement too extensive a system for avoiding kunit_test_suites()
> > given at some point it should work and we'll need to migrate back to
> > it.
>
> We hope to see in the near future the improved module support from KUnit as it would make the addition of tests much more simple and clean.
>
> Could you explain more about what is missing to make this improved module support come upstream?
>

Mostly just time and some other priorities. We've taken another look
at it over the last couple of days, and will try to accelerate getting
it in within the next couple of kernel releases. (Hopefully sooner
rather than later.)

> >
> > At the very least, having the dependency on KUNIT=m is a very bad
> > idea: it should be possible to have tests built as modules, even if
> > KUnit itself isn't, and ideally (even if this sort-of implementation
> > is required), it _should_ be possible to have these tests be built-in
> > if all their dependencies (KUnit, amdgpu) are, which would make it
> > possible to run the tests without a userland.
> >
>
> Thank you for the suggestion! We will change the KUNIT dependency.

Thanks -- with that gone, the tests do build for me under kunit_tool,
though I can't seem to get them to run as-is. (Moving the call to
amdgpu_dml_test_init() into the amdgpu_init() function in amdgpu_drv.c
does work, though I'm not sure if that's a sufficiently correct /
viable solution.)


Cheers,
-- David
Maíra Canal June 17, 2022, 8:24 p.m. UTC | #4
On 6/17/22 04:55, David Gow wrote:
> On Fri, Jun 17, 2022 at 6:41 AM Maíra Canal <maira.canal@usp.br> wrote:
>>
>> Hi David,
>>
>> Thank you for your feedback!
>>
>> On 6/16/22 11:39, David Gow wrote:
>>> On Wed, Jun 8, 2022 at 9:08 AM Maíra Canal <maira.canal@usp.br> wrote:
>>
>>>>
>>>> As kunit_test_suites() defines itself as an init_module(), it conflicts with
>>>> the existing one at amdgpu_drv. So, if we use kunit_test_suites(), we won't
>>>> be able to compile the tests as modules and, therefore, won't be able to use
>>>> IGT to run the tests. This problem with kunit_test_suites() was already
>>>> discussed in the KUnit mailing list, as can be seen in [7].
>>>
>>> I'm not sure I fully understand why these tests need to be part of the
>>> amdgpu module, though admittedly I've not played with IGT much. Would
>>> it be possible to compile these tests as separate modules, which could
>>> depend on amdgpu (or maybe include the DML stuff directly), and
>>> therefore not have this conflict? I definitely was able to get these
>>> tests working under kunit_tool (albeit as built-ins) by using
>>> kunit_test_suites(). If each suite were built as a separate module (or
>>> indeed, even if all the tests were in one module, with one list of
>>> suites), then it should be possible to avoid the init_module()
>>> conflict. That'd also make it possible to run these tests without
>>> actually needing the driver to initialise, which seems like it might
>>> require actual hardware(?)
>>
>> Initially, we tried the kunit_test_suites() approach. And it did work pretty well for the kunit_tool (although we didn't test any hardware-specific unit test). But when compiling the test as a module, we would get a linking error, pointing out multiple definitions of 'init_module'/'cleanup_module' at kunit_test_suites().
>>
>> At this point, we thought about a couple of options to resolve this problem:
>> - Add EXPORT_SYMBOL to the functions we would test. But, this doesn't scale pretty well, because it would pollute AMDGPU code as the tests expand.
>> - Take the Thunderbolt path and add the tests to the driver stack.
>>
>> We end up taking the Thunderbolt path as it would be more maintainable.
>>
>> Compiling the tests as a module is essential to make the tests run at IGT, as IGT essentially loads the module, runs it, and parses the output (a very very simplified explanation of what IGT does). IGT is a very known tool for DRI developers, so we believe that IGT support is crucial for this project.
>>
>> If you have any other options on how to make the module compilation viable without using the 'thunderbolt'-style, we would be glad to hear your suggestions.
> 
> As you point out, there are really two separate problems with
> splitting the tests out totally:
> - It's ugly and pollutes the symbol namespace to have EXPORT_SYMBOL()
> everywhere.
> - It's impossible to have multiple init_module() "calls" in the same module.
> 
> The first of these is, I think, the harder to solve generally. (There
> are some ways to mitigate the namespace pollution part of it by either
> hiding the EXPORT_SYMBOL() directives behind #ifdef CONFIG_KUNIT or
> similar, or by using symbol namespaces:
> https://www.kernel.org/doc/html/latest/core-api/symbol-namespaces.html
> -- or both -- but they don't solve the issue entirely.)
> 
> That being said, it's as much a matter of taste as anything, so if
> keeping things in the amdgpu module works well, don't let me stop you.
> Either way should work, and have their own advantages and
> disadvantages.
> 
> The latter is just a quirk of the current KUnit implementation of
> kunit_test_suites(). This multiple-definition issue will go away in
> the not-too-distant future.
> 
> So my suggestion here would be to make sure any changes you make to
> work around the issue with multiple init_module definitions are easy
> to remove. I suspect you could probably significantly simplify the
> whole dml_test.{c,h} bit to just directly export the kunit_suites and
> maybe throw them all in one array to pass to
> __kunit_test_suites_init(). Then, when the improved modules work
> lands, they could be deleted entirely and replaced with one or more
> calls to kunit_test_suite().
> 
>>>
>>> There are two other reasons the 'thunderbolt'-style technique is one
>>> we want to avoid:
>>> 1. It makes it much more difficult to run tests using kunit_tool and
>>> KUnit-based CI tools: these tests would not run automatically, and if
>>> they were built-in as-is, they'd need to be
>>> 2. We're planning to improve module support to replace the
>>> init_module()-based implementation of kunit_test_suites() with one
>>> which won't have these conflicts, so the need for this should be
>>> short-lived.
>>>
>>> If you're curious, an early version of the improved module support can
>>> be found here, though it's out-of-date enough it won't apply or work
>>> as-is:
>>> https://lore.kernel.org/all/101d12fc9250b7a445ff50a9e7a25cd74d0e16eb.camel@codeconstruct.com.au/
>>>
>>> Now, that's unlikely to be ready very soon, but I'd be hesitant to
>>> implement too extensive a system for avoiding kunit_test_suites()
>>> given at some point it should work and we'll need to migrate back to
>>> it.
>>
>> We hope to see in the near future the improved module support from KUnit as it would make the addition of tests much more simple and clean.
>>
>> Could you explain more about what is missing to make this improved module support come upstream?
>>
> 
> Mostly just time and some other priorities. We've taken another look
> at it over the last couple of days, and will try to accelerate getting
> it in within the next couple of kernel releases. (Hopefully sooner
> rather than later.)
Is there anything we can do to make this move faster? As it is our great
interest to make this work in KUnit, maybe I, Isabella, Tales, or Magali
can start work on this feature. We don´t have much knowledge of the
inner workings of KUnit, but if you point out a path, we can try to work
on this task.

Maybe, could we work in the same way as Jeremy?
> 
> Cheers,
> -- David

- Maíra Canal
David Gow June 18, 2022, 9:08 a.m. UTC | #5
On Sat, Jun 18, 2022 at 4:24 AM Maíra Canal <maira.canal@usp.br> wrote:
>
> On 6/17/22 04:55, David Gow wrote:
> > On Fri, Jun 17, 2022 at 6:41 AM Maíra Canal <maira.canal@usp.br> wrote:
> >>
> >> Hi David,
> >>
> >> Thank you for your feedback!
> >>
> >> On 6/16/22 11:39, David Gow wrote:
> >>> On Wed, Jun 8, 2022 at 9:08 AM Maíra Canal <maira.canal@usp.br> wrote:
> >>
> >>>>
> >>>> As kunit_test_suites() defines itself as an init_module(), it conflicts with
> >>>> the existing one at amdgpu_drv. So, if we use kunit_test_suites(), we won't
> >>>> be able to compile the tests as modules and, therefore, won't be able to use
> >>>> IGT to run the tests. This problem with kunit_test_suites() was already
> >>>> discussed in the KUnit mailing list, as can be seen in [7].
> >>>
> >>> I'm not sure I fully understand why these tests need to be part of the
> >>> amdgpu module, though admittedly I've not played with IGT much. Would
> >>> it be possible to compile these tests as separate modules, which could
> >>> depend on amdgpu (or maybe include the DML stuff directly), and
> >>> therefore not have this conflict? I definitely was able to get these
> >>> tests working under kunit_tool (albeit as built-ins) by using
> >>> kunit_test_suites(). If each suite were built as a separate module (or
> >>> indeed, even if all the tests were in one module, with one list of
> >>> suites), then it should be possible to avoid the init_module()
> >>> conflict. That'd also make it possible to run these tests without
> >>> actually needing the driver to initialise, which seems like it might
> >>> require actual hardware(?)
> >>
> >> Initially, we tried the kunit_test_suites() approach. And it did work pretty well for the kunit_tool (although we didn't test any hardware-specific unit test). But when compiling the test as a module, we would get a linking error, pointing out multiple definitions of 'init_module'/'cleanup_module' at kunit_test_suites().
> >>
> >> At this point, we thought about a couple of options to resolve this problem:
> >> - Add EXPORT_SYMBOL to the functions we would test. But, this doesn't scale pretty well, because it would pollute AMDGPU code as the tests expand.
> >> - Take the Thunderbolt path and add the tests to the driver stack.
> >>
> >> We end up taking the Thunderbolt path as it would be more maintainable.
> >>
> >> Compiling the tests as a module is essential to make the tests run at IGT, as IGT essentially loads the module, runs it, and parses the output (a very very simplified explanation of what IGT does). IGT is a very known tool for DRI developers, so we believe that IGT support is crucial for this project.
> >>
> >> If you have any other options on how to make the module compilation viable without using the 'thunderbolt'-style, we would be glad to hear your suggestions.
> >
> > As you point out, there are really two separate problems with
> > splitting the tests out totally:
> > - It's ugly and pollutes the symbol namespace to have EXPORT_SYMBOL()
> > everywhere.
> > - It's impossible to have multiple init_module() "calls" in the same module.
> >
> > The first of these is, I think, the harder to solve generally. (There
> > are some ways to mitigate the namespace pollution part of it by either
> > hiding the EXPORT_SYMBOL() directives behind #ifdef CONFIG_KUNIT or
> > similar, or by using symbol namespaces:
> > https://www.kernel.org/doc/html/latest/core-api/symbol-namespaces.html
> > -- or both -- but they don't solve the issue entirely.)
> >
> > That being said, it's as much a matter of taste as anything, so if
> > keeping things in the amdgpu module works well, don't let me stop you.
> > Either way should work, and have their own advantages and
> > disadvantages.
> >
> > The latter is just a quirk of the current KUnit implementation of
> > kunit_test_suites(). This multiple-definition issue will go away in
> > the not-too-distant future.
> >
> > So my suggestion here would be to make sure any changes you make to
> > work around the issue with multiple init_module definitions are easy
> > to remove. I suspect you could probably significantly simplify the
> > whole dml_test.{c,h} bit to just directly export the kunit_suites and
> > maybe throw them all in one array to pass to
> > __kunit_test_suites_init(). Then, when the improved modules work
> > lands, they could be deleted entirely and replaced with one or more
> > calls to kunit_test_suite().
> >
> >>>
> >>> There are two other reasons the 'thunderbolt'-style technique is one
> >>> we want to avoid:
> >>> 1. It makes it much more difficult to run tests using kunit_tool and
> >>> KUnit-based CI tools: these tests would not run automatically, and if
> >>> they were built-in as-is, they'd need to be
> >>> 2. We're planning to improve module support to replace the
> >>> init_module()-based implementation of kunit_test_suites() with one
> >>> which won't have these conflicts, so the need for this should be
> >>> short-lived.
> >>>
> >>> If you're curious, an early version of the improved module support can
> >>> be found here, though it's out-of-date enough it won't apply or work
> >>> as-is:
> >>> https://lore.kernel.org/all/101d12fc9250b7a445ff50a9e7a25cd74d0e16eb.camel@codeconstruct.com.au/
> >>>
> >>> Now, that's unlikely to be ready very soon, but I'd be hesitant to
> >>> implement too extensive a system for avoiding kunit_test_suites()
> >>> given at some point it should work and we'll need to migrate back to
> >>> it.
> >>
> >> We hope to see in the near future the improved module support from KUnit as it would make the addition of tests much more simple and clean.
> >>
> >> Could you explain more about what is missing to make this improved module support come upstream?
> >>
> >
> > Mostly just time and some other priorities. We've taken another look
> > at it over the last couple of days, and will try to accelerate getting
> > it in within the next couple of kernel releases. (Hopefully sooner
> > rather than later.)
> Is there anything we can do to make this move faster? As it is our great
> interest to make this work in KUnit, maybe I, Isabella, Tales, or Magali
> can start work on this feature. We don´t have much knowledge of the
> inner workings of KUnit, but if you point out a path, we can try to work
> on this task.
>
> Maybe, could we work in the same way as Jeremy?

Daniel and I have quickly tidied up and finished the various
in-progress bits of this and sent it out here:
https://lore.kernel.org/linux-kselftest/20220618090310.1174932-1-davidgow@google.com/T/

You should be able to apply that series and then just use
kunit_test_suites(), which will no-longer conflict with module_init
functions.

The most useful thing you could do is to test and/or review it --
there's almost certainly something I'll have missed.

Cheers,
-- David
Rodrigo Siqueira Jordao June 22, 2022, 10:55 p.m. UTC | #6
Hi,

First of all, thanks a lot for exploring the introduction of kunit 
inside amdgpu.

See my inline comments

On 2022-06-18 05:08, David Gow wrote:
> On Sat, Jun 18, 2022 at 4:24 AM Maíra Canal <maira.canal@usp.br> wrote:
>>
>> On 6/17/22 04:55, David Gow wrote:
>>> On Fri, Jun 17, 2022 at 6:41 AM Maíra Canal <maira.canal@usp.br> wrote:
>>>>
>>>> Hi David,
>>>>
>>>> Thank you for your feedback!
>>>>
>>>> On 6/16/22 11:39, David Gow wrote:
>>>>> On Wed, Jun 8, 2022 at 9:08 AM Maíra Canal <maira.canal@usp.br> wrote:
>>>>
>>>>>>
>>>>>> As kunit_test_suites() defines itself as an init_module(), it conflicts with
>>>>>> the existing one at amdgpu_drv. So, if we use kunit_test_suites(), we won't
>>>>>> be able to compile the tests as modules and, therefore, won't be able to use
>>>>>> IGT to run the tests. This problem with kunit_test_suites() was already
>>>>>> discussed in the KUnit mailing list, as can be seen in [7].
>>>>>
>>>>> I'm not sure I fully understand why these tests need to be part of the
>>>>> amdgpu module, though admittedly I've not played with IGT much. Would
>>>>> it be possible to compile these tests as separate modules, which could
>>>>> depend on amdgpu (or maybe include the DML stuff directly), and
>>>>> therefore not have this conflict? I definitely was able to get these
>>>>> tests working under kunit_tool (albeit as built-ins) by using
>>>>> kunit_test_suites(). If each suite were built as a separate module (or
>>>>> indeed, even if all the tests were in one module, with one list of
>>>>> suites), then it should be possible to avoid the init_module()
>>>>> conflict. That'd also make it possible to run these tests without
>>>>> actually needing the driver to initialise, which seems like it might
>>>>> require actual hardware(?)

In the Display code for amdgpu, we heavily rely on IGT for automated 
tests. We have some internal CI where we run a large set of IGT tests 
per patch, and I'm sure many other DRM developers also use IGT. In this 
sense, if we can have an interface inside IGT that can easily run those 
kunit tests, we can enable kunit tests in our CI pipeline almost for free :)

We already have a prototype for this sort of integration at:

https://patchwork.freedesktop.org/series/105294/

>>>> Initially, we tried the kunit_test_suites() approach. And it did work pretty well for the kunit_tool (although we didn't test any hardware-specific unit test). But when compiling the test as a module, we would get a linking error, pointing out multiple definitions of 'init_module'/'cleanup_module' at kunit_test_suites().
>>>>
>>>> At this point, we thought about a couple of options to resolve this problem:
>>>> - Add EXPORT_SYMBOL to the functions we would test. But, this doesn't scale pretty well, because it would pollute AMDGPU code as the tests expand.
>>>> - Take the Thunderbolt path and add the tests to the driver stack.
>>>>
>>>> We end up taking the Thunderbolt path as it would be more maintainable.
>>>>
>>>> Compiling the tests as a module is essential to make the tests run at IGT, as IGT essentially loads the module, runs it, and parses the output (a very very simplified explanation of what IGT does). IGT is a very known tool for DRI developers, so we believe that IGT support is crucial for this project.
>>>>
>>>> If you have any other options on how to make the module compilation viable without using the 'thunderbolt'-style, we would be glad to hear your suggestions.
>>>
>>> As you point out, there are really two separate problems with
>>> splitting the tests out totally:
>>> - It's ugly and pollutes the symbol namespace to have EXPORT_SYMBOL()
>>> everywhere.
>>> - It's impossible to have multiple init_module() "calls" in the same module.
>>>
>>> The first of these is, I think, the harder to solve generally. (There
>>> are some ways to mitigate the namespace pollution part of it by either
>>> hiding the EXPORT_SYMBOL() directives behind #ifdef CONFIG_KUNIT or
>>> similar, or by using symbol namespaces:
>>> https://www.kernel.org/doc/html/latest/core-api/symbol-namespaces.html
>>> -- or both -- but they don't solve the issue entirely.)
>>>
>>> That being said, it's as much a matter of taste as anything, so if
>>> keeping things in the amdgpu module works well, don't let me stop you.
>>> Either way should work, and have their own advantages and
>>> disadvantages.

I want to avoid making changes inside the dc code [1] (or keep it 
minimal) for enabling kunit. Aligned with the IGT comment, I'm more 
inclined to a solution where we treat the kunit tests for DML as a 
module. However, I'm not sure yet if it is possible to have something 
like that...

Does it make things easier if we have a single module that handles the 
dml-kunit interface, and we can control which test to invoke via kernel 
parameter?

1. 
https://gitlab.freedesktop.org/agd5f/linux/-/tree/amd-staging-drm-next/drivers/gpu/drm/amd/display/dc

>>> The latter is just a quirk of the current KUnit implementation of
>>> kunit_test_suites(). This multiple-definition issue will go away in
>>> the not-too-distant future.
>>>
>>> So my suggestion here would be to make sure any changes you make to
>>> work around the issue with multiple init_module definitions are easy
>>> to remove. I suspect you could probably significantly simplify the
>>> whole dml_test.{c,h} bit to just directly export the kunit_suites and
>>> maybe throw them all in one array to pass to
>>> __kunit_test_suites_init(). Then, when the improved modules work
>>> lands, they could be deleted entirely and replaced with one or more
>>> calls to kunit_test_suite().
>>>
>>>>>
>>>>> There are two other reasons the 'thunderbolt'-style technique is one
>>>>> we want to avoid:
>>>>> 1. It makes it much more difficult to run tests using kunit_tool and
>>>>> KUnit-based CI tools: these tests would not run automatically, and if
>>>>> they were built-in as-is, they'd need to be
>>>>> 2. We're planning to improve module support to replace the
>>>>> init_module()-based implementation of kunit_test_suites() with one
>>>>> which won't have these conflicts, so the need for this should be
>>>>> short-lived.
>>>>>
>>>>> If you're curious, an early version of the improved module support can
>>>>> be found here, though it's out-of-date enough it won't apply or work
>>>>> as-is:
>>>>> https://lore.kernel.org/all/101d12fc9250b7a445ff50a9e7a25cd74d0e16eb.camel@codeconstruct.com.au/
>>>>>
>>>>> Now, that's unlikely to be ready very soon, but I'd be hesitant to
>>>>> implement too extensive a system for avoiding kunit_test_suites()
>>>>> given at some point it should work and we'll need to migrate back to
>>>>> it.
>>>>
>>>> We hope to see in the near future the improved module support from KUnit as it would make the addition of tests much more simple and clean.
>>>>
>>>> Could you explain more about what is missing to make this improved module support come upstream?
>>>>
>>>
>>> Mostly just time and some other priorities. We've taken another look
>>> at it over the last couple of days, and will try to accelerate getting
>>> it in within the next couple of kernel releases. (Hopefully sooner
>>> rather than later.)
>> Is there anything we can do to make this move faster? As it is our great
>> interest to make this work in KUnit, maybe I, Isabella, Tales, or Magali
>> can start work on this feature. We don´t have much knowledge of the
>> inner workings of KUnit, but if you point out a path, we can try to work
>> on this task.
>>
>> Maybe, could we work in the same way as Jeremy?
> 
> Daniel and I have quickly tidied up and finished the various
> in-progress bits of this and sent it out here:
> https://lore.kernel.org/linux-kselftest/20220618090310.1174932-1-davidgow@google.com/T/
> 
> You should be able to apply that series and then just use
> kunit_test_suites(), which will no-longer conflict with module_init
> functions.
> 
> The most useful thing you could do is to test and/or review it --
> there's almost certainly something I'll have missed.

Nice!

Maira/Magali/Isabela/Tales,

Maybe for the next version, we can use the above patches even if they 
are not in amd-staging-drm-next yet.

Thanks
Siqueira

> Cheers,
> -- David