
[0/2] drm: revert some framebuffer API tests

Message ID cover.1726594684.git.jani.nikula@intel.com (mailing list archive)

Message

Jani Nikula Sept. 17, 2024, 5:43 p.m. UTC
The tests consistently trigger WARNs in drm_framebuffer code. I'm not
sure what the point is with this type of belts-and-suspenders tests. The
warnings *are* the way to flag erroneous API usage.

Warnings in turn trigger failures in CI. Filtering the warnings is
error prone, and, crucially, would also filter actual errors in case the
kunit tests are not run.

I acknowledge there may be complex test cases where you'd end up
triggering warnings somewhere deep, but these are not it. These are
simple.

Revert the tests, back to the drawing board.

BR,
Jani.


Cc: Carlos Eduardo Gallo Filho <gcarlos@disroot.org>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Jeff Johnson <quic_jjohnson@quicinc.com>


Jani Nikula (2):
  Revert "drm/tests: Add test for drm_framebuffer_free()"
  Revert "drm/tests: Add test for drm_framebuffer_init()"

 drivers/gpu/drm/drm_framebuffer.c            |   1 -
 drivers/gpu/drm/drm_mode_object.c            |   1 -
 drivers/gpu/drm/tests/drm_framebuffer_test.c | 134 -------------------
 3 files changed, 136 deletions(-)

Comments

Simona Vetter Sept. 24, 2024, 10:06 a.m. UTC | #1
On Tue, Sep 17, 2024 at 08:43:50PM +0300, Jani Nikula wrote:
> The tests consistently trigger WARNs in drm_framebuffer code. I'm not
> sure what the point is with type of belts and suspenders tests. The
> warnings *are* the way to flag erroneous API usage.
> 
> Warnings in turn trigger failures in CI. Filtering the warnings are
> error prone, and, crucially, would also filter actual errors in case the
> kunit tests are not run.
> 
> I acknowledge there may be complex test cases where you'd end up
> triggering warnings somewhere deep, but these are not it. These are
> simple.
> 
> Revert the tests, back to the drawing board.

Yeah, I think long-term we might want a kunit framework so that we can
catch dmesg warnings we expect and test for those, without those warnings
actually going to dmesg. Similar to how the lockdep tests also reroute
locking validation, so that the expected positive tests don't wreck
lockdep for real.

But until that exists, we can't have tests that splat in dmesg when they
work as intended.
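
Something like this, as a purely hypothetical sketch; nothing of the sort
exists in kunit today, and the macro names below are invented for
illustration only:

	/* Hypothetical API, for illustration only: the test declares the
	 * warning it expects, the framework consumes it instead of letting
	 * it reach dmesg, and a missing or unexpected warning fails the
	 * test. */
	static void drm_framebuffer_bad_usage_test(struct kunit *test)
	{
		KUNIT_DECLARE_EXPECTED_WARNING(test, "drm_framebuffer");  /* hypothetical */

		/* ... exercise the erroneous API usage here ... */

		KUNIT_EXPECT_WARNING_SEEN(test);                           /* hypothetical */
	}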

Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>

Cheers, Sima

> 
> BR,
> Jani.
> 
> 
> Cc: Carlos Eduardo Gallo Filho <gcarlos@disroot.org>
> Cc: Maxime Ripard <mripard@kernel.org>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Thomas Zimmermann <tzimmermann@suse.de>
> Cc: Jeff Johnson <quic_jjohnson@quicinc.com>
> 
> 
> Jani Nikula (2):
>   Revert "drm/tests: Add test for drm_framebuffer_free()"
>   Revert "drm/tests: Add test for drm_framebuffer_init()"
> 
>  drivers/gpu/drm/drm_framebuffer.c            |   1 -
>  drivers/gpu/drm/drm_mode_object.c            |   1 -
>  drivers/gpu/drm/tests/drm_framebuffer_test.c | 134 -------------------
>  3 files changed, 136 deletions(-)
> 
> -- 
> 2.39.2
>
Maxime Ripard Sept. 24, 2024, 11:54 a.m. UTC | #2
+Guenter

On Tue, Sep 24, 2024 at 12:06:28PM GMT, Simona Vetter wrote:
> On Tue, Sep 17, 2024 at 08:43:50PM +0300, Jani Nikula wrote:
> > The tests consistently trigger WARNs in drm_framebuffer code. I'm not
> > sure what the point is with type of belts and suspenders tests. The
> > warnings *are* the way to flag erroneous API usage.
> > 
> > Warnings in turn trigger failures in CI. Filtering the warnings are
> > error prone, and, crucially, would also filter actual errors in case the
> > kunit tests are not run.
> > 
> > I acknowledge there may be complex test cases where you'd end up
> > triggering warnings somewhere deep, but these are not it. These are
> > simple.
> > 
> > Revert the tests, back to the drawing board.
> 
> Yeah I think long-term we might want a kunit framework so that we can
> catch dmesg warnings we expect and test for those, without those warnings
> actually going to dmesg. Similar to how the lockdep tests also reroute
> locking validation, so that the expected positive tests don't wreak
> lockdep for real.
> 
> But until that exists, we can't have tests that splat in dmesg when they
> work as intended.

It should be pretty soon:
https://lore.kernel.org/dri-devel/20240403131936.787234-1-linux@roeck-us.net/

I'm not sure what happened to that series, but it looks like everybody
was mostly happy with it?

Maxime
Guenter Roeck Sept. 24, 2024, 1:37 p.m. UTC | #3
On 9/24/24 04:54, Maxime Ripard wrote:
> +Guenter
> 
> On Tue, Sep 24, 2024 at 12:06:28PM GMT, Simona Vetter wrote:
>> On Tue, Sep 17, 2024 at 08:43:50PM +0300, Jani Nikula wrote:
>>> The tests consistently trigger WARNs in drm_framebuffer code. I'm not
>>> sure what the point is with type of belts and suspenders tests. The
>>> warnings *are* the way to flag erroneous API usage.
>>>
>>> Warnings in turn trigger failures in CI. Filtering the warnings are
>>> error prone, and, crucially, would also filter actual errors in case the
>>> kunit tests are not run.
>>>
>>> I acknowledge there may be complex test cases where you'd end up
>>> triggering warnings somewhere deep, but these are not it. These are
>>> simple.
>>>
>>> Revert the tests, back to the drawing board.
>>
>> Yeah I think long-term we might want a kunit framework so that we can
>> catch dmesg warnings we expect and test for those, without those warnings
>> actually going to dmesg. Similar to how the lockdep tests also reroute
>> locking validation, so that the expected positive tests don't wreak
>> lockdep for real.
>>
>> But until that exists, we can't have tests that splat in dmesg when they
>> work as intended.
> 
> It should be pretty soon:
> https://lore.kernel.org/dri-devel/20240403131936.787234-1-linux@roeck-us.net/
> 
> I'm not sure what happened to that series, but it looks like everybody
> was mostly happy with it?
> 

Major subsystem maintainers did not provide any feedback at all, and then
there came the "it is not perfect enough" feedback, so I gave up on it.

Guenter
Maxime Ripard Sept. 24, 2024, 1:56 p.m. UTC | #4
On Tue, Sep 24, 2024 at 06:37:59AM GMT, Guenter Roeck wrote:
> On 9/24/24 04:54, Maxime Ripard wrote:
> > +Guenter
> > 
> > On Tue, Sep 24, 2024 at 12:06:28PM GMT, Simona Vetter wrote:
> > > On Tue, Sep 17, 2024 at 08:43:50PM +0300, Jani Nikula wrote:
> > > > The tests consistently trigger WARNs in drm_framebuffer code. I'm not
> > > > sure what the point is with type of belts and suspenders tests. The
> > > > warnings *are* the way to flag erroneous API usage.
> > > > 
> > > > Warnings in turn trigger failures in CI. Filtering the warnings are
> > > > error prone, and, crucially, would also filter actual errors in case the
> > > > kunit tests are not run.
> > > > 
> > > > I acknowledge there may be complex test cases where you'd end up
> > > > triggering warnings somewhere deep, but these are not it. These are
> > > > simple.
> > > > 
> > > > Revert the tests, back to the drawing board.
> > > 
> > > Yeah I think long-term we might want a kunit framework so that we can
> > > catch dmesg warnings we expect and test for those, without those warnings
> > > actually going to dmesg. Similar to how the lockdep tests also reroute
> > > locking validation, so that the expected positive tests don't wreak
> > > lockdep for real.
> > > 
> > > But until that exists, we can't have tests that splat in dmesg when they
> > > work as intended.
> > 
> > It should be pretty soon:
> > https://lore.kernel.org/dri-devel/20240403131936.787234-1-linux@roeck-us.net/
> > 
> > I'm not sure what happened to that series, but it looks like everybody
> > was mostly happy with it?
> > 
> 
> Major subsystem maintainers did not provide any feedback at all, and then
> there came the "it is not perfect enough" feedback, so I gave up on it.

Well, if that means anything, we're interested even in something
imperfect if it solves the above case :)

Maxime
Guenter Roeck Sept. 24, 2024, 3:09 p.m. UTC | #5
On 9/24/24 06:56, Maxime Ripard wrote:
> On Tue, Sep 24, 2024 at 06:37:59AM GMT, Guenter Roeck wrote:
>> On 9/24/24 04:54, Maxime Ripard wrote:
>>> +Guenter
>>>
>>> On Tue, Sep 24, 2024 at 12:06:28PM GMT, Simona Vetter wrote:
>>>> On Tue, Sep 17, 2024 at 08:43:50PM +0300, Jani Nikula wrote:
>>>>> The tests consistently trigger WARNs in drm_framebuffer code. I'm not
>>>>> sure what the point is with type of belts and suspenders tests. The
>>>>> warnings *are* the way to flag erroneous API usage.
>>>>>
>>>>> Warnings in turn trigger failures in CI. Filtering the warnings are
>>>>> error prone, and, crucially, would also filter actual errors in case the
>>>>> kunit tests are not run.
>>>>>
>>>>> I acknowledge there may be complex test cases where you'd end up
>>>>> triggering warnings somewhere deep, but these are not it. These are
>>>>> simple.
>>>>>
>>>>> Revert the tests, back to the drawing board.
>>>>
>>>> Yeah I think long-term we might want a kunit framework so that we can
>>>> catch dmesg warnings we expect and test for those, without those warnings
>>>> actually going to dmesg. Similar to how the lockdep tests also reroute
>>>> locking validation, so that the expected positive tests don't wreak
>>>> lockdep for real.
>>>>
>>>> But until that exists, we can't have tests that splat in dmesg when they
>>>> work as intended.
>>>

FWIW, that is arguable. More and more tests are added which do add such splats,
and I don't see any hesitance from developers about adding more. So far I have
counted two in this commit window alone, and that does not include new splats from
tests which I had already disabled. I simply disable those tests or don't
enable them in the first place if they are new. I did the same with the drm
unit tests due to the splats generated by the scaling unit tests, so any
additional drm unit test splats don't make a difference for me since the
tests are already disabled.

>>> It should be pretty soon:
>>> https://lore.kernel.org/dri-devel/20240403131936.787234-1-linux@roeck-us.net/
>>>
>>> I'm not sure what happened to that series, but it looks like everybody
>>> was mostly happy with it?
>>>
>>
>> Major subsystem maintainers did not provide any feedback at all, and then
>> there came the "it is not perfect enough" feedback, so I gave up on it.
> 
> Well, if that means anything, we're interested even in something
> imperfect if it solves the above case :)
> 

Maybe someone else is interested in picking it up (and no need for credits).
I am buried in work and don't have the time (nor, frankly, much interest)
to revive it. Also, just for reference: The patch series touches a total
of 8 architectures, and unless I missed some I only got feedback from two
architecture maintainers. That doesn't include arm - I don't recall if it
doesn't need changes or if I never got there.

Guenter
Jani Nikula Sept. 24, 2024, 3:56 p.m. UTC | #6
On Tue, 24 Sep 2024, Guenter Roeck <linux@roeck-us.net> wrote:
>>>> On Tue, Sep 24, 2024 at 12:06:28PM GMT, Simona Vetter wrote:
>>>>> Yeah I think long-term we might want a kunit framework so that we can
>>>>> catch dmesg warnings we expect and test for those, without those warnings
>>>>> actually going to dmesg. Similar to how the lockdep tests also reroute
>>>>> locking validation, so that the expected positive tests don't wreak
>>>>> lockdep for real.
>>>>>
>>>>> But until that exists, we can't have tests that splat in dmesg when they
>>>>> work as intended.
>
> FWIW, that is arguable. More and more tests are added which do add such splats,
> and I don't see any hesitance by developers to adding more. So far I counted
> two alone in this commit window, and that does not include new splats from
> tests which I had already disabled. I simply disable those tests or don't
> enable them in the first place if they are new. I did the same with the drm
> unit tests due to the splats generated by the scaling unit tests, so any
> additional drm unit test splats don't make a difference for me since the
> tests are already disabled.

What's the point of having unit tests that CI systems routinely have to
filter out of test runs? Or of filtering the warnings generated by the
tests, potentially missing new warnings? Who is going to run the tests if
the existing CI systems choose to ignore them?

Automation on a massive scale is key here, and making that harder is
counter-productive.


BR,
Jani.
Guenter Roeck Sept. 24, 2024, 4:35 p.m. UTC | #7
On 9/24/24 08:56, Jani Nikula wrote:
> On Tue, 24 Sep 2024, Guenter Roeck <linux@roeck-us.net> wrote:
>>>>> On Tue, Sep 24, 2024 at 12:06:28PM GMT, Simona Vetter wrote:
>>>>>> Yeah I think long-term we might want a kunit framework so that we can
>>>>>> catch dmesg warnings we expect and test for those, without those warnings
>>>>>> actually going to dmesg. Similar to how the lockdep tests also reroute
>>>>>> locking validation, so that the expected positive tests don't wreak
>>>>>> lockdep for real.
>>>>>>
>>>>>> But until that exists, we can't have tests that splat in dmesg when they
>>>>>> work as intended.
>>
>> FWIW, that is arguable. More and more tests are added which do add such splats,
>> and I don't see any hesitance by developers to adding more. So far I counted
>> two alone in this commit window, and that does not include new splats from
>> tests which I had already disabled. I simply disable those tests or don't
>> enable them in the first place if they are new. I did the same with the drm
>> unit tests due to the splats generated by the scaling unit tests, so any
>> additional drm unit test splats don't make a difference for me since the
>> tests are already disabled.
> 
> What's the point of having unit tests that CI systems routinely have to
> filter out of test runs? Or filter warnings generated by the tests,
> potentially missing new warnings. Who is going to run the tests if the
> existing CI systems choose to ignore them?
> 
> Automation on a massive scale is key here, and making that harder is
> counter-productive.
> 

As I have said elsewhere, not being able to handle backtraces generated
on purpose is a limitation of my testbed. Other testbeds with active (and
paid) maintainers may not have that limitation. I deal with it by disabling
affected tests. Others may deal with it by maintaining exception lists.

Ultimately it is up to developers and testbed maintainers to decide the level
of test coverage they are looking for or able to support.

Thanks,
Guenter
Maxime Ripard Sept. 24, 2024, 4:57 p.m. UTC | #8
On Tue, Sep 24, 2024 at 06:56:26PM GMT, Jani Nikula wrote:
> On Tue, 24 Sep 2024, Guenter Roeck <linux@roeck-us.net> wrote:
> >>>> On Tue, Sep 24, 2024 at 12:06:28PM GMT, Simona Vetter wrote:
> >>>>> Yeah I think long-term we might want a kunit framework so that we can
> >>>>> catch dmesg warnings we expect and test for those, without those warnings
> >>>>> actually going to dmesg. Similar to how the lockdep tests also reroute
> >>>>> locking validation, so that the expected positive tests don't wreak
> >>>>> lockdep for real.
> >>>>>
> >>>>> But until that exists, we can't have tests that splat in dmesg when they
> >>>>> work as intended.
> >
> > FWIW, that is arguable. More and more tests are added which do add such splats,
> > and I don't see any hesitance by developers to adding more. So far I counted
> > two alone in this commit window, and that does not include new splats from
> > tests which I had already disabled. I simply disable those tests or don't
> > enable them in the first place if they are new. I did the same with the drm
> > unit tests due to the splats generated by the scaling unit tests, so any
> > additional drm unit test splats don't make a difference for me since the
> > tests are already disabled.
> 
> What's the point of having unit tests that CI systems routinely have to
> filter out of test runs? Or filter warnings generated by the tests,
> potentially missing new warnings. Who is going to run the tests if the
> existing CI systems choose to ignore them?

If we turn this argument around, that means we can't write unit tests for
code that will create a warning.

IMO, this creates a bad incentive, and saying that any capable CI system
should reject them is certainly opinionated.

Maxime
Guenter Roeck Sept. 24, 2024, 5:37 p.m. UTC | #9
On 9/24/24 09:57, Maxime Ripard wrote:
> On Tue, Sep 24, 2024 at 06:56:26PM GMT, Jani Nikula wrote:
>> On Tue, 24 Sep 2024, Guenter Roeck <linux@roeck-us.net> wrote:
>>>>>> On Tue, Sep 24, 2024 at 12:06:28PM GMT, Simona Vetter wrote:
>>>>>>> Yeah I think long-term we might want a kunit framework so that we can
>>>>>>> catch dmesg warnings we expect and test for those, without those warnings
>>>>>>> actually going to dmesg. Similar to how the lockdep tests also reroute
>>>>>>> locking validation, so that the expected positive tests don't wreak
>>>>>>> lockdep for real.
>>>>>>>
>>>>>>> But until that exists, we can't have tests that splat in dmesg when they
>>>>>>> work as intended.
>>>
>>> FWIW, that is arguable. More and more tests are added which do add such splats,
>>> and I don't see any hesitance by developers to adding more. So far I counted
>>> two alone in this commit window, and that does not include new splats from
>>> tests which I had already disabled. I simply disable those tests or don't
>>> enable them in the first place if they are new. I did the same with the drm
>>> unit tests due to the splats generated by the scaling unit tests, so any
>>> additional drm unit test splats don't make a difference for me since the
>>> tests are already disabled.
>>
>> What's the point of having unit tests that CI systems routinely have to
>> filter out of test runs? Or filter warnings generated by the tests,
>> potentially missing new warnings. Who is going to run the tests if the
>> existing CI systems choose to ignore them?
> 
> If we turn this argument around, that means we can't write unit test for
> code that will create a warning.
> 
> IMO, this creates a bad incentive, and saying that any capable CI system
> should reject them is certainly opiniated.
> 

Agreed. All I am saying is that _I_ am rejecting them, but it is up to each
individual testbed (or, rather, testbed maintainer) to decide how to handle
the situation.

Guenter
Jani Nikula Sept. 25, 2024, 9:41 a.m. UTC | #10
On Tue, 24 Sep 2024, Maxime Ripard <mripard@kernel.org> wrote:
> On Tue, Sep 24, 2024 at 06:56:26PM GMT, Jani Nikula wrote:
>> On Tue, 24 Sep 2024, Guenter Roeck <linux@roeck-us.net> wrote:
>> >>>> On Tue, Sep 24, 2024 at 12:06:28PM GMT, Simona Vetter wrote:
>> >>>>> Yeah I think long-term we might want a kunit framework so that we can
>> >>>>> catch dmesg warnings we expect and test for those, without those warnings
>> >>>>> actually going to dmesg. Similar to how the lockdep tests also reroute
>> >>>>> locking validation, so that the expected positive tests don't wreak
>> >>>>> lockdep for real.
>> >>>>>
>> >>>>> But until that exists, we can't have tests that splat in dmesg when they
>> >>>>> work as intended.
>> >
>> > FWIW, that is arguable. More and more tests are added which do add such splats,
>> > and I don't see any hesitance by developers to adding more. So far I counted
>> > two alone in this commit window, and that does not include new splats from
>> > tests which I had already disabled. I simply disable those tests or don't
>> > enable them in the first place if they are new. I did the same with the drm
>> > unit tests due to the splats generated by the scaling unit tests, so any
>> > additional drm unit test splats don't make a difference for me since the
>> > tests are already disabled.
>> 
>> What's the point of having unit tests that CI systems routinely have to
>> filter out of test runs? Or filter warnings generated by the tests,
>> potentially missing new warnings. Who is going to run the tests if the
>> existing CI systems choose to ignore them?
>
> If we turn this argument around, that means we can't write unit test for
> code that will create a warning.

The question really is: which warnings do we get because of the
functionality being tested, and which are due to an unexpected and
unrelated bug? Is it a pass or a fail if the test triggers an
unrelated warning?

If the developer runs the tests, are they expected to look at dmesg and
inspect every warning to decide?

> IMO, this creates a bad incentive, and saying that any capable CI system
> should reject them is certainly opiniated.

All I'm saying is that it generates an extra burden for current real-world
CI systems that run tests on a massive scale and even have paid
maintainers. It's not trivial to sort out expected and unexpected
warnings, and to keep the filters updated every time the warnings
change. It's not without a cost. And the end result might just be that
the unit tests won't get run at all. Or warnings just get completely
ignored for kunit tests.

I don't completely disagree with you either. It's just that we don't
have that automation on the kernel side, and in the meantime, perhaps
we should be really conservative about the warnings we create in tests?

If we can't filter out the warnings kernel side, perhaps we could figure
out a way to annotate the kunit tests that generate warnings on passing
runs?
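
One hypothetical way such an annotation could look, piggybacking on the
existing kunit test-attribute machinery; the test names and the
.expect_warnings field below are made up and do not exist in struct
kunit_attributes today:

	static struct kunit_case drm_framebuffer_tests[] = {
		KUNIT_CASE(drm_test_framebuffer_create),
		/* Hypothetical attribute telling CI/tooling that a passing
		 * run of this case emits a WARN splat on purpose. */
		{ .run_case = drm_test_framebuffer_init_bad_args,
		  .name = "drm_test_framebuffer_init_bad_args",
		  .attr = { .expect_warnings = true } },  /* hypothetical */
		{}
	};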


BR,
Jani.
Simona Vetter Sept. 25, 2024, 11:52 a.m. UTC | #11
On Tue, Sep 24, 2024 at 08:09:38AM -0700, Guenter Roeck wrote:
> On 9/24/24 06:56, Maxime Ripard wrote:
> > On Tue, Sep 24, 2024 at 06:37:59AM GMT, Guenter Roeck wrote:
> > > On 9/24/24 04:54, Maxime Ripard wrote:
> > > > +Guenter
> > > > 
> > > > On Tue, Sep 24, 2024 at 12:06:28PM GMT, Simona Vetter wrote:
> > > > > On Tue, Sep 17, 2024 at 08:43:50PM +0300, Jani Nikula wrote:
> > > > > > The tests consistently trigger WARNs in drm_framebuffer code. I'm not
> > > > > > sure what the point is with type of belts and suspenders tests. The
> > > > > > warnings *are* the way to flag erroneous API usage.
> > > > > > 
> > > > > > Warnings in turn trigger failures in CI. Filtering the warnings are
> > > > > > error prone, and, crucially, would also filter actual errors in case the
> > > > > > kunit tests are not run.
> > > > > > 
> > > > > > I acknowledge there may be complex test cases where you'd end up
> > > > > > triggering warnings somewhere deep, but these are not it. These are
> > > > > > simple.
> > > > > > 
> > > > > > Revert the tests, back to the drawing board.
> > > > > 
> > > > > Yeah I think long-term we might want a kunit framework so that we can
> > > > > catch dmesg warnings we expect and test for those, without those warnings
> > > > > actually going to dmesg. Similar to how the lockdep tests also reroute
> > > > > locking validation, so that the expected positive tests don't wreak
> > > > > lockdep for real.
> > > > > 
> > > > > But until that exists, we can't have tests that splat in dmesg when they
> > > > > work as intended.
> > > > 
> 
> FWIW, that is arguable. More and more tests are added which do add such splats,
> and I don't see any hesitance by developers to adding more. So far I counted
> two alone in this commit window, and that does not include new splats from
> tests which I had already disabled. I simply disable those tests or don't
> enable them in the first place if they are new. I did the same with the drm
> unit tests due to the splats generated by the scaling unit tests, so any
> additional drm unit test splats don't make a difference for me since the
> tests are already disabled.

I think for at least drm the consensus is clear: we won't have kunit tests
that splat. Personally I really don't see the point of unit tests that are
somewhere between unnecessarily hard and outright too painful to deploy in
a test rig: either you don't run them (not great), or you filter splats
and might filter too much (not great either), or you filter as little as
possible and fight false positives (also kinda suboptimal). Especially for
unit tests this stuff must be totally rock solid.

We've had similar discussions in the past around unit tests that wasted
too much cpu time with randomized combinatorial testing, and we've thrown
those out of drm too. Test hygiene matters.

Also, this means that if you see this stuff in drm unit tests (or related
things like dma-buf/fence), please report it or send a revert.

> > > > It should be pretty soon:
> > > > https://lore.kernel.org/dri-devel/20240403131936.787234-1-linux@roeck-us.net/
> > > > 
> > > > I'm not sure what happened to that series, but it looks like everybody
> > > > was mostly happy with it?
> > > > 
> > > 
> > > Major subsystem maintainers did not provide any feedback at all, and then
> > > there came the "it is not perfect enough" feedback, so I gave up on it.
> > 
> > Well, if that means anything, we're interested even in something
> > imperfect if it solves the above case :)
> > 
> 
> Maybe someone else is interested picking it up (and no need for credits).
> I am buried in work and don't have the time (nor, frankly, much interest)
> to revive it. Also, just for reference: The patch series touches a total
> of 8 architectures, and unless I missed some I only got feedback from two
> architecture maintainers. That doesn't include arm - I don't recall if it
> doesn't need changes or if I never got there.

This is great, somehow I missed it. The initial version is much appreciated
even if you can't push it further. Hopefully one of the folks working on drm
kunit tests will pick it up; it will be needed here.

Cheers, Sima
Maxime Ripard Sept. 25, 2024, 12:59 p.m. UTC | #12
On Wed, Sep 25, 2024 at 12:41:40PM GMT, Jani Nikula wrote:
> On Tue, 24 Sep 2024, Maxime Ripard <mripard@kernel.org> wrote:
> > On Tue, Sep 24, 2024 at 06:56:26PM GMT, Jani Nikula wrote:
> >> On Tue, 24 Sep 2024, Guenter Roeck <linux@roeck-us.net> wrote:
> >> >>>> On Tue, Sep 24, 2024 at 12:06:28PM GMT, Simona Vetter wrote:
> >> >>>>> Yeah I think long-term we might want a kunit framework so that we can
> >> >>>>> catch dmesg warnings we expect and test for those, without those warnings
> >> >>>>> actually going to dmesg. Similar to how the lockdep tests also reroute
> >> >>>>> locking validation, so that the expected positive tests don't wreak
> >> >>>>> lockdep for real.
> >> >>>>>
> >> >>>>> But until that exists, we can't have tests that splat in dmesg when they
> >> >>>>> work as intended.
> >> >
> >> > FWIW, that is arguable. More and more tests are added which do add such splats,
> >> > and I don't see any hesitance by developers to adding more. So far I counted
> >> > two alone in this commit window, and that does not include new splats from
> >> > tests which I had already disabled. I simply disable those tests or don't
> >> > enable them in the first place if they are new. I did the same with the drm
> >> > unit tests due to the splats generated by the scaling unit tests, so any
> >> > additional drm unit test splats don't make a difference for me since the
> >> > tests are already disabled.
> >> 
> >> What's the point of having unit tests that CI systems routinely have to
> >> filter out of test runs? Or filter warnings generated by the tests,
> >> potentially missing new warnings. Who is going to run the tests if the
> >> existing CI systems choose to ignore them?
> >
> > If we turn this argument around, that means we can't write unit test for
> > code that will create a warning.
> 
> The question really is, which warnings we get because of the
> functionality being tested, and which warnings are due to an unexpected
> and unrelated bug? Is it a pass or fail if the test triggers an
> unrelated warning?
> 
> If the developer runs the tests, are they expected to look at dmesg and
> inspect every warning to decide?

No, there's no such expectation. Yet Intel's CI chose to do so, and
chose to treat any warning as a failure, despite the fact that kunit
doesn't have the same policy.

> > IMO, this creates a bad incentive, and saying that any capable CI system
> > should reject them is certainly opiniated.
> 
> All I'm saying it generates an extra burden for current real world CI
> systems that run tests on a massive scale and even have paid
> maintainers. It's not trivial to sort out expected and unexpected
> warnings, and keep the filters updated every time the warnings
> change. It's not without a cost. And the end result might just be that
> the unit tests won't get run at all. Or warnings just get completely
> ignored for kunit tests.

I realise it must take a significant amount of resources, but it's also
self-inflicted. You could also stop looking for warnings when running
kunit.

> I don't completely disagree with you either. It's just that we don't
> have that automation on the kernel side, and in the mean time, perhaps
> we should be really conservative about the warnings we create in tests?
> 
> If we can't filter out the warnings kernel side, perhaps we could figure
> out a way to annotate the kunit tests that generate warnings on passing
> runs?

Yeah, I think that would be the best solution. That's what Guenter's
series was about :/

Maxime
Maxime Ripard Sept. 25, 2024, 1:05 p.m. UTC | #13
On Wed, Sep 25, 2024 at 01:52:02PM GMT, Simona Vetter wrote:
> On Tue, Sep 24, 2024 at 08:09:38AM -0700, Guenter Roeck wrote:
> > On 9/24/24 06:56, Maxime Ripard wrote:
> > > On Tue, Sep 24, 2024 at 06:37:59AM GMT, Guenter Roeck wrote:
> > > > On 9/24/24 04:54, Maxime Ripard wrote:
> > > > > +Guenter
> > > > > 
> > > > > On Tue, Sep 24, 2024 at 12:06:28PM GMT, Simona Vetter wrote:
> > > > > > On Tue, Sep 17, 2024 at 08:43:50PM +0300, Jani Nikula wrote:
> > > > > > > The tests consistently trigger WARNs in drm_framebuffer code. I'm not
> > > > > > > sure what the point is with type of belts and suspenders tests. The
> > > > > > > warnings *are* the way to flag erroneous API usage.
> > > > > > > 
> > > > > > > Warnings in turn trigger failures in CI. Filtering the warnings are
> > > > > > > error prone, and, crucially, would also filter actual errors in case the
> > > > > > > kunit tests are not run.
> > > > > > > 
> > > > > > > I acknowledge there may be complex test cases where you'd end up
> > > > > > > triggering warnings somewhere deep, but these are not it. These are
> > > > > > > simple.
> > > > > > > 
> > > > > > > Revert the tests, back to the drawing board.
> > > > > > 
> > > > > > Yeah I think long-term we might want a kunit framework so that we can
> > > > > > catch dmesg warnings we expect and test for those, without those warnings
> > > > > > actually going to dmesg. Similar to how the lockdep tests also reroute
> > > > > > locking validation, so that the expected positive tests don't wreak
> > > > > > lockdep for real.
> > > > > > 
> > > > > > But until that exists, we can't have tests that splat in dmesg when they
> > > > > > work as intended.
> > > > > 
> > 
> > FWIW, that is arguable. More and more tests are added which do add such splats,
> > and I don't see any hesitance by developers to adding more. So far I counted
> > two alone in this commit window, and that does not include new splats from
> > tests which I had already disabled. I simply disable those tests or don't
> > enable them in the first place if they are new. I did the same with the drm
> > unit tests due to the splats generated by the scaling unit tests, so any
> > additional drm unit test splats don't make a difference for me since the
> > tests are already disabled.
> 
> I think for at least drm the consensus is clear, we won't have kunit tests
> that splat.

Where was that ever discussed?

> Personally I really don't see the point of unit tests that are
> somewhere between unecessarily hard or outright too much pain to
> deploy in a test rig: Either you don't run them (not great), or you
> filter splats and might filter too much (not great either) or you
> filter as little as possible and fight false positives (also kinda
> suboptimal).

Or you don't do any of that, and just rely on the canonical way to run
kunit tests, trusting that it will pass tests that do indeed pass and
fail / warn on those that don't.

> Especially for unit tests this stuff must be totally rock solid.

Is there any evidence that those tests were not rock solid?

> We've had similar discussions in the past around unit tests that wasted
> too much cpu time with randomized combinatorial testing, and we've thrown
> those out too from drm. Test hygiene matters.

We had that discussion because those tests could run for up to around a
minute on certain platforms. It's not the same issue at all?

Maxime
Guenter Roeck Sept. 25, 2024, 3:57 p.m. UTC | #14
On 9/25/24 05:59, Maxime Ripard wrote:
...

>> All I'm saying it generates an extra burden for current real world CI
>> systems that run tests on a massive scale and even have paid
>> maintainers. It's not trivial to sort out expected and unexpected
>> warnings, and keep the filters updated every time the warnings
>> change. It's not without a cost. And the end result might just be that
>> the unit tests won't get run at all. Or warnings just get completely
>> ignored for kunit tests.
> 
> I realise it must take a significant amount of resources, but it's also
> self inflicted. You could also stop looking for warnings when running
> kunit.
> 

FWIW, that doesn't work if kunit tests are built into the kernel and
run when booting the system. It also doesn't work well if kunit tests
trigger a WARN_ONCE(), because that causes any subsequent real warning from
the same call site to be dropped. Also, ignoring warnings while running a
kunit test means that _real_ unexpected warnings triggered by those tests
will be ignored as well.
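
To make the WARN_ONCE() point concrete, a small sketch with a made-up helper
(the function name and the check are illustrative, not actual drm code):

	#include <linux/bug.h>
	#include <linux/errno.h>

	/* Illustrative helper only. */
	static int example_check_stride(unsigned int stride)
	{
		/* WARN_ONCE() fires a single time per call site.  If a kunit
		 * test deliberately passes stride == 0, the one-shot splat
		 * is spent here, and a later, genuine zero-stride bug hits
		 * the same site silently. */
		if (WARN_ONCE(stride == 0, "zero stride\n"))
			return -EINVAL;
		return 0;
	}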

Guenter
Guenter Roeck Sept. 25, 2024, 4:04 p.m. UTC | #15
On 9/25/24 06:05, Maxime Ripard wrote:
[ ... ]

>> We've had similar discussions in the past around unit tests that wasted
>> too much cpu time with randomized combinatorial testing, and we've thrown
>> those out too from drm. Test hygiene matters.
> 
> We had that discussion because those tests could run for up to around a
> minute on certain platforms. It's not the same issue at all?
> 

I added "kunit.stats_enabled=2 kunit.filter=speed>slow" to the kernel command
line in my tests to avoid long-running tests. That works as long as slow tests
are marked accordingly using KUNIT_CASE_SLOW(), KUNIT_SPEED_SLOW, or
KUNIT_SPEED_VERY_SLOW.
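
A minimal sketch of how that marking looks, assuming a made-up suite and test
names; KUNIT_CASE_SLOW() sets the speed attribute that the speed>slow filter
keys on:

	#include <kunit/test.h>

	static void drm_example_fast_case(struct kunit *test)
	{
		KUNIT_EXPECT_EQ(test, 2 + 2, 4);
	}

	/* Pretend this one walks a large combinatorial parameter space. */
	static void drm_example_exhaustive_case(struct kunit *test)
	{
		KUNIT_EXPECT_TRUE(test, true);
	}

	static struct kunit_case drm_example_cases[] = {
		KUNIT_CASE(drm_example_fast_case),
		/* Marked slow, so "kunit.filter=speed>slow" skips it. */
		KUNIT_CASE_SLOW(drm_example_exhaustive_case),
		{}
	};

	static struct kunit_suite drm_example_suite = {
		.name = "drm_example",
		.test_cases = drm_example_cases,
	};
	kunit_test_suite(drm_example_suite);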

Guenter
Maxime Ripard Sept. 26, 2024, 7:05 a.m. UTC | #16
On Wed, Sep 25, 2024 at 09:04:39AM GMT, Guenter Roeck wrote:
> On 9/25/24 06:05, Maxime Ripard wrote:
> [ ... ]
> 
> > > We've had similar discussions in the past around unit tests that wasted
> > > too much cpu time with randomized combinatorial testing, and we've thrown
> > > those out too from drm. Test hygiene matters.
> > 
> > We had that discussion because those tests could run for up to around a
> > minute on certain platforms. It's not the same issue at all?
> > 
> 
> I added "kunit.stats_enabled=2 kunit.filter=speed>slow" to the kernel command
> line in my tests to avoid long-running tests. That works as long as slow tests
> are marked accordingly using KUNIT_CASE_SLOW(), KUNIT_SPEED_SLOW, or
> KUNIT_SPEED_VERY_SLOW.

Yeah, and some time ago I added a warning for when a test runs for too long
but isn't flagged as slow.

Maxime
Jani Nikula Oct. 2, 2024, 12:10 p.m. UTC | #17
On Wed, 25 Sep 2024, Maxime Ripard <mripard@kernel.org> wrote:
> On Wed, Sep 25, 2024 at 01:52:02PM GMT, Simona Vetter wrote:
>> I think for at least drm the consensus is clear, we won't have kunit tests
>> that splat.
>
> Where was that ever discussed?

Well, where was it ever agreed that it's okay for drm kunit tests to
emit warnings? :p

>> Personally I really don't see the point of unit tests that are
>> somewhere between unecessarily hard or outright too much pain to
>> deploy in a test rig: Either you don't run them (not great), or you
>> filter splats and might filter too much (not great either) or you
>> filter as little as possible and fight false positives (also kinda
>> suboptimal).
>
> Or you don't do any of that, and just rely on the canonical way to run
> kunit test and trust it's going to pass tests that do indeed pass, and
> fail / warn on those that don't.

That still doesn't address code being tested emitting *unexpected*
warnings.


BR,
Jani.
Maxime Ripard Oct. 3, 2024, 12:04 p.m. UTC | #18
On Wed, Oct 02, 2024 at 03:10:14PM GMT, Jani Nikula wrote:
> On Wed, 25 Sep 2024, Maxime Ripard <mripard@kernel.org> wrote:
> > On Wed, Sep 25, 2024 at 01:52:02PM GMT, Simona Vetter wrote:
> >> I think for at least drm the consensus is clear, we won't have kunit tests
> >> that splat.
> >
> > Where was that ever discussed?
> 
> Well, where was it ever agreed that it's okay for drm kunit tests to
> emit warnings? :p

One policy is upstream, the other isn't. And it does seem like the
consensus towards downstream patches and policies is pretty well
established.

And I wasn't the one claiming that there's a consensus to begin with, so
I don't see why I should prove anything?

> >> Personally I really don't see the point of unit tests that are
> >> somewhere between unecessarily hard or outright too much pain to
> >> deploy in a test rig: Either you don't run them (not great), or you
> >> filter splats and might filter too much (not great either) or you
> >> filter as little as possible and fight false positives (also kinda
> >> suboptimal).
> >
> > Or you don't do any of that, and just rely on the canonical way to run
> > kunit test and trust it's going to pass tests that do indeed pass, and
> > fail / warn on those that don't.
> 
> That still doesn't address code being tested emitting *unexpected*
> warnings.

Then make kunit report a warning / failure when there's an unexpected
warning splat.


At the end of the day, the discussion is that we already require each
committer to:

  - Apply patches
  - Check for checkpatch issues
  - Solve potential conflicts
  - Make sure it compiles on three different architectures, with huge defconfigs
  - Make sure the unit tests still pass

I'm not going to add "and grep through the output of those 1000-ish
tests for a warning" to that list.

Maxime