Message ID | 20231019094609.251787-1-mripard@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | drm/doc: ci: Require more context for flaky tests |
On Thu, Oct 19, 2023 at 11:46:09AM +0200, Maxime Ripard wrote:
> Flaky tests can be very difficult to reproduce after the facts, which
> will make it even harder to ever fix.
>
> Let's document the metadata we agreed on to provide more context to
> anyone trying to address these fixes.
>
> Link: https://lore.kernel.org/dri-devel/CAPj87rPbJ1V1-R7WMTHkDat2A4nwSd61Df9mdGH2PR=ZzxaU=Q@mail.gmail.com/
> Signed-off-by: Maxime Ripard <mripard@kernel.org>

Not that my opinion matters much since I'm really not involved in the
details, and no opinion on the specific format and all that, but this
sounds like a very good idea to me.

Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Cheers, Sima

> ---
>  Documentation/gpu/automated_testing.rst | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/Documentation/gpu/automated_testing.rst b/Documentation/gpu/automated_testing.rst
> index 469b6fb65c30..2dd0e221c2c3 100644
> --- a/Documentation/gpu/automated_testing.rst
> +++ b/Documentation/gpu/automated_testing.rst
> @@ -67,6 +67,19 @@ Lists the tests that for a given driver on a specific hardware revision are
>  known to behave unreliably. These tests won't cause a job to fail regardless of
>  the result. They will still be run.
>
> +Each new flake entry must be associated with a link to a bug report to
> +the author of the affected driver, the board name or Device Tree name of
> +the board, the first kernel version affected, and an approximation of
> +the failure rate.
> +
> +They should be provided under the following format::
> +
> +  # Bug Report: $LORE_OR_PATCHWORK_URL
> +  # Board Name: broken-board.dtb
> +  # Version: 6.6-rc1
> +  # Failure Rate: 100
> +  flaky-test
> +
>  drivers/gpu/drm/ci/${DRIVER_NAME}-${HW_REVISION}-skips.txt
>  -----------------------------------------------------------
>
> --
> 2.41.0
>
On 19/10/2023 06:46, Maxime Ripard wrote:
> Flaky tests can be very difficult to reproduce after the facts, which
> will make it even harder to ever fix.
>
> Let's document the metadata we agreed on to provide more context to
> anyone trying to address these fixes.
>
> Link: https://lore.kernel.org/dri-devel/CAPj87rPbJ1V1-R7WMTHkDat2A4nwSd61Df9mdGH2PR=ZzxaU=Q@mail.gmail.com/
> Signed-off-by: Maxime Ripard <mripard@kernel.org>
> ---
>  Documentation/gpu/automated_testing.rst | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/Documentation/gpu/automated_testing.rst b/Documentation/gpu/automated_testing.rst
> index 469b6fb65c30..2dd0e221c2c3 100644
> --- a/Documentation/gpu/automated_testing.rst
> +++ b/Documentation/gpu/automated_testing.rst
> @@ -67,6 +67,19 @@ Lists the tests that for a given driver on a specific hardware revision are
>  known to behave unreliably. These tests won't cause a job to fail regardless of
>  the result. They will still be run.
>
> +Each new flake entry must be associated with a link to a bug report to

What do you mean by bug report? Just a link to an email to the mailing
list is enough?

Also, I made a mistake in the first flakes lists, which I corrected with
https://www.spinics.net/lists/kernel/msg4959629.html (there was a bug in
my script which ended up erroneously adding a bunch of tests in the flake
list, so I cleaned them up). I would like to kindly request to let me add
that documentation in a future patch to not block that patch series.

Thanks
Helen

> +the author of the affected driver, the board name or Device Tree name of
> +the board, the first kernel version affected, and an approximation of
> +the failure rate.
> +
> +They should be provided under the following format::
> +
> +  # Bug Report: $LORE_OR_PATCHWORK_URL
> +  # Board Name: broken-board.dtb
> +  # Version: 6.6-rc1
> +  # Failure Rate: 100
> +  flaky-test
> +
>  drivers/gpu/drm/ci/${DRIVER_NAME}-${HW_REVISION}-skips.txt
>  -----------------------------------------------------------
>
On 19/10/2023 13:51, Helen Koike wrote:
>
>
> On 19/10/2023 06:46, Maxime Ripard wrote:
>> Flaky tests can be very difficult to reproduce after the facts, which
>> will make it even harder to ever fix.
>>
>> Let's document the metadata we agreed on to provide more context to
>> anyone trying to address these fixes.
>>
>> Link:
>> https://lore.kernel.org/dri-devel/CAPj87rPbJ1V1-R7WMTHkDat2A4nwSd61Df9mdGH2PR=ZzxaU=Q@mail.gmail.com/
>> Signed-off-by: Maxime Ripard <mripard@kernel.org>
>> ---
>>  Documentation/gpu/automated_testing.rst | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/Documentation/gpu/automated_testing.rst
>> b/Documentation/gpu/automated_testing.rst
>> index 469b6fb65c30..2dd0e221c2c3 100644
>> --- a/Documentation/gpu/automated_testing.rst
>> +++ b/Documentation/gpu/automated_testing.rst
>> @@ -67,6 +67,19 @@ Lists the tests that for a given driver on a
>> specific hardware revision are
>>  known to behave unreliably. These tests won't cause a job to fail
>> regardless of
>>  the result. They will still be run.
>> +Each new flake entry must be associated with a link to a bug report to
>
> What do you mean by bug report? Just a link to an email to the mailing
> list is enough?
>
> Also, I made a mistake in the first flakes lists, which I corrected
> with https://www.spinics.net/lists/kernel/msg4959629.html (there was a
> bug in my script which ended up erroneously adding a bunch of tests in the
> flake list, so I cleaned them up). I would like to kindly request to let
> me add that documentation in a future patch to not block that patch
> series.
>
> Thanks
> Helen
>
>
>> +the author of the affected driver, the board name or Device Tree name of
>> +the board, the first kernel version affected, and an approximation of
>> +the failure rate.
>> +
>> +They should be provided under the following format::
>> +
>> +  # Bug Report: $LORE_OR_PATCHWORK_URL

I wonder if the commit adding the test into the flakes.txt file with an
Acked-by from the device maintainer shouldn't be already considered the
Bug Report.

>> +  # Board Name: broken-board.dtb

Maybe Board Name isn't required, since it is already in the name of the
file.

>> +  # Version: 6.6-rc1
>> +  # Failure Rate: 100

Maybe also:

  # Pipeline url: https://gitlab.freedesktop.org/helen.fornazier/linux/-/pipelines/1014435

All this info will complicate a bit the update-xfails.py script, but well,
we can handle...
(see https://patchwork.kernel.org/project/dri-devel/patch/20231020034124.136295-4-helen.koike@collabora.com/ )
We need to update that script to make life easier.

Vignesh sent a patch adding at least the pipeline url to the file
https://patchwork.kernel.org/project/linux-arm-msm/patch/20231019070650.61159-9-vignesh.raman@collabora.com/
but to meet this doc that needs to be updated too.

Regards,
Helen

>> +  flaky-test
>> +
>>  drivers/gpu/drm/ci/${DRIVER_NAME}-${HW_REVISION}-skips.txt
>>  -----------------------------------------------------------
On Thu, Oct 19, 2023 at 01:51:59PM -0300, Helen Koike wrote:
>
>
> On 19/10/2023 06:46, Maxime Ripard wrote:
> > Flaky tests can be very difficult to reproduce after the facts, which
> > will make it even harder to ever fix.
> >
> > Let's document the metadata we agreed on to provide more context to
> > anyone trying to address these fixes.
> >
> > Link: https://lore.kernel.org/dri-devel/CAPj87rPbJ1V1-R7WMTHkDat2A4nwSd61Df9mdGH2PR=ZzxaU=Q@mail.gmail.com/
> > Signed-off-by: Maxime Ripard <mripard@kernel.org>
> > ---
> >  Documentation/gpu/automated_testing.rst | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/Documentation/gpu/automated_testing.rst b/Documentation/gpu/automated_testing.rst
> > index 469b6fb65c30..2dd0e221c2c3 100644
> > --- a/Documentation/gpu/automated_testing.rst
> > +++ b/Documentation/gpu/automated_testing.rst
> > @@ -67,6 +67,19 @@ Lists the tests that for a given driver on a specific hardware revision are
> >  known to behave unreliably. These tests won't cause a job to fail regardless of
> >  the result. They will still be run.
> > +Each new flake entry must be associated with a link to a bug report to
>
> What do you mean by bug report? Just a link to an email to the mailing list
> is enough?

Yes, a mail to the maintainers of that driver is enough. Waiting for an
actual fix would take too long, but at least that way we have the
opportunity to come back later on and see if there's progress.

> Also, I made a mistake in the first flakes lists, which I corrected with
> https://www.spinics.net/lists/kernel/msg4959629.html (there was a bug in my
> script which ended up erroneously adding a bunch of tests in the flake list,
> so I cleaned them up). I would like to kindly request to let me add that
> documentation in a future patch to not block that patch series.

Sounds fair, especially since you remove a significant number of them.

Maxime
On Fri, Oct 20, 2023 at 01:33:59AM -0300, Helen Koike wrote:
> On 19/10/2023 13:51, Helen Koike wrote:
> > On 19/10/2023 06:46, Maxime Ripard wrote:
> > > Flaky tests can be very difficult to reproduce after the facts, which
> > > will make it even harder to ever fix.
> > >
> > > Let's document the metadata we agreed on to provide more context to
> > > anyone trying to address these fixes.
> > >
> > > Link: https://lore.kernel.org/dri-devel/CAPj87rPbJ1V1-R7WMTHkDat2A4nwSd61Df9mdGH2PR=ZzxaU=Q@mail.gmail.com/
> > > Signed-off-by: Maxime Ripard <mripard@kernel.org>
> > > ---
> > >  Documentation/gpu/automated_testing.rst | 13 +++++++++++++
> > >  1 file changed, 13 insertions(+)
> > >
> > > diff --git a/Documentation/gpu/automated_testing.rst
> > > b/Documentation/gpu/automated_testing.rst
> > > index 469b6fb65c30..2dd0e221c2c3 100644
> > > --- a/Documentation/gpu/automated_testing.rst
> > > +++ b/Documentation/gpu/automated_testing.rst
> > > @@ -67,6 +67,19 @@ Lists the tests that for a given driver on a
> > > specific hardware revision are
> > >  known to behave unreliably. These tests won't cause a job to fail
> > > regardless of
> > >  the result. They will still be run.
> > > +Each new flake entry must be associated with a link to a bug report to
> >
> > What do you mean by bug report? Just a link to an email to the mailing
> > list is enough?
> >
> > Also, I made a mistake in the first flakes lists, which I corrected
> > with https://www.spinics.net/lists/kernel/msg4959629.html (there was a
> > bug in my script which ended up erroneously adding a bunch of tests in the
> > flake list, so I cleaned them up). I would like to kindly request to let
> > me add that documentation in a future patch to not block that patch
> > series.
> >
> > Thanks
> > Helen
> >
> >
> > > +the author of the affected driver, the board name or Device Tree name of
> > > +the board, the first kernel version affected, and an approximation of
> > > +the failure rate.
> > > +
> > > +They should be provided under the following format::
> > > +
> > > +  # Bug Report: $LORE_OR_PATCHWORK_URL
>
> I wonder if the commit adding the test into the flakes.txt file with an
> Acked-by from the device maintainer shouldn't be already considered the Bug
> Report.

I guess it could, yes. I think I'd still prefer the link since it would
also allow us to evaluate whether the issue is fixed or not now.

> > > +  # Board Name: broken-board.dtb
>
> Maybe Board Name isn't required, since it is already in the name of the
> file.

I have no idea how the i915 naming works, but on ARM at least the name
of the file contains the name of the SoC, not the board where it was
observed.

> > > +  # Version: 6.6-rc1
> > > +  # Failure Rate: 100
>
> Maybe also:
>
>   # Pipeline url:
> https://gitlab.freedesktop.org/helen.fornazier/linux/-/pipelines/1014435

Sounds like a good idea yeah :) Are those artifacts archived/deleted at
some point or do they stick around forever?

> All this info will complicate a bit the update-xfails.py script, but well,
> we can handle...
> (see https://patchwork.kernel.org/project/dri-devel/patch/20231020034124.136295-4-helen.koike@collabora.com/
> )
> We need to update that script to make life easier.

I guess we could just add a template for now? It would keep the script
easy and yet still hint to its user that we want more data.

> Vignesh sent a patch adding at least the pipeline url to the file
> https://patchwork.kernel.org/project/linux-arm-msm/patch/20231019070650.61159-9-vignesh.raman@collabora.com/
> but to meet this doc that needs to be updated too.

Sure, I'll update it

Maxime
On 23/10/2023 12:09, Maxime Ripard wrote:
> On Fri, Oct 20, 2023 at 01:33:59AM -0300, Helen Koike wrote:
>> On 19/10/2023 13:51, Helen Koike wrote:
>>> On 19/10/2023 06:46, Maxime Ripard wrote:
>>>> Flaky tests can be very difficult to reproduce after the facts, which
>>>> will make it even harder to ever fix.
>>>>
>>>> Let's document the metadata we agreed on to provide more context to
>>>> anyone trying to address these fixes.
>>>>
>>>> Link: https://lore.kernel.org/dri-devel/CAPj87rPbJ1V1-R7WMTHkDat2A4nwSd61Df9mdGH2PR=ZzxaU=Q@mail.gmail.com/
>>>> Signed-off-by: Maxime Ripard <mripard@kernel.org>
>>>> ---
>>>>  Documentation/gpu/automated_testing.rst | 13 +++++++++++++
>>>>  1 file changed, 13 insertions(+)
>>>>
>>>> diff --git a/Documentation/gpu/automated_testing.rst
>>>> b/Documentation/gpu/automated_testing.rst
>>>> index 469b6fb65c30..2dd0e221c2c3 100644
>>>> --- a/Documentation/gpu/automated_testing.rst
>>>> +++ b/Documentation/gpu/automated_testing.rst
>>>> @@ -67,6 +67,19 @@ Lists the tests that for a given driver on a
>>>> specific hardware revision are
>>>>  known to behave unreliably. These tests won't cause a job to fail
>>>> regardless of
>>>>  the result. They will still be run.
>>>> +Each new flake entry must be associated with a link to a bug report to
>>>
>>> What do you mean by bug report? Just a link to an email to the mailing
>>> list is enough?
>>>
>>> Also, I made a mistake in the first flakes lists, which I corrected
>>> with https://www.spinics.net/lists/kernel/msg4959629.html (there was a
>>> bug in my script which ended up erroneously adding a bunch of tests in the
>>> flake list, so I cleaned them up). I would like to kindly request to let
>>> me add that documentation in a future patch to not block that patch
>>> series.
>>>
>>> Thanks
>>> Helen
>>>
>>>
>>>> +the author of the affected driver, the board name or Device Tree name of
>>>> +the board, the first kernel version affected, and an approximation of
>>>> +the failure rate.
>>>> +
>>>> +They should be provided under the following format::
>>>> +
>>>> +  # Bug Report: $LORE_OR_PATCHWORK_URL
>>
>> I wonder if the commit adding the test into the flakes.txt file with an
>> Acked-by from the device maintainer shouldn't be already considered the Bug
>> Report.
>
> I guess it could, yes. I think I'd still prefer the link since it would
> also allow us to evaluate whether the issue is fixed or not now.
>
>>>> +  # Board Name: broken-board.dtb
>>
>> Maybe Board Name isn't required, since it is already in the name of the
>> file.
>
> I have no idea how the i915 naming works, but on ARM at least the name
> of the file contains the name of the SoC, not the board where it was
> observed.

Right, yeah, we could use the dtb to be more clear/precise, no problem.

>
>>>> +  # Version: 6.6-rc1
>>>> +  # Failure Rate: 100
>>
>> Maybe also:
>>
>>   # Pipeline url:
>> https://gitlab.freedesktop.org/helen.fornazier/linux/-/pipelines/1014435
>
> Sounds like a good idea yeah :) Are those artifacts archived/deleted at
> some point or do they stick around forever?

Good point, I asked the admins, they stick for 4 weeks (could be more, but
it is not forever) :(

>
>> All this info will complicate a bit the update-xfails.py script, but well,
>> we can handle...
>> (see https://patchwork.kernel.org/project/dri-devel/patch/20231020034124.136295-4-helen.koike@collabora.com/
>> )
>> We need to update that script to make life easier.
>
> I guess we could just add a template for now? It would keep the script
> easy and yet still hint to its user that we want more data.

ack

Thanks
Helen

>
>> Vignesh sent a patch adding at least the pipeline url to the file
>> https://patchwork.kernel.org/project/linux-arm-msm/patch/20231019070650.61159-9-vignesh.raman@collabora.com/
>> but to meet this doc that needs to be updated too.
>
> Sure, I'll update it
>
> Maxime
On Wed, Oct 25, 2023 at 09:47:07AM -0300, Helen Koike wrote:
> > > > > +  # Version: 6.6-rc1
> > > > > +  # Failure Rate: 100
> > >
> > > Maybe also:
> > >
> > >   # Pipeline url:
> > > https://gitlab.freedesktop.org/helen.fornazier/linux/-/pipelines/1014435
> >
> > Sounds like a good idea yeah :) Are those artifacts archived/deleted at
> > some point or do they stick around forever?
>
> Good point, I asked the admins, they stick for 4 weeks (could be more, but
> it is not forever) :(

That's not even a release cycle :/

I guess it's too short to be useful. We can definitely revisit if that
delay is extended at some point though.

Maxime
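The template idea floated earlier in the thread could look roughly like the sketch below. It is only an illustration under assumptions: `FLAKE_TEMPLATE` and `format_flake_entry` are hypothetical names, not part of `update-xfails.py`, and the `TODO` placeholder is an arbitrary choice. The pipeline URL is left out, in line with the conclusion above that the artifacts do not live long enough.

```python
# Hypothetical helper, not taken from update-xfails.py or the kernel tree.
# It emits a flake entry in the documented format, with a placeholder for
# every metadata field the caller did not supply.
FLAKE_TEMPLATE = """\
# Bug Report: {bug_report}
# Board Name: {board_name}
# Version: {version}
# Failure Rate: {failure_rate}
{test_name}
"""


def format_flake_entry(test_name, bug_report="TODO", board_name="TODO",
                       version="TODO", failure_rate="TODO"):
    """Return one flake entry as text; missing fields show up as TODO."""
    return FLAKE_TEMPLATE.format(
        bug_report=bug_report,
        board_name=board_name,
        version=version,
        failure_rate=failure_rate,
        test_name=test_name,
    )


if __name__ == "__main__":
    # Only the kernel version is known here; the rest is left for the
    # person adding the entry to fill in by hand.
    print(format_flake_entry("flaky-test", version="6.6-rc1"))
```

Keeping the placeholders visible in the generated file is the point: the script stays simple, and whoever commits the entry is reminded which metadata is still missing.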
On Thu, 19 Oct 2023 11:46:09 +0200, Maxime Ripard wrote:
> Flaky tests can be very difficult to reproduce after the facts, which
> will make it even harder to ever fix.
>
> Let's document the metadata we agreed on to provide more context to
> anyone trying to address these fixes.
>
>
> [...]

Applied to drm/drm-misc (drm-misc-next).

Thanks!
Maxime
On Thu, Oct 26, 2023 at 12:58:48PM +0200, Maxime Ripard wrote:
> On Thu, 19 Oct 2023 11:46:09 +0200, Maxime Ripard wrote:
> > Flaky tests can be very difficult to reproduce after the facts, which
> > will make it even harder to ever fix.
> >
> > Let's document the metadata we agreed on to provide more context to
> > anyone trying to address these fixes.
> >
> >
> > [...]
>
> Applied to drm/drm-misc (drm-misc-next).

b4 might have been confused, but I only applied the v2.

Maxime
diff --git a/Documentation/gpu/automated_testing.rst b/Documentation/gpu/automated_testing.rst
index 469b6fb65c30..2dd0e221c2c3 100644
--- a/Documentation/gpu/automated_testing.rst
+++ b/Documentation/gpu/automated_testing.rst
@@ -67,6 +67,19 @@ Lists the tests that for a given driver on a specific hardware revision are
 known to behave unreliably. These tests won't cause a job to fail regardless of
 the result. They will still be run.
 
+Each new flake entry must be associated with a link to a bug report to
+the author of the affected driver, the board name or Device Tree name of
+the board, the first kernel version affected, and an approximation of
+the failure rate.
+
+They should be provided under the following format::
+
+  # Bug Report: $LORE_OR_PATCHWORK_URL
+  # Board Name: broken-board.dtb
+  # Version: 6.6-rc1
+  # Failure Rate: 100
+  flaky-test
+
 drivers/gpu/drm/ci/${DRIVER_NAME}-${HW_REVISION}-skips.txt
 -----------------------------------------------------------
 
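For anyone wanting to check a flakes file against the format in the diff above, a minimal sketch follows. It assumes that metadata comments always directly precede the test name they describe, and that any comment without a `Key: value` shape is free-form; `parse_flakes`, `missing_fields` and `REQUIRED_FIELDS` are hypothetical names, not part of the kernel CI scripts.

```python
import re

# The four fields the documentation patch requires for each new flake.
REQUIRED_FIELDS = ("Bug Report", "Board Name", "Version", "Failure Rate")

# Matches "# Key: value"; the key is everything up to the first colon.
_FIELD_RE = re.compile(r"^#\s*([^:]+):\s*(.*)$")


def parse_flakes(text):
    """Parse flakes-file text into a list of (test_name, metadata) tuples.

    Assumption: metadata comments apply to the next test name line.
    """
    entries = []
    metadata = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = _FIELD_RE.match(line)
        if match:
            metadata[match.group(1).strip()] = match.group(2).strip()
        elif line.startswith("#"):
            continue  # free-form comment, ignored
        else:
            entries.append((line, metadata))
            metadata = {}
    return entries


def missing_fields(metadata):
    """Return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]


if __name__ == "__main__":
    sample = (
        "# Bug Report: $LORE_OR_PATCHWORK_URL\n"
        "# Board Name: broken-board.dtb\n"
        "# Version: 6.6-rc1\n"
        "# Failure Rate: 100\n"
        "flaky-test\n"
    )
    for test_name, metadata in parse_flakes(sample):
        print(test_name, missing_fields(metadata))
```

A check like this could run before a flakes update is committed, so entries without a bug report link or board name are caught early rather than in review.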
Flaky tests can be very difficult to reproduce after the facts, which
will make it even harder to ever fix.

Let's document the metadata we agreed on to provide more context to
anyone trying to address these fixes.

Link: https://lore.kernel.org/dri-devel/CAPj87rPbJ1V1-R7WMTHkDat2A4nwSd61Df9mdGH2PR=ZzxaU=Q@mail.gmail.com/
Signed-off-by: Maxime Ripard <mripard@kernel.org>
---
 Documentation/gpu/automated_testing.rst | 13 +++++++++++++
 1 file changed, 13 insertions(+)
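The patch asks for "an approximation of the failure rate" without saying how to derive it. One simple way, sketched here under the assumption that the `Failure Rate` value is a percentage (the patch's example of 100 suggests this but does not state it), is to rerun the test a number of times and count failures. `failure_rate_percent` is a hypothetical helper name.

```python
def failure_rate_percent(results):
    """Approximate failure rate, in percent, from repeated runs.

    `results` is an iterable of booleans, True meaning the run passed.
    Assumption: the documented "Failure Rate" field is a percentage.
    """
    results = list(results)
    if not results:
        raise ValueError("need at least one run to estimate a failure rate")
    failures = sum(1 for passed in results if not passed)
    return round(100 * failures / len(results))


if __name__ == "__main__":
    # Four runs of the same test, two of which failed.
    runs = [True, False, False, True]
    print(f"# Failure Rate: {failure_rate_percent(runs)}")
```

The more runs are sampled, the better the approximation; a handful of runs only gives a rough figure, which is all the documentation asks for.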