Message ID | cover.1725269643.git.tjakobi@math.uni-bielefeld.de (mailing list archive) |
---|---|
Headers | show |
Series | drm/amd: fix VRR race condition during IRQ handling | expand |
On Mon Sep 2, 2024 at 2:40 AM PDT, tjakobi wrote: > From: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> > > Hello, > > this fixes a nasty race condition in the set_drr() callbacks for DCN10 > and DCN35 that has existed now since quite some time, see this GitLab > issue for reference. > > https://gitlab.freedesktop.org/drm/amd/-/issues/3142 > > The report just focuses von DCN10, but the same problem also exists in > the DCN35 code. Does the problem not exist in the following references to funcs->set_drr? drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx[i]->stream_res.tg->funcs->set_drr( drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > > With best wishes, > Tobias > > Tobias Jakobi (2): > drm/amd/display: Avoid race between dcn10_set_drr() and > dc_state_destruct() > drm/amd/display: Avoid race between dcn35_set_drr() and > dc_state_destruct() > > .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 20 +++++++++++-------- > .../amd/display/dc/hwss/dcn35/dcn35_hwseq.c | 20 +++++++++++-------- > 2 files changed, 24 insertions(+), 16 deletions(-)
On 9/8/24 09:35, Christopher Snowhill wrote: > On Mon Sep 2, 2024 at 2:40 AM PDT, tjakobi wrote: >> From: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> >> >> Hello, >> >> this fixes a nasty race condition in the set_drr() callbacks for DCN10 >> and DCN35 that has existed now since quite some time, see this GitLab >> issue for reference. >> >> https://gitlab.freedesktop.org/drm/amd/-/issues/3142 >> >> The report just focuses von DCN10, but the same problem also exists in >> the DCN35 code. > Does the problem not exist in the following references to funcs->set_drr? > > drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx[i]->stream_res.tg->funcs->set_drr( > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( Maybe. But the big difference I see here, is that in this code there isn't even any kind of NULL check applied to tg. Or most of the members of *pipe_ctx. If there really is the same kind of problem here, then one would need to rewrite a bit more code to fix stuff. E.g. in the case of dcn31_hwseq.c, the questionable code is in dcn31_reset_back_end_for_pipe(), which is static and only called from dcn31_reset_hw_ctx_wrap(). Which is assigned to the .reset_hw_ctx_wrap callback. And this specific callback, from what I can see, is only called from dce110_reset_hw_ctx_wrap(). Which is then assigned to the .apply_ctx_to_hw callback. The callback is only called from dc_commit_state_no_check(). That one is static again, and called from dc_commit_streams(). I could trace this even further. My point is: I don't think this is called from any IRQ handler code. And given the depth and complexity of the callgraph, I have to admit, that, at least at this point, this is a bit over my head. Sure, I could now sprinkle a bunch of x != NULL in the code, but that would be merely voodoo. And I usually try to have a theoretical basis when I apply changes to code. Maybe if someone from the AMD display team could give some insight if there still is potentially vulnerable code in some of the instances that Christopher has posted, then I would gladly take a look. With best wishes, Tobias > >> With best wishes, >> Tobias >> >> Tobias Jakobi (2): >> drm/amd/display: Avoid race between dcn10_set_drr() and >> dc_state_destruct() >> drm/amd/display: Avoid race between dcn35_set_drr() and >> dc_state_destruct() >> >> .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 20 +++++++++++-------- >> .../amd/display/dc/hwss/dcn35/dcn35_hwseq.c | 20 +++++++++++-------- >> 2 files changed, 24 insertions(+), 16 deletions(-)
On Sun Sep 8, 2024 at 4:23 AM PDT, Tobias Jakobi wrote: > On 9/8/24 09:35, Christopher Snowhill wrote: > > > On Mon Sep 2, 2024 at 2:40 AM PDT, tjakobi wrote: > >> From: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> > >> > >> Hello, > >> > >> this fixes a nasty race condition in the set_drr() callbacks for DCN10 > >> and DCN35 that has existed now since quite some time, see this GitLab > >> issue for reference. > >> > >> https://gitlab.freedesktop.org/drm/amd/-/issues/3142 > >> > >> The report just focuses von DCN10, but the same problem also exists in > >> the DCN35 code. > > Does the problem not exist in the following references to funcs->set_drr? > > > > drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > > drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > > drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx[i]->stream_res.tg->funcs->set_drr( > > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > > drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > > drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > > drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > > drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > > Maybe. But the big difference I see here, is that in this code there > isn't even any kind of NULL check applied to tg. Or most of the members > of *pipe_ctx. If there really is the same kind of problem here, then one > would need to rewrite a bit more code to fix stuff. > > E.g. in the case of dcn31_hwseq.c, the questionable code is in > dcn31_reset_back_end_for_pipe(), which is static and only called from > dcn31_reset_hw_ctx_wrap(). Which is assigned to the .reset_hw_ctx_wrap > callback. And this specific callback, from what I can see, is only > called from dce110_reset_hw_ctx_wrap(). Which is then assigned to the > .apply_ctx_to_hw callback. The callback is only called from > dc_commit_state_no_check(). That one is static again, and called from > dc_commit_streams(). > > I could trace this even further. My point is: I don't think this is > called from any IRQ handler code. And given the depth and complexity of > the callgraph, I have to admit, that, at least at this point, this is a > bit over my head. > > Sure, I could now sprinkle a bunch of x != NULL in the code, but that > would be merely voodoo. And I usually try to have a theoretical basis > when I apply changes to code. > > Maybe if someone from the AMD display team could give some insight if > there still is potentially vulnerable code in some of the instances that > Christopher has posted, then I would gladly take a look. Sorry, I was taking a note from someone else who mentioned set_drr functions, and wasn't aware that none of the other implementations happen to be called from IRQ handlers. Thanks for looking into this. -Christopher > With best wishes, > Tobias > > > > >> With best wishes, > >> Tobias > >> > >> Tobias Jakobi (2): > >> drm/amd/display: Avoid race between dcn10_set_drr() and > >> dc_state_destruct() > >> drm/amd/display: Avoid race between dcn35_set_drr() and > >> dc_state_destruct() > >> > >> .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 20 +++++++++++-------- > >> .../amd/display/dc/hwss/dcn35/dcn35_hwseq.c | 20 +++++++++++-------- > >> 2 files changed, 24 insertions(+), 16 deletions(-)
On Sun, Sep 8, 2024 at 7:23 AM Tobias Jakobi <tjakobi@math.uni-bielefeld.de> wrote: > > On 9/8/24 09:35, Christopher Snowhill wrote: > > > On Mon Sep 2, 2024 at 2:40 AM PDT, tjakobi wrote: > >> From: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> > >> > >> Hello, > >> > >> this fixes a nasty race condition in the set_drr() callbacks for DCN10 > >> and DCN35 that has existed now since quite some time, see this GitLab > >> issue for reference. > >> > >> https://gitlab.freedesktop.org/drm/amd/-/issues/3142 > >> > >> The report just focuses von DCN10, but the same problem also exists in > >> the DCN35 code. > > Does the problem not exist in the following references to funcs->set_drr? > > > > drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > > drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > > drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx[i]->stream_res.tg->funcs->set_drr( > > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > > drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > > drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > > drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) > > drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( > > Maybe. But the big difference I see here, is that in this code there > isn't even any kind of NULL check applied to tg. Or most of the members > of *pipe_ctx. If there really is the same kind of problem here, then one > would need to rewrite a bit more code to fix stuff. > > E.g. in the case of dcn31_hwseq.c, the questionable code is in > dcn31_reset_back_end_for_pipe(), which is static and only called from > dcn31_reset_hw_ctx_wrap(). Which is assigned to the .reset_hw_ctx_wrap > callback. And this specific callback, from what I can see, is only > called from dce110_reset_hw_ctx_wrap(). Which is then assigned to the > .apply_ctx_to_hw callback. The callback is only called from > dc_commit_state_no_check(). That one is static again, and called from > dc_commit_streams(). > > I could trace this even further. My point is: I don't think this is > called from any IRQ handler code. And given the depth and complexity of > the callgraph, I have to admit, that, at least at this point, this is a > bit over my head. > > Sure, I could now sprinkle a bunch of x != NULL in the code, but that > would be merely voodoo. And I usually try to have a theoretical basis > when I apply changes to code. > > Maybe if someone from the AMD display team could give some insight if > there still is potentially vulnerable code in some of the instances that > Christopher has posted, then I would gladly take a look. @Wentland, Harry can you confirm this? Alex > > With best wishes, > Tobias > > > > >> With best wishes, > >> Tobias > >> > >> Tobias Jakobi (2): > >> drm/amd/display: Avoid race between dcn10_set_drr() and > >> dc_state_destruct() > >> drm/amd/display: Avoid race between dcn35_set_drr() and > >> dc_state_destruct() > >> > >> .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 20 +++++++++++-------- > >> .../amd/display/dc/hwss/dcn35/dcn35_hwseq.c | 20 +++++++++++-------- > >> 2 files changed, 24 insertions(+), 16 deletions(-)
On 2024-09-09 13:11, Alex Deucher wrote: > On Sun, Sep 8, 2024 at 7:23 AM Tobias Jakobi > <tjakobi@math.uni-bielefeld.de> wrote: >> >> On 9/8/24 09:35, Christopher Snowhill wrote: >> >>> On Mon Sep 2, 2024 at 2:40 AM PDT, tjakobi wrote: >>>> From: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> >>>> >>>> Hello, >>>> >>>> this fixes a nasty race condition in the set_drr() callbacks for DCN10 >>>> and DCN35 that has existed now since quite some time, see this GitLab >>>> issue for reference. >>>> >>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3142 >>>> >>>> The report just focuses von DCN10, but the same problem also exists in >>>> the DCN35 code. >>> Does the problem not exist in the following references to funcs->set_drr? >>> >>> drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) >>> drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( >>> drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx[i]->stream_res.tg->funcs->set_drr( >>> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) >>> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( >>> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) >>> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( >>> drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) >>> drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( >>> drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) >>> drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( >> >> Maybe. But the big difference I see here, is that in this code there >> isn't even any kind of NULL check applied to tg. Or most of the members >> of *pipe_ctx. If there really is the same kind of problem here, then one >> would need to rewrite a bit more code to fix stuff. >> >> E.g. in the case of dcn31_hwseq.c, the questionable code is in >> dcn31_reset_back_end_for_pipe(), which is static and only called from >> dcn31_reset_hw_ctx_wrap(). Which is assigned to the .reset_hw_ctx_wrap >> callback. And this specific callback, from what I can see, is only >> called from dce110_reset_hw_ctx_wrap(). Which is then assigned to the >> .apply_ctx_to_hw callback. The callback is only called from >> dc_commit_state_no_check(). That one is static again, and called from >> dc_commit_streams(). >> >> I could trace this even further. My point is: I don't think this is >> called from any IRQ handler code. And given the depth and complexity of >> the callgraph, I have to admit, that, at least at this point, this is a >> bit over my head. >> >> Sure, I could now sprinkle a bunch of x != NULL in the code, but that >> would be merely voodoo. And I usually try to have a theoretical basis >> when I apply changes to code. >> >> Maybe if someone from the AMD display team could give some insight if >> there still is potentially vulnerable code in some of the instances that >> Christopher has posted, then I would gladly take a look. > > @Wentland, Harry can you confirm this? > As Tobias said, without extensive analysis and trace of the code in all possible use-case it's hard to say there's no possible way the other set_drr calls could potentially have a similar issue. I think Tobias' analysis is sound and this fixes a number of issues, hence my RB. Harry > Alex > >> >> With best wishes, >> Tobias >> >>> >>>> With best wishes, >>>> Tobias >>>> >>>> Tobias Jakobi (2): >>>> drm/amd/display: Avoid race between dcn10_set_drr() and >>>> dc_state_destruct() >>>> drm/amd/display: Avoid race between dcn35_set_drr() and >>>> dc_state_destruct() >>>> >>>> .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 20 +++++++++++-------- >>>> .../amd/display/dc/hwss/dcn35/dcn35_hwseq.c | 20 +++++++++++-------- >>>> 2 files changed, 24 insertions(+), 16 deletions(-)
On 9/9/24 19:18, Harry Wentland wrote: > > On 2024-09-09 13:11, Alex Deucher wrote: >> On Sun, Sep 8, 2024 at 7:23 AM Tobias Jakobi >> <tjakobi@math.uni-bielefeld.de> wrote: >>> On 9/8/24 09:35, Christopher Snowhill wrote: >>> >>>> On Mon Sep 2, 2024 at 2:40 AM PDT, tjakobi wrote: >>>>> From: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> >>>>> >>>>> Hello, >>>>> >>>>> this fixes a nasty race condition in the set_drr() callbacks for DCN10 >>>>> and DCN35 that has existed now since quite some time, see this GitLab >>>>> issue for reference. >>>>> >>>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3142 >>>>> >>>>> The report just focuses von DCN10, but the same problem also exists in >>>>> the DCN35 code. >>>> Does the problem not exist in the following references to funcs->set_drr? >>>> >>>> drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) >>>> drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( >>>> drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx[i]->stream_res.tg->funcs->set_drr( >>>> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) >>>> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( >>>> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) >>>> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( >>>> drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) >>>> drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( >>>> drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr) >>>> drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr( >>> Maybe. But the big difference I see here, is that in this code there >>> isn't even any kind of NULL check applied to tg. Or most of the members >>> of *pipe_ctx. If there really is the same kind of problem here, then one >>> would need to rewrite a bit more code to fix stuff. >>> >>> E.g. in the case of dcn31_hwseq.c, the questionable code is in >>> dcn31_reset_back_end_for_pipe(), which is static and only called from >>> dcn31_reset_hw_ctx_wrap(). Which is assigned to the .reset_hw_ctx_wrap >>> callback. And this specific callback, from what I can see, is only >>> called from dce110_reset_hw_ctx_wrap(). Which is then assigned to the >>> .apply_ctx_to_hw callback. The callback is only called from >>> dc_commit_state_no_check(). That one is static again, and called from >>> dc_commit_streams(). >>> >>> I could trace this even further. My point is: I don't think this is >>> called from any IRQ handler code. And given the depth and complexity of >>> the callgraph, I have to admit, that, at least at this point, this is a >>> bit over my head. >>> >>> Sure, I could now sprinkle a bunch of x != NULL in the code, but that >>> would be merely voodoo. And I usually try to have a theoretical basis >>> when I apply changes to code. >>> >>> Maybe if someone from the AMD display team could give some insight if >>> there still is potentially vulnerable code in some of the instances that >>> Christopher has posted, then I would gladly take a look. >> @Wentland, Harry can you confirm this? >> > As Tobias said, without extensive analysis and trace of the code in all > possible use-case it's hard to say there's no possible way the other > set_drr calls could potentially have a similar issue. > > I think Tobias' analysis is sound and this fixes a number of issues, hence > my RB. In fact one user pointed out another potentially vulnerable callback: https://gitlab.freedesktop.org/drm/amd/-/issues/3142#note_2560109 Which is set_drr() in dce110_hwseq.c -- from which we know that it's called from IRQ handler code. Also the backtrace that he posted confirms this. That code seems to be a bit older than the DCN10/DCN25 code, as it lacks any kind of NULL-check. I have posted a patch that more or less copies over the DCN10/35 code. Still waiting for conclusive feedback if the patch does something. If it does, I'm going to post it to amd-gfx as well. With best wishes, Tobias > > Harry > >> Alex >> >>> With best wishes, >>> Tobias >>> >>>>> With best wishes, >>>>> Tobias >>>>> >>>>> Tobias Jakobi (2): >>>>> drm/amd/display: Avoid race between dcn10_set_drr() and >>>>> dc_state_destruct() >>>>> drm/amd/display: Avoid race between dcn35_set_drr() and >>>>> dc_state_destruct() >>>>> >>>>> .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 20 +++++++++++-------- >>>>> .../amd/display/dc/hwss/dcn35/dcn35_hwseq.c | 20 +++++++++++-------- >>>>> 2 files changed, 24 insertions(+), 16 deletions(-)
From: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Hello, this fixes a nasty race condition in the set_drr() callbacks for DCN10 and DCN35 that has existed now since quite some time, see this GitLab issue for reference. https://gitlab.freedesktop.org/drm/amd/-/issues/3142 The report just focuses von DCN10, but the same problem also exists in the DCN35 code. With best wishes, Tobias Tobias Jakobi (2): drm/amd/display: Avoid race between dcn10_set_drr() and dc_state_destruct() drm/amd/display: Avoid race between dcn35_set_drr() and dc_state_destruct() .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 20 +++++++++++-------- .../amd/display/dc/hwss/dcn35/dcn35_hwseq.c | 20 +++++++++++-------- 2 files changed, 24 insertions(+), 16 deletions(-)