Message ID | 20230822132646.9811-1-jason-jh.lin@mediatek.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/mediatek: Add spinlock for setting vblank event in atomic_begin | expand |
On Tue, Aug 22, 2023 at 10:27 PM Jason-JH.Lin <jason-jh.lin@mediatek.com> wrote: > > Add spinlock protection to avoid race condition on vblank event > between mtk_drm_crtc_atomic_begin() and mtk_drm_finish_page_flip(). > > Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.") > Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> Reviewed-by: Fei Shao <fshao@chromium.org>
Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com> On 22/08/2023 15:26, Jason-JH.Lin wrote: > Add spinlock protection to avoid race condition on vblank event > between mtk_drm_crtc_atomic_begin() and mtk_drm_finish_page_flip().
On Thu, Aug 31, 2023 at 3:12 PM Fei Shao <fshao@chromium.org> wrote: > > On Tue, Aug 22, 2023 at 10:27 PM Jason-JH.Lin <jason-jh.lin@mediatek.com> wrote: > > > > Add spinlock protection to avoid race condition on vblank event > > between mtk_drm_crtc_atomic_begin() and mtk_drm_finish_page_flip(). > > > > Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.") > > Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> > > Reviewed-by: Fei Shao <fshao@chromium.org> Also, I verified that this fixes a real world system hang issue on the MT8195 Chromebook. Tested-by: Fei Shao <fshao@chromium.org>
Il 22/08/23 15:26, Jason-JH.Lin ha scritto: > Add spinlock protection to avoid race condition on vblank event > between mtk_drm_crtc_atomic_begin() and mtk_drm_finish_page_flip(). > Hello Jason, Can you please provide more information about this race condition? (check below) > Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.") > Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> > --- > drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c > index d40142842f85..128a672fe3c9 100644 > --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c > +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c > @@ -746,6 +746,9 @@ static void mtk_drm_crtc_atomic_begin(struct drm_crtc *crtc, > crtc); > struct mtk_crtc_state *mtk_crtc_state = to_mtk_crtc_state(crtc_state); > struct mtk_drm_crtc *mtk_crtc = to_mtk_crtc(crtc); > + unsigned long flags; > + > + spin_lock_irqsave(&crtc->dev->event_lock, flags); > > if (mtk_crtc->event && mtk_crtc_state->base.event) > DRM_ERROR("new event while there is still a pending event\n"); > @@ -756,6 +759,8 @@ static void mtk_drm_crtc_atomic_begin(struct drm_crtc *crtc, ...because my suspect is that what creates the race condition in this function is the unlocked *assignment* to mtk_crtc->event, not the rest. If I'm right, you don't need to unconditionally spinlock at the beginning of this function hence ever-so-slightly improving performance compared to this version. Can you please try this one and check if this *also* solves the issue? if (mtk_crtc_state->base.event) { mtk_crtc_state->base.event->pipe = drm_crtc_index(crtc); WARN_ON(drm_crtc_vblank_get(crtc) != 0); spin_lock_irqsave(&crtc->dev->event_lock, flags); mtk_crtc->event = mtk_crtc_state->base.event; spin_lock_irqrestore(&crtc->dev->event_lock, flags); mtk_crtc_state->base.event = NULL; } P.S.: I'd try that myself, but I can't seem to reproduce the issue. Regards, Angelo
Hi Angelo, On Wed, Sep 13, 2023 at 4:35 PM AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> wrote: > > Il 22/08/23 15:26, Jason-JH.Lin ha scritto: > > Add spinlock protection to avoid race condition on vblank event > > between mtk_drm_crtc_atomic_begin() and mtk_drm_finish_page_flip(). > > > > Hello Jason, > > Can you please provide more information about this race condition? > (check below) > > > Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.") > > Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> > > --- > > drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c > > index d40142842f85..128a672fe3c9 100644 > > --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c > > +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c > > @@ -746,6 +746,9 @@ static void mtk_drm_crtc_atomic_begin(struct drm_crtc *crtc, > > crtc); > > struct mtk_crtc_state *mtk_crtc_state = to_mtk_crtc_state(crtc_state); > > struct mtk_drm_crtc *mtk_crtc = to_mtk_crtc(crtc); > > + unsigned long flags; > > + > > + spin_lock_irqsave(&crtc->dev->event_lock, flags); > > > > if (mtk_crtc->event && mtk_crtc_state->base.event) > > DRM_ERROR("new event while there is still a pending event\n"); > > @@ -756,6 +759,8 @@ static void mtk_drm_crtc_atomic_begin(struct drm_crtc *crtc, > > ...because my suspect is that what creates the race condition in this function is > the unlocked *assignment* to mtk_crtc->event, not the rest. > > If I'm right, you don't need to unconditionally spinlock at the beginning of this > function hence ever-so-slightly improving performance compared to this version. > > Can you please try this one and check if this *also* solves the issue? > > if (mtk_crtc_state->base.event) { > mtk_crtc_state->base.event->pipe = drm_crtc_index(crtc); > WARN_ON(drm_crtc_vblank_get(crtc) != 0); > > spin_lock_irqsave(&crtc->dev->event_lock, flags); > mtk_crtc->event = mtk_crtc_state->base.event; > spin_lock_irqrestore(&crtc->dev->event_lock, flags); > > mtk_crtc_state->base.event = NULL; > } > > P.S.: I'd try that myself, but I can't seem to reproduce the issue. I'm still able to reproduce it so I gave it a try, and this approach also seems to fix the issue. :) FWIW, the way I reproduce that is to toggle the night light mode on and off repeatedly through the UI panel while playing YouTube videos on my device. Jason, can you post a new version with Angelo's suggestion? Regards, Fei > > Regards, > Angelo >
Hi Angelo Thanks for the reviews. Hi Fei, Thanks for the testing. On Mon, 2023-09-18 at 18:47 +0800, Fei Shao wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > Hi Angelo, > > On Wed, Sep 13, 2023 at 4:35 PM AngeloGioacchino Del Regno > <angelogioacchino.delregno@collabora.com> wrote: > > > > Il 22/08/23 15:26, Jason-JH.Lin ha scritto: > > > Add spinlock protection to avoid race condition on vblank event > > > between mtk_drm_crtc_atomic_begin() and > mtk_drm_finish_page_flip(). > > > > > > > Hello Jason, > > > > Can you please provide more information about this race condition? > > (check below) > > > > > Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek > SoC MT8173.") > > > Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> > > > --- > > > drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 5 +++++ > > > 1 file changed, 5 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c > b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c > > > index d40142842f85..128a672fe3c9 100644 > > > --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c > > > +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c > > > @@ -746,6 +746,9 @@ static void mtk_drm_crtc_atomic_begin(struct > drm_crtc *crtc, > > > > > crtc); > > > struct mtk_crtc_state *mtk_crtc_state = > to_mtk_crtc_state(crtc_state); > > > struct mtk_drm_crtc *mtk_crtc = to_mtk_crtc(crtc); > > > + unsigned long flags; > > > + > > > + spin_lock_irqsave(&crtc->dev->event_lock, flags); > > > > > > if (mtk_crtc->event && mtk_crtc_state->base.event) > > > DRM_ERROR("new event while there is still a pending > event\n"); > > > @@ -756,6 +759,8 @@ static void mtk_drm_crtc_atomic_begin(struct > drm_crtc *crtc, > > > > ...because my suspect is that what creates the race condition in > this function is > > the unlocked *assignment* to mtk_crtc->event, not the rest. > > > > If I'm right, you don't need to unconditionally spinlock at the > beginning of this > > function hence ever-so-slightly improving performance compared to > this version. > > > > Can you please try this one and check if this *also* solves the > issue? > > > > if (mtk_crtc_state->base.event) { > > mtk_crtc_state->base.event->pipe = > drm_crtc_index(crtc); > > WARN_ON(drm_crtc_vblank_get(crtc) != 0); > > > > spin_lock_irqsave(&crtc->dev->event_lock, flags); > > mtk_crtc->event = mtk_crtc_state->base.event; > > spin_lock_irqrestore(&crtc->dev->event_lock, > flags); > > > > mtk_crtc_state->base.event = NULL; > > } > > > > P.S.: I'd try that myself, but I can't seem to reproduce the issue. > I can't reproduce the system hang issue reported from customer by toggling the night light mode in UI panel either. But I can see the error message "new event while there is still a pending event" when I was reproducing the issue. Here is the debug info from "Wei-Shun <weishunc@google.com>": From the kernel tracing: 99342.377173: mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler 99342.377226: drm_crtc_send_vblank_event <-mtk_crtc_ddp_irq 99342.393831: mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler 99342.393887: drm_crtc_send_vblank_event <-mtk_crtc_ddp_irq 99342.410469: mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler 99342.410519: drm_crtc_send_vblank_event <-mtk_crtc_ddp_irq 99342.427094: mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler 99342.443831: mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler 99342.460495: mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler 99342.477157: mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler 99342.493841: mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler 99342.510453: mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler Every mtk_crtc_ddp_irq should come with a drm_crtc_send_vblank_event. However, when the issue happens, the mtk_crtc_ddp_irq keeps firing but no drm_crtc_send_vblank_event. I suspect this is the main reason causing flip_done timeout. In mtk_drm_crtc.c, mtk_crtc->config_updating and mtk_crtc- >pending_needs_vblank are the conditions that may impact the vblank event. static void mtk_drm_finish_page_flip(struct mtk_drm_crtc *mtk_crtc) { struct drm_crtc *crtc = &mtk_crtc->base; unsigned long flags; drm_crtc_handle_vblank(&mtk_crtc->base); spin_lock_irqsave(&crtc->dev->event_lock, flags); ==> if (!mtk_crtc->config_updating && mtk_crtc->pending_needs_vblank) { mtk_drm_crtc_finish_page_flip(mtk_crtc); mtk_crtc->pending_needs_vblank = false; } spin_unlock_irqrestore(&crtc->dev->event_lock, flags); } There are 3 lines are called in mtk_drm_crtc_finish_page_flip(): drm_crtc_send_vblank_event(crtc, mtk_crtc->event); drm_crtc_vblank_put(crtc); mtk_crtc->event = NULL; So I want to protect these 3 things to avoid them encountering race conditions: mtk_crtc_state->base.event //will be provided on atomic_commit complete drm_crtc_vblank_get(crtc) mtk_crtc->event I have tried to protect this line only: spin_lock_irqsave(&crtc->dev->event_lock, flags); mtk_crtc->event = mtk_crtc_state->base.event; spin_lock_irqrestore(&crtc->dev->event_lock,flags); I can still see the error message "new event while there is still a pending event" when I toggled the night light mode. Maybe that is mtk_crtc->event here: if (mtk_crtc->event && mtk_crtc_state->base.event) haven't set to NULL by mtk_drm_crtc_finish_page_flip(). But that not a problem because we will protect mtk_crtc->event here: mtk_crtc->event = mtk_crtc_state->base.event; So should we remove that error message since it doesn't help and may cause the confuse after we fix this issue? > I'm still able to reproduce it so I gave it a try, and this approach > also seems to fix the issue. :) > FWIW, the way I reproduce that is to toggle the night light mode on > and off repeatedly through the UI panel while playing YouTube videos > on my device. > > Jason, can you post a new version with Angelo's suggestion? > OK, I can take Angelo's suggestion. Thanks! But I am wondering if we should remove this log? DRM_ERROR("new event while there is still a pending event\n"); Regards, Jason-JH.Lin > Regards, > Fei > > > > > Regards, > > Angelo > > >
Il 19/09/23 09:37, Jason-JH Lin (林睿祥) ha scritto: > Hi Angelo > > Thanks for the reviews. > > Hi Fei, > > Thanks for the testing. > > On Mon, 2023-09-18 at 18:47 +0800, Fei Shao wrote: >> >> External email : Please do not click links or open attachments until >> you have verified the sender or the content. >> Hi Angelo, >> >> On Wed, Sep 13, 2023 at 4:35 PM AngeloGioacchino Del Regno >> <angelogioacchino.delregno@collabora.com> wrote: >>> >>> Il 22/08/23 15:26, Jason-JH.Lin ha scritto: >>>> Add spinlock protection to avoid race condition on vblank event >>>> between mtk_drm_crtc_atomic_begin() and >> mtk_drm_finish_page_flip(). >>>> >>> >>> Hello Jason, >>> >>> Can you please provide more information about this race condition? >>> (check below) >>> >>>> Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek >> SoC MT8173.") >>>> Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> >>>> --- >>>> drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 5 +++++ >>>> 1 file changed, 5 insertions(+) >>>> >>>> diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c >> b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c >>>> index d40142842f85..128a672fe3c9 100644 >>>> --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c >>>> +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c >>>> @@ -746,6 +746,9 @@ static void mtk_drm_crtc_atomic_begin(struct >> drm_crtc *crtc, >>> >>> >> crtc); >>>> struct mtk_crtc_state *mtk_crtc_state = >> to_mtk_crtc_state(crtc_state); >>>> struct mtk_drm_crtc *mtk_crtc = to_mtk_crtc(crtc); >>>> + unsigned long flags; >>>> + >>>> + spin_lock_irqsave(&crtc->dev->event_lock, flags); >>>> >>>> if (mtk_crtc->event && mtk_crtc_state->base.event) >>>> DRM_ERROR("new event while there is still a pending >> event\n"); >>>> @@ -756,6 +759,8 @@ static void mtk_drm_crtc_atomic_begin(struct >> drm_crtc *crtc, >>> >>> ...because my suspect is that what creates the race condition in >> this function is >>> the unlocked *assignment* to mtk_crtc->event, not the rest. >>> >>> If I'm right, you don't need to unconditionally spinlock at the >> beginning of this >>> function hence ever-so-slightly improving performance compared to >> this version. >>> >>> Can you please try this one and check if this *also* solves the >> issue? >>> >>> if (mtk_crtc_state->base.event) { >>> mtk_crtc_state->base.event->pipe = >> drm_crtc_index(crtc); >>> WARN_ON(drm_crtc_vblank_get(crtc) != 0); >>> >>> spin_lock_irqsave(&crtc->dev->event_lock, flags); >>> mtk_crtc->event = mtk_crtc_state->base.event; >>> spin_lock_irqrestore(&crtc->dev->event_lock, >> flags); >>> >>> mtk_crtc_state->base.event = NULL; >>> } >>> >>> P.S.: I'd try that myself, but I can't seem to reproduce the issue. >> > I can't reproduce the system hang issue reported from customer by > toggling the night light mode in UI panel either. > But I can see the error message "new event while there is still a > pending event" when I was reproducing the issue. > > Here is the debug info from "Wei-Shun <weishunc@google.com>": > > From the kernel tracing: > 99342.377173: mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler > 99342.377226: > drm_crtc_send_vblank_event <-mtk_crtc_ddp_irq > 99342.393831: > mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler > 99342.393887: > drm_crtc_send_vblank_event <-mtk_crtc_ddp_irq > 99342.410469: > mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler > 99342.410519: > drm_crtc_send_vblank_event <-mtk_crtc_ddp_irq > 99342.427094: > mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler > 99342.443831: > mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler > 99342.460495: > mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler > 99342.477157: > mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler > 99342.493841: > mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler > 99342.510453: > mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler > > Every mtk_crtc_ddp_irq should come with a drm_crtc_send_vblank_event. > However, when the issue happens, the mtk_crtc_ddp_irq keeps firing but > no drm_crtc_send_vblank_event. I suspect this is the main reason > causing flip_done timeout. > > In mtk_drm_crtc.c, mtk_crtc->config_updating and mtk_crtc- >> pending_needs_vblank are the conditions that may impact the vblank > event. > > static void mtk_drm_finish_page_flip(struct mtk_drm_crtc *mtk_crtc) > { > struct drm_crtc *crtc = &mtk_crtc->base; > unsigned long flags; > > drm_crtc_handle_vblank(&mtk_crtc->base); > > spin_lock_irqsave(&crtc->dev->event_lock, flags); > ==> if (!mtk_crtc->config_updating && mtk_crtc->pending_needs_vblank) { > mtk_drm_crtc_finish_page_flip(mtk_crtc); > mtk_crtc->pending_needs_vblank = false; > } > spin_unlock_irqrestore(&crtc->dev->event_lock, flags); > } > > > There are 3 lines are called in mtk_drm_crtc_finish_page_flip(): > drm_crtc_send_vblank_event(crtc, mtk_crtc->event); > drm_crtc_vblank_put(crtc); > mtk_crtc->event = NULL; > > So I want to protect these 3 things to avoid them encountering race > conditions: > mtk_crtc_state->base.event //will be provided on atomic_commit complete > drm_crtc_vblank_get(crtc) > mtk_crtc->event > > > I have tried to protect this line only: > > spin_lock_irqsave(&crtc->dev->event_lock, flags); > mtk_crtc->event = mtk_crtc_state->base.event; > spin_lock_irqrestore(&crtc->dev->event_lock,flags); > > I can still see the error message "new event while there is still a > pending event" when I toggled the night light mode. > > Maybe that is mtk_crtc->event here: > if (mtk_crtc->event && mtk_crtc_state->base.event) > haven't set to NULL by mtk_drm_crtc_finish_page_flip(). > > But that not a problem because we will protect mtk_crtc->event here: > mtk_crtc->event = mtk_crtc_state->base.event; > > > So should we remove that error message since it doesn't help and may > cause the confuse after we fix this issue? > > >> I'm still able to reproduce it so I gave it a try, and this approach >> also seems to fix the issue. :) >> FWIW, the way I reproduce that is to toggle the night light mode on >> and off repeatedly through the UI panel while playing YouTube videos >> on my device. >> >> Jason, can you post a new version with Angelo's suggestion? >> > > OK, I can take Angelo's suggestion. Thanks! > But I am wondering if we should remove this log? > DRM_ERROR("new event while there is still a pending event\n"); > Thanks for the analysis. Please don't remove that line, this indicates that there might be another issue somewhere else, and someone will tackle that later. Regards, Angelo > > Regards, > Jason-JH.Lin > >> Regards, >> Fei >> >>> >>> Regards, >>> Angelo >>> >>
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c index d40142842f85..128a672fe3c9 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c @@ -746,6 +746,9 @@ static void mtk_drm_crtc_atomic_begin(struct drm_crtc *crtc, crtc); struct mtk_crtc_state *mtk_crtc_state = to_mtk_crtc_state(crtc_state); struct mtk_drm_crtc *mtk_crtc = to_mtk_crtc(crtc); + unsigned long flags; + + spin_lock_irqsave(&crtc->dev->event_lock, flags); if (mtk_crtc->event && mtk_crtc_state->base.event) DRM_ERROR("new event while there is still a pending event\n"); @@ -756,6 +759,8 @@ static void mtk_drm_crtc_atomic_begin(struct drm_crtc *crtc, mtk_crtc->event = mtk_crtc_state->base.event; mtk_crtc_state->base.event = NULL; } + + spin_unlock_irqrestore(&crtc->dev->event_lock, flags); } static void mtk_drm_crtc_atomic_flush(struct drm_crtc *crtc,
Add spinlock protection to avoid race condition on vblank event between mtk_drm_crtc_atomic_begin() and mtk_drm_finish_page_flip(). Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.") Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> --- drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 5 +++++ 1 file changed, 5 insertions(+)