
[1/2] drm/msm/dp: service only one irq_hpd if there are multiple irq_hpd pending

Message ID 1618604877-28297-1-git-send-email-khsieh@codeaurora.org (mailing list archive)
State Superseded

Commit Message

Kuogee Hsieh April 16, 2021, 8:27 p.m. UTC
Some dongles may generate more than one irq_hpd event in a short period
of time. This patch treats those irq_hpd events as a single event and
services only one of them.

Signed-off-by: Kuogee Hsieh <khsieh@codeaurora.org>
---
 drivers/gpu/drm/msm/dp/dp_display.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Stephen Boyd April 20, 2021, 10:01 p.m. UTC | #1
Quoting Kuogee Hsieh (2021-04-16 13:27:57)
> Some dongle may generate more than one irq_hpd events in a short period of
> time. This patch will treat those irq_hpd events as single one and service
> only one irq_hpd event.

Why is it bad to get multiple irq_hpd events in a short period of time?
Please tell us here in the commit text.

> 
> Signed-off-by: Kuogee Hsieh <khsieh@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/dp/dp_display.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
> index 5a39da6..0a7d383 100644
> --- a/drivers/gpu/drm/msm/dp/dp_display.c
> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
> @@ -707,6 +707,9 @@ static int dp_irq_hpd_handle(struct dp_display_private *dp, u32 data)
>                 return 0;
>         }
>  
> +       /* only handle first irq_hpd in case of multiple irs_hpd pending */
> +       dp_del_event(dp, EV_IRQ_HPD_INT);
> +
>         ret = dp_display_usbpd_attention_cb(&dp->pdev->dev);
>         if (ret == -ECONNRESET) { /* cable unplugged */
>                 dp->core_initialized = false;
> @@ -1300,6 +1303,9 @@ static int dp_pm_suspend(struct device *dev)
>         /* host_init will be called at pm_resume */
>         dp->core_initialized = false;
>  
> +       /* system suspended, delete pending irq_hdps */
> +       dp_del_event(dp, EV_IRQ_HPD_INT);

What happens if I suspend my device and when this function is running I
toggle my monitor to use the HDMI input that is connected instead of some
other input, maybe the second HDMI input? Wouldn't that generate an HPD
interrupt to grab the attention of this device?

> +
>         mutex_unlock(&dp->event_mutex);
>  
>         return 0;
> @@ -1496,6 +1502,9 @@ int msm_dp_display_disable(struct msm_dp *dp, struct drm_encoder *encoder)
>         /* stop sentinel checking */
>         dp_del_event(dp_display, EV_DISCONNECT_PENDING_TIMEOUT);
>  
> +       /* link is down, delete pending irq_hdps */
> +       dp_del_event(dp_display, EV_IRQ_HPD_INT);
> +

I'm becoming convinced that the whole kthread design and event queue is
broken. These sorts of patches are working around the larger problem
that the kthread is running independently of the driver and irqs can
come in at any time but the event queue is not checked from the irq
handler to debounce the irq event. Is the event queue necessary at all?
I wonder if it would be simpler to just use an irq thread and process
the hpd signal from there. Then we're guaranteed to not get an irq again
until the irq thread is done processing the event. This would naturally
debounce the irq hpd event that way.

>         dp_display_disable(dp_display, 0);
>  
>         rc = dp_display_unprepare(dp);
Kuogee Hsieh April 21, 2021, 5:26 p.m. UTC | #2
On 2021-04-20 15:01, Stephen Boyd wrote:
> Quoting Kuogee Hsieh (2021-04-16 13:27:57)
>> Some dongle may generate more than one irq_hpd events in a short 
>> period of
>> time. This patch will treat those irq_hpd events as single one and 
>> service
>> only one irq_hpd event.
> 
> Why is it bad to get multiple irq_hpd events in a short period of time?
> Please tell us here in the commit text.
> 
>> 
>> Signed-off-by: Kuogee Hsieh <khsieh@codeaurora.org>
>> ---
>>  drivers/gpu/drm/msm/dp/dp_display.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>> 
>> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
>> b/drivers/gpu/drm/msm/dp/dp_display.c
>> index 5a39da6..0a7d383 100644
>> --- a/drivers/gpu/drm/msm/dp/dp_display.c
>> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
>> @@ -707,6 +707,9 @@ static int dp_irq_hpd_handle(struct 
>> dp_display_private *dp, u32 data)
>>                 return 0;
>>         }
>> 
>> +       /* only handle first irq_hpd in case of multiple irs_hpd 
>> pending */
>> +       dp_del_event(dp, EV_IRQ_HPD_INT);
>> +
>>         ret = dp_display_usbpd_attention_cb(&dp->pdev->dev);
>>         if (ret == -ECONNRESET) { /* cable unplugged */
>>                 dp->core_initialized = false;
>> @@ -1300,6 +1303,9 @@ static int dp_pm_suspend(struct device *dev)
>>         /* host_init will be called at pm_resume */
>>         dp->core_initialized = false;
>> 
>> +       /* system suspended, delete pending irq_hdps */
>> +       dp_del_event(dp, EV_IRQ_HPD_INT);
> 
> What happens if I suspend my device and when this function is running I
> toggle my monitor to use the HDMI input that is connected instead of 
> some
> other input, maybe the second HDMI input? Wouldn't that generate an HPD
> interrupt to grab the attention of this device?
No.
At this point the display is off, which means the DP controller is off
and the main link has been torn down.
It will start with a plug-in interrupt to bring the DP controller up and
start link training.
irq_hpd can be generated only while the panel is in its run-time
operating mode and needs attention from the host.
If the host is shutting down, there is no need to service a pending
irq_hpd.

> 
>> +
>>         mutex_unlock(&dp->event_mutex);
>> 
>>         return 0;
>> @@ -1496,6 +1502,9 @@ int msm_dp_display_disable(struct msm_dp *dp, 
>> struct drm_encoder *encoder)
>>         /* stop sentinel checking */
>>         dp_del_event(dp_display, EV_DISCONNECT_PENDING_TIMEOUT);
>> 
>> +       /* link is down, delete pending irq_hdps */
>> +       dp_del_event(dp_display, EV_IRQ_HPD_INT);
>> +
> 
> I'm becoming convinced that the whole kthread design and event queue is
> broken. These sorts of patches are working around the larger problem
> that the kthread is running independently of the driver and irqs can
> come in at any time but the event queue is not checked from the irq
> handler to debounce the irq event. Is the event queue necessary at all?
> I wonder if it would be simpler to just use an irq thread and process
> the hpd signal from there. Then we're guaranteed to not get an irq 
> again
> until the irq thread is done processing the event. This would naturally
> debounce the irq hpd event that way.
The event queue is just like the bottom half of an irq handler: it turns
irqs into events and handles them sequentially.
irq_hpd is an asynchronous event from the panel to get the attention of
the host during run-time operation.
Here, the dongle is unplugged and the main link has been torn down, so
there is no need to service a pending irq_hpd, if any.
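
To illustrate what dropping pending irq_hpd events from a queue means,
here is a minimal userspace sketch. It is not the driver's code: the
queue layout, EVQ_MAX, the event ids and the evq_* helpers are all
assumptions made up for illustration. Only the idea of removing every
queued event of one kind, while keeping the others in order, comes from
the patch.

```c
#include <assert.h>
#include <stddef.h>

#define EVQ_MAX 8	/* hypothetical queue depth */

/* Hypothetical event ids, loosely named after the events in this thread. */
enum ev_id { EV_HPD_PLUG, EV_HPD_UNPLUG, EV_IRQ_HPD };

struct evq {
	enum ev_id ev[EVQ_MAX];
	size_t len;
};

/* Append an event; returns 0 on success, -1 if the queue is full. */
static int evq_add(struct evq *q, enum ev_id id)
{
	assert(q != NULL);
	if (q->len == EVQ_MAX)
		return -1;
	q->ev[q->len++] = id;
	return 0;
}

/*
 * Remove every pending event with the given id, preserving the order
 * of the remaining events. Returns the number of events removed.
 */
static size_t evq_del(struct evq *q, enum ev_id id)
{
	size_t in, out = 0, removed;

	assert(q != NULL);
	for (in = 0; in < q->len; in++)
		if (q->ev[in] != id)
			q->ev[out++] = q->ev[in];
	removed = q->len - out;
	q->len = out;
	return removed;
}
```

With plug, irq_hpd, irq_hpd, unplug queued, calling evq_del() for
EV_IRQ_HPD leaves only plug followed by unplug.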


> 
>>         dp_display_disable(dp_display, 0);
>> 
>>         rc = dp_display_unprepare(dp);
aravindh@codeaurora.org April 21, 2021, 6:55 p.m. UTC | #3
On 2021-04-21 10:26, khsieh@codeaurora.org wrote:
> On 2021-04-20 15:01, Stephen Boyd wrote:
>> Quoting Kuogee Hsieh (2021-04-16 13:27:57)
>>> Some dongle may generate more than one irq_hpd events in a short 
>>> period of
>>> time. This patch will treat those irq_hpd events as single one and 
>>> service
>>> only one irq_hpd event.
>> 
>> Why is it bad to get multiple irq_hpd events in a short period of 
>> time?
>> Please tell us here in the commit text.
>> 
>>> 
>>> Signed-off-by: Kuogee Hsieh <khsieh@codeaurora.org>
>>> ---
>>>  drivers/gpu/drm/msm/dp/dp_display.c | 9 +++++++++
>>>  1 file changed, 9 insertions(+)
>>> 
>>> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
>>> b/drivers/gpu/drm/msm/dp/dp_display.c
>>> index 5a39da6..0a7d383 100644
>>> --- a/drivers/gpu/drm/msm/dp/dp_display.c
>>> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
>>> @@ -707,6 +707,9 @@ static int dp_irq_hpd_handle(struct 
>>> dp_display_private *dp, u32 data)
>>>                 return 0;
>>>         }
>>> 
>>> +       /* only handle first irq_hpd in case of multiple irs_hpd 
>>> pending */
>>> +       dp_del_event(dp, EV_IRQ_HPD_INT);
>>> +
>>>         ret = dp_display_usbpd_attention_cb(&dp->pdev->dev);
>>>         if (ret == -ECONNRESET) { /* cable unplugged */
>>>                 dp->core_initialized = false;
>>> @@ -1300,6 +1303,9 @@ static int dp_pm_suspend(struct device *dev)
>>>         /* host_init will be called at pm_resume */
>>>         dp->core_initialized = false;
>>> 
>>> +       /* system suspended, delete pending irq_hdps */
>>> +       dp_del_event(dp, EV_IRQ_HPD_INT);
>> 
>> What happens if I suspend my device and when this function is running 
>> I
>> toggle my monitor to use the HDMI input that is connected instead of 
>> some
>> other input, maybe the second HDMI input? Wouldn't that generate an 
>> HPD
>> interrupt to grab the attention of this device?
> no,
> At this time display is off. this mean dp controller is off and
> mainlink has teared down.
> it will start with plug in interrupt to bring dp controller up and
> start link training.
> irq_hpd can be generated only panel is at run time of operation mode
> and need attention from host.
> If host is shutting down, then no need to service pending irq_hpd.
> 
>> 
>>> +
>>>         mutex_unlock(&dp->event_mutex);
>>> 
>>>         return 0;
>>> @@ -1496,6 +1502,9 @@ int msm_dp_display_disable(struct msm_dp *dp, 
>>> struct drm_encoder *encoder)
>>>         /* stop sentinel checking */
>>>         dp_del_event(dp_display, EV_DISCONNECT_PENDING_TIMEOUT);
>>> 
>>> +       /* link is down, delete pending irq_hdps */
>>> +       dp_del_event(dp_display, EV_IRQ_HPD_INT);
>>> +
>> 
>> I'm becoming convinced that the whole kthread design and event queue 
>> is
>> broken. These sorts of patches are working around the larger problem
>> that the kthread is running independently of the driver and irqs can
>> come in at any time but the event queue is not checked from the irq
>> handler to debounce the irq event. Is the event queue necessary at 
>> all?
>> I wonder if it would be simpler to just use an irq thread and process
>> the hpd signal from there. Then we're guaranteed to not get an irq 
>> again
>> until the irq thread is done processing the event. This would 
>> naturally
>> debounce the irq hpd event that way.
> event q just like bottom half of irq handler. it turns irq into event
> and handle them sequentially.
> irq_hpd is asynchronous event from panel to bring up attention of hsot
> during run time of operation.
> Here, the dongle is unplugged and main link had teared down so that no
> need to service pending irq_hpd if any.
> 

As Kuogee mentioned, IRQ_HPD is a message received from the panel and is
not like your typical HW-generated IRQ. There is no guarantee that we
will not receive an IRQ_HPD before we have finished processing an
earlier HPD message or IRQ_HPD message. For example, when you run
protocol compliance testing and we get an HPD from the sink, we are
expected to start reading DPCD and EDID and proceed with link training.
As soon as link training is finished (which is marked by a specific DPCD
register write), the sink is going to issue an IRQ_HPD. At this point,
we may not be done with processing the HPD high, as after link training
we would typically notify user mode of the newly connected display, etc.
> 
>> 
>>>         dp_display_disable(dp_display, 0);
>>> 
>>>         rc = dp_display_unprepare(dp);
Stephen Boyd April 28, 2021, midnight UTC | #4
Quoting aravindh@codeaurora.org (2021-04-21 11:55:21)
> On 2021-04-21 10:26, khsieh@codeaurora.org wrote:
> >>
> >>> +
> >>>         mutex_unlock(&dp->event_mutex);
> >>>
> >>>         return 0;
> >>> @@ -1496,6 +1502,9 @@ int msm_dp_display_disable(struct msm_dp *dp,
> >>> struct drm_encoder *encoder)
> >>>         /* stop sentinel checking */
> >>>         dp_del_event(dp_display, EV_DISCONNECT_PENDING_TIMEOUT);
> >>>
> >>> +       /* link is down, delete pending irq_hdps */
> >>> +       dp_del_event(dp_display, EV_IRQ_HPD_INT);
> >>> +
> >>
> >> I'm becoming convinced that the whole kthread design and event queue
> >> is
> >> broken. These sorts of patches are working around the larger problem
> >> that the kthread is running independently of the driver and irqs can
> >> come in at any time but the event queue is not checked from the irq
> >> handler to debounce the irq event. Is the event queue necessary at
> >> all?
> >> I wonder if it would be simpler to just use an irq thread and process
> >> the hpd signal from there. Then we're guaranteed to not get an irq
> >> again
> >> until the irq thread is done processing the event. This would
> >> naturally
> >> debounce the irq hpd event that way.
> > event q just like bottom half of irq handler. it turns irq into event
> > and handle them sequentially.
> > irq_hpd is asynchronous event from panel to bring up attention of hsot
> > during run time of operation.
> > Here, the dongle is unplugged and main link had teared down so that no
> > need to service pending irq_hpd if any.
> >
>
> As Kuogee mentioned, IRQ_HPD is a message received from the panel and is
> not like your typical HW generated IRQ. There is no guarantee that we
> will not receive an IRQ_HPD until we are finished with processing of an
> earlier HPD message or an IRQ_HPD message. For example - when you run
> the protocol compliance, when we get a HPD from the sink, we are
> expected to start reading DPCD, EDID and proceed with link training. As
> soon as link training is finished (which is marked by a specific DPCD
> register write), the sink is going to issue an IRQ_HPD. At this point,
> we may not done with processing the HPD high as after link training we
> would typically notify the user mode of the newly connected display,
> etc.

Given that the irq comes in and is then forked off to processing at a
later time implies that IRQ_HPD can come in at practically anytime. Case
in point, this patch, which is trying to selectively search through the
"event queue" and then remove the event that is no longer relevant
because the display is being turned off either by userspace or because
HPD has gone away. If we got rid of the queue and kthread and processed
irqs in a threaded irq handler I suspect the code would be simpler and
not have to search through an event queue when we disable the display.
Instead while disabling the display we would make sure that the irq
thread isn't running anymore with synchronize_irq() or even disable the
irq entirely, but really it would be better to just disable the irq in
the hardware with a register write to some irq mask register.

This pushes more of the logic for HPD and connect/disconnect into the
hardware and avoids reimplementing that in software: searching through
the queue, checking for duplicate events, etc.
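
A rough userspace model of the "mask it at the source" idea, as opposed
to searching a software queue. Everything here is an assumption for
illustration: the bit positions, struct hpd_regs and the hpd_* helpers
do not correspond to any real register layout, and the model simplifies
hardware by dropping a masked source instead of latching it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions; not taken from any real hardware. */
#define HPD_PLUG_BIT	(1u << 0)
#define HPD_UNPLUG_BIT	(1u << 1)
#define HPD_IRQ_BIT	(1u << 2)

struct hpd_regs {
	uint32_t status;	/* latched interrupt sources */
	uint32_t mask;		/* 1 = source enabled */
};

/* Model the sink raising a source: it only latches when enabled. */
static void hpd_raise(struct hpd_regs *r, uint32_t bit)
{
	assert(r != NULL);
	if (r->mask & bit)
		r->status |= bit;
}

/*
 * Display is going down: mask IRQ_HPD and ack anything stale, so
 * there is nothing left to search for or delete afterwards.
 */
static void hpd_display_off(struct hpd_regs *r)
{
	assert(r != NULL);
	r->mask &= ~HPD_IRQ_BIT;
	r->status &= ~HPD_IRQ_BIT;
}

/* Sources that would currently be delivered to the handler. */
static uint32_t hpd_pending(const struct hpd_regs *r)
{
	assert(r != NULL);
	return r->status & r->mask;
}
```

After hpd_display_off(), further IRQ_HPD pulses are not seen at all,
while plug and unplug stay enabled.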
Kuogee Hsieh April 28, 2021, 5:38 p.m. UTC | #5
On 2021-04-27 17:00, Stephen Boyd wrote:
> Quoting aravindh@codeaurora.org (2021-04-21 11:55:21)
>> On 2021-04-21 10:26, khsieh@codeaurora.org wrote:
>> >>
>> >>> +
>> >>>         mutex_unlock(&dp->event_mutex);
>> >>>
>> >>>         return 0;
>> >>> @@ -1496,6 +1502,9 @@ int msm_dp_display_disable(struct msm_dp *dp,
>> >>> struct drm_encoder *encoder)
>> >>>         /* stop sentinel checking */
>> >>>         dp_del_event(dp_display, EV_DISCONNECT_PENDING_TIMEOUT);
>> >>>
>> >>> +       /* link is down, delete pending irq_hdps */
>> >>> +       dp_del_event(dp_display, EV_IRQ_HPD_INT);
>> >>> +
>> >>
>> >> I'm becoming convinced that the whole kthread design and event queue
>> >> is
>> >> broken. These sorts of patches are working around the larger problem
>> >> that the kthread is running independently of the driver and irqs can
>> >> come in at any time but the event queue is not checked from the irq
>> >> handler to debounce the irq event. Is the event queue necessary at
>> >> all?
>> >> I wonder if it would be simpler to just use an irq thread and process
>> >> the hpd signal from there. Then we're guaranteed to not get an irq
>> >> again
>> >> until the irq thread is done processing the event. This would
>> >> naturally
>> >> debounce the irq hpd event that way.
>> > event q just like bottom half of irq handler. it turns irq into event
>> > and handle them sequentially.
>> > irq_hpd is asynchronous event from panel to bring up attention of hsot
>> > during run time of operation.
>> > Here, the dongle is unplugged and main link had teared down so that no
>> > need to service pending irq_hpd if any.
>> >
>> 
>> As Kuogee mentioned, IRQ_HPD is a message received from the panel and 
>> is
>> not like your typical HW generated IRQ. There is no guarantee that we
>> will not receive an IRQ_HPD until we are finished with processing of 
>> an
>> earlier HPD message or an IRQ_HPD message. For example - when you run
>> the protocol compliance, when we get a HPD from the sink, we are
>> expected to start reading DPCD, EDID and proceed with link training. 
>> As
>> soon as link training is finished (which is marked by a specific DPCD
>> register write), the sink is going to issue an IRQ_HPD. At this point,
>> we may not done with processing the HPD high as after link training we
>> would typically notify the user mode of the newly connected display,
>> etc.
> 
> Given that the irq comes in and is then forked off to processing at a
> later time implies that IRQ_HPD can come in at practically anytime. 
> Case
> in point, this patch, which is trying to selectively search through the
> "event queue" and then remove the event that is no longer relevant
> because the display is being turned off either by userspace or because
> HPD has gone away. If we got rid of the queue and kthread and processed
> irqs in a threaded irq handler I suspect the code would be simpler and
> not have to search through an event queue when we disable the display.
> Instead while disabling the display we would make sure that the irq
> thread isn't running anymore with synchronize_irq() or even disable the
> irq entirely, but really it would be better to just disable the irq in
> the hardware with a register write to some irq mask register.
> 
> This pushes more of the logic for HPD and connect/disconnect into the
> hardware and avoids reimplementing that in software: searching through
> the queue, checking for duplicate events, etc.

I wish we could implement it as you suggested, but it is more
complicated than that.
Let me explain. We have three transactions, defined as follows.

plug-in transaction: the irq handler does host DP controller
initialization and link training. If sink_count = 0 or link training
fails, the transaction ends. Otherwise it sends a display-up uevent to
the framework and waits for the framework thread to do the mode set,
start the pixel clock and start video, which ends the transaction.

unplug transaction: the irq handler sends a display-off uevent to the
framework and waits for the framework to disable the pixel clock, tear
down the main link and de-initialize the DP controller host.

irq_hpd transaction: this only happens after a plug-in transaction and
before an unplug transaction. The irq handler reads the panel's DPCD
registers and performs the requested action, which can include DP
compliance phy/link testing.

Since the dongle can be plugged/unplugged at any time, three conditions
have to be met to avoid race conditions:
1) no irq is lost
2) irqs are executed in the order in which they occurred
3) no irq is handled in the middle of a transaction

For example, we do not want to see
plugin --> unplug --> plugin --> unplug become plugin --> plugin -->
unplug

The purpose of this patch is to not handle pending irq_hpd events after
either the dongle or the monitor has been unplugged, until the next
plug-in.
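
The rule described above (irq_hpd is only meaningful between a plug-in
and an unplug, and events are handled strictly in arrival order) can be
sketched as a small state machine. This is a userspace illustration
only; the enum names and dp_step() are hypothetical and do not exist in
the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum dp_state { DP_DISCONNECTED, DP_CONNECTED };
enum dp_event { DP_EV_PLUG, DP_EV_UNPLUG, DP_EV_IRQ_HPD };
enum dp_action {
	DP_ACT_IGNORE,		/* event is not relevant in this state */
	DP_ACT_BRING_UP,	/* plug-in transaction */
	DP_ACT_TEAR_DOWN,	/* unplug transaction */
	DP_ACT_SERVICE_IRQ,	/* irq_hpd transaction */
};

/*
 * Handle one event to completion and return the action taken.
 * Calling this once per event, in arrival order, gives sequential
 * handling with no event processed in the middle of another.
 */
static enum dp_action dp_step(enum dp_state *st, enum dp_event ev)
{
	assert(st != NULL);

	switch (ev) {
	case DP_EV_PLUG:
		if (*st == DP_CONNECTED)
			return DP_ACT_IGNORE;
		*st = DP_CONNECTED;
		return DP_ACT_BRING_UP;
	case DP_EV_UNPLUG:
		if (*st == DP_DISCONNECTED)
			return DP_ACT_IGNORE;
		*st = DP_DISCONNECTED;
		return DP_ACT_TEAR_DOWN;
	case DP_EV_IRQ_HPD:
		/* Only serviced between plug-in and unplug. */
		return *st == DP_CONNECTED ? DP_ACT_SERVICE_IRQ
					   : DP_ACT_IGNORE;
	}
	return DP_ACT_IGNORE;
}
```

Feeding it plug, irq_hpd, unplug, irq_hpd yields bring-up, service,
tear-down, ignore: the late irq_hpd is dropped because the link is
already gone.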
Stephen Boyd April 29, 2021, 9:26 a.m. UTC | #6
Quoting khsieh@codeaurora.org (2021-04-28 10:38:11)
> On 2021-04-27 17:00, Stephen Boyd wrote:
> > Quoting aravindh@codeaurora.org (2021-04-21 11:55:21)
> >> On 2021-04-21 10:26, khsieh@codeaurora.org wrote:
> >> >>
> >> >>> +
> >> >>>         mutex_unlock(&dp->event_mutex);
> >> >>>
> >> >>>         return 0;
> >> >>> @@ -1496,6 +1502,9 @@ int msm_dp_display_disable(struct msm_dp *dp,
> >> >>> struct drm_encoder *encoder)
> >> >>>         /* stop sentinel checking */
> >> >>>         dp_del_event(dp_display, EV_DISCONNECT_PENDING_TIMEOUT);
> >> >>>
> >> >>> +       /* link is down, delete pending irq_hdps */
> >> >>> +       dp_del_event(dp_display, EV_IRQ_HPD_INT);
> >> >>> +
> >> >>
> >> >> I'm becoming convinced that the whole kthread design and event queue
> >> >> is
> >> >> broken. These sorts of patches are working around the larger problem
> >> >> that the kthread is running independently of the driver and irqs can
> >> >> come in at any time but the event queue is not checked from the irq
> >> >> handler to debounce the irq event. Is the event queue necessary at
> >> >> all?
> >> >> I wonder if it would be simpler to just use an irq thread and process
> >> >> the hpd signal from there. Then we're guaranteed to not get an irq
> >> >> again
> >> >> until the irq thread is done processing the event. This would
> >> >> naturally
> >> >> debounce the irq hpd event that way.
> >> > event q just like bottom half of irq handler. it turns irq into event
> >> > and handle them sequentially.
> >> > irq_hpd is asynchronous event from panel to bring up attention of hsot
> >> > during run time of operation.
> >> > Here, the dongle is unplugged and main link had teared down so that no
> >> > need to service pending irq_hpd if any.
> >> >
> >>
> >> As Kuogee mentioned, IRQ_HPD is a message received from the panel and
> >> is
> >> not like your typical HW generated IRQ. There is no guarantee that we
> >> will not receive an IRQ_HPD until we are finished with processing of
> >> an
> >> earlier HPD message or an IRQ_HPD message. For example - when you run
> >> the protocol compliance, when we get a HPD from the sink, we are
> >> expected to start reading DPCD, EDID and proceed with link training.
> >> As
> >> soon as link training is finished (which is marked by a specific DPCD
> >> register write), the sink is going to issue an IRQ_HPD. At this point,
> >> we may not done with processing the HPD high as after link training we
> >> would typically notify the user mode of the newly connected display,
> >> etc.
> >
> > Given that the irq comes in and is then forked off to processing at a
> > later time implies that IRQ_HPD can come in at practically anytime.
> > Case
> > in point, this patch, which is trying to selectively search through the
> > "event queue" and then remove the event that is no longer relevant
> > because the display is being turned off either by userspace or because
> > HPD has gone away. If we got rid of the queue and kthread and processed
> > irqs in a threaded irq handler I suspect the code would be simpler and
> > not have to search through an event queue when we disable the display.
> > Instead while disabling the display we would make sure that the irq
> > thread isn't running anymore with synchronize_irq() or even disable the
> > irq entirely, but really it would be better to just disable the irq in
> > the hardware with a register write to some irq mask register.
> >
> > This pushes more of the logic for HPD and connect/disconnect into the
> > hardware and avoids reimplementing that in software: searching through
> > the queue, checking for duplicate events, etc.
>
> I wish we can implemented as you suggested. but it more complicate than
> that.
> Let me explain below,
> we have 3 transactions defined as below,
>
> plugin transaction: irq handle do host dp ctrl initialization and link
> training. If sink_count = 0 or link train failed, then transaction
> ended. otherwise send display up uevent to frame work and wait for frame
> work thread to do mode set, start pixel clock and start video to end
> transaction.

Why do we need to wait for userspace to start video? HPD is indicating
that we have something connected, so shouldn't we merely signal to
userspace that something is ready to display and then enable the irq for
IRQ_HPD?

>
> unplugged transaction: irq handle send display off uevent to frame
> work and wait for frame work to disable pixel clock ,tear down main
> link and dp ctrl host de initialization.

What do we do if userspace is slow and doesn't disable the display
before the cable is physically plugged in again?

>
> irq_hpd transaction: This only happen after plugin transaction and
> before unplug transaction. irq handle read panel dpcd register and
> perform requesting action. Action including perform dp compliant
> phy/link testing.
>
> since dongle can be plugged/unplugged at ant time, three conditions have
> to be met to avoid race condition,
> 1) no irq lost
> 2) irq happen timing order enforced at execution
> 3) no irq handle done in the middle transaction
>
> for example we do not want to see
> plugin --> unplug --> plugin --> unplug become plugin --> plugin-->
> unplug
>
> The purpose of this patch is to not handle pending irq_hpd after either
> dongle or monitor had been unplugged until next plug in.
>

I'm not suggesting to block irq handling entirely for long running
actions. A plug irq due to HPD could still notify userspace that the
display is connected but when an IRQ_HPD comes in we process it in the
irq thread instead of trying to figure out what sort of action is
necessary to quickly fork it off to a kthread to process later.

The problem seems to be that this quick forking off of the real IRQ_HPD
processing is letting the event come in, and then an unplug to come in
after that, and then a plug in to come in after that, leading to the
event queue getting full of events that are no longer relevant but still
need to be processed. If this used a workqueue instead of an open-coded
one, I'd say we should cancel any work items on the queue if an unplug
irq came in. That way we would make sure that we're not trying to do
anything with the link when it isn't present anymore.

But even then it doesn't make much sense. Userspace could be heavily
delayed after the plug in irq, when HPD is asserted, and not display
anything. The user could physically unplug and plug during that time so
we really need to not wait at all or do anything besides note the state
of the HPD when this happens. The IRQ_HPD irq is different. I don't
think we care to keep getting them if we're not done processing the
previous irq. I view it as basically an "edge" irq that we see, process,
and then if another one comes in during the processing time we ignore
it. There's only so much we can do, hence the suggestion to use a
threaded irq.

This is why IRQ_HPD is yanking the HPD line down to get the attention of
the source, but HPD high and HPD low for an extended period of time
means the cable has been plugged or unplugged. We really do care if the
line goes low for a long time, but if it only temporarily goes low for
an IRQ_HPD then we either saw it or we didn't have time to process it
yet.

It's like a person at your door ringing the doorbell. They're there (HPD
high), and they're ringing the doorbell over and over (IRQ_HPD) and
eventually they go away when you don't answer (HPD low). We don't have
to keep track of every single doorbell/IRQ_HPD event because it's mostly
a ping from the sink telling us we need to go do something, i.e. a
transitory event. The IRQ_HPD should always work once HPD is there, but
once HPD is gone we should mask it and ignore that irq until we see an
HPD high again.
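
To make the doorbell idea concrete, here is a small userspace sketch
that treats IRQ_HPD as a single pending flag rather than a list of
events. The struct and the line_* helpers are hypothetical names made
up for this example; it only models the bookkeeping, not real interrupt
handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the HPD line as seen by the source. */
struct sink_line {
	bool hpd_high;		/* someone is at the door */
	bool irq_hpd_pending;	/* the doorbell rang at least once */
};

static void line_hpd_high(struct sink_line *l)
{
	assert(l != NULL);
	l->hpd_high = true;
}

/* HPD went low for a long time: forget any ring we have not answered. */
static void line_hpd_low(struct sink_line *l)
{
	assert(l != NULL);
	l->hpd_high = false;
	l->irq_hpd_pending = false;
}

/* A ring only counts while HPD is high; repeated rings coalesce. */
static void line_irq_hpd(struct sink_line *l)
{
	assert(l != NULL);
	if (l->hpd_high)
		l->irq_hpd_pending = true;
}

/* Returns true if there was a ring to answer, and clears it. */
static bool line_service(struct sink_line *l)
{
	assert(l != NULL);
	if (!l->irq_hpd_pending)
		return false;
	l->irq_hpd_pending = false;
	return true;
}
```

Three calls to line_irq_hpd() in a row result in exactly one successful
line_service(), and a ring followed by line_hpd_low() is never serviced.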
Kuogee Hsieh April 29, 2021, 5:23 p.m. UTC | #7
On 2021-04-29 02:26, Stephen Boyd wrote:
> Quoting khsieh@codeaurora.org (2021-04-28 10:38:11)
>> On 2021-04-27 17:00, Stephen Boyd wrote:
>> > Quoting aravindh@codeaurora.org (2021-04-21 11:55:21)
>> >> On 2021-04-21 10:26, khsieh@codeaurora.org wrote:
>> >> >>
>> >> >>> +
>> >> >>>         mutex_unlock(&dp->event_mutex);
>> >> >>>
>> >> >>>         return 0;
>> >> >>> @@ -1496,6 +1502,9 @@ int msm_dp_display_disable(struct msm_dp *dp,
>> >> >>> struct drm_encoder *encoder)
>> >> >>>         /* stop sentinel checking */
>> >> >>>         dp_del_event(dp_display, EV_DISCONNECT_PENDING_TIMEOUT);
>> >> >>>
>> >> >>> +       /* link is down, delete pending irq_hdps */
>> >> >>> +       dp_del_event(dp_display, EV_IRQ_HPD_INT);
>> >> >>> +
>> >> >>
>> >> >> I'm becoming convinced that the whole kthread design and event queue
>> >> >> is
>> >> >> broken. These sorts of patches are working around the larger problem
>> >> >> that the kthread is running independently of the driver and irqs can
>> >> >> come in at any time but the event queue is not checked from the irq
>> >> >> handler to debounce the irq event. Is the event queue necessary at
>> >> >> all?
>> >> >> I wonder if it would be simpler to just use an irq thread and process
>> >> >> the hpd signal from there. Then we're guaranteed to not get an irq
>> >> >> again
>> >> >> until the irq thread is done processing the event. This would
>> >> >> naturally
>> >> >> debounce the irq hpd event that way.
>> >> > event q just like bottom half of irq handler. it turns irq into event
>> >> > and handle them sequentially.
>> >> > irq_hpd is asynchronous event from panel to bring up attention of hsot
>> >> > during run time of operation.
>> >> > Here, the dongle is unplugged and main link had teared down so that no
>> >> > need to service pending irq_hpd if any.
>> >> >
>> >>
>> >> As Kuogee mentioned, IRQ_HPD is a message received from the panel and
>> >> is
>> >> not like your typical HW generated IRQ. There is no guarantee that we
>> >> will not receive an IRQ_HPD until we are finished with processing of
>> >> an
>> >> earlier HPD message or an IRQ_HPD message. For example - when you run
>> >> the protocol compliance, when we get a HPD from the sink, we are
>> >> expected to start reading DPCD, EDID and proceed with link training.
>> >> As
>> >> soon as link training is finished (which is marked by a specific DPCD
>> >> register write), the sink is going to issue an IRQ_HPD. At this point,
>> >> we may not done with processing the HPD high as after link training we
>> >> would typically notify the user mode of the newly connected display,
>> >> etc.
>> >
>> > Given that the irq comes in and is then forked off to processing at a
>> > later time implies that IRQ_HPD can come in at practically anytime.
>> > Case
>> > in point, this patch, which is trying to selectively search through the
>> > "event queue" and then remove the event that is no longer relevant
>> > because the display is being turned off either by userspace or because
>> > HPD has gone away. If we got rid of the queue and kthread and processed
>> > irqs in a threaded irq handler I suspect the code would be simpler and
>> > not have to search through an event queue when we disable the display.
>> > Instead while disabling the display we would make sure that the irq
>> > thread isn't running anymore with synchronize_irq() or even disable the
>> > irq entirely, but really it would be better to just disable the irq in
>> > the hardware with a register write to some irq mask register.
>> >
>> > This pushes more of the logic for HPD and connect/disconnect into the
>> > hardware and avoids reimplementing that in software: searching through
>> > the queue, checking for duplicate events, etc.
>> 
>> I wish we can implemented as you suggested. but it more complicate 
>> than
>> that.
>> Let me explain below,
>> we have 3 transactions defined as below,
>> 
>> plugin transaction: irq handle do host dp ctrl initialization and link
>> training. If sink_count = 0 or link train failed, then transaction
>> ended. otherwise send display up uevent to frame work and wait for 
>> frame
>> work thread to do mode set, start pixel clock and start video to end
>> transaction.
> 
> Why do we need to wait for userspace to start video? HPD is indicating
> that we have something connected, so shouldn't we merely signal to
> userspace that something is ready to display and then enable the irq 
> for
> IRQ_HPD?
> 
Yes, that is correct.
The problem is an unplug that happens after user space has been signaled.
If the unplug happens before user space starts mode set and video, then it
can just do nothing and return.
But what if the unplug happens in the middle of user space doing mode set
and starting video?

Remember, we ran into the problem where the system showed the connected
state when the dongle was unplugged, and vice versa.

>> 
>> unplugged transaction: irq handle send display off uevent to frame
>> work and wait for frame work to disable pixel clock ,tear down main
>> link and dp ctrl host de initialization.
> 
> What do we do if userspace is slow and doesn't disable the display
> before the cable is physically plugged in again?
> 
A plugin is not handled (it is re-entered back into the event q) until the
unplug handling has completed.
>> 
>> irq_hpd transaction: This only happen after plugin transaction and
>> before unplug transaction. irq handle read panel dpcd register and
>> perform requesting action. Action including perform dp compliant
>> phy/link testing.
>> 
>> since dongle can be plugged/unplugged at any time, three conditions
>> have
>> to be met to avoid race condition,
>> 1) no irq lost
>> 2) irq happen timing order enforced at execution
>> 3) no irq handle done in the middle transaction
>> 
>> for example we do not want to see
>> plugin --> unplug --> plugin --> unplug become plugin --> plugin-->
>> unplug
>> 
>> The purpose of this patch is to not handle pending irq_hpd after 
>> either
>> dongle or monitor had been unplugged until next plug in.
>> 
> 
> I'm not suggesting to block irq handling entirely for long running
> actions. A plug irq due to HPD could still notify userspace that the
> display is connected but when an IRQ_HPD comes in we process it in the
> irq thread instead of trying to figure out what sort of action is
> necessary to quickly fork it off to a kthread to process later.
> 
> The problem seems to be that this quick forking off of the real IRQ_HPD
> processing is letting the event come in, and then an unplug to come in
> after that, and then a plug in to come in after that, leading to the
> event queue getting full of events that are no longer relevant but 
> still
> need to be processed. If this used a workqueue instead of an open-coded
> one, I'd say we should cancel any work items on the queue if an unplug
> irq came in. That way we would make sure that we're not trying to do
> anything with the link when it isn't present anymore.
> 
Is this the same as us deleting irq_hpd from the event q?
What happens if the workqueue has already been launched?

> But even then it doesn't make much sense. Userspace could be heavily
> delayed after the plug in irq, when HPD is asserted, and not display
> anything. The user could physically unplug and plug during that time so
> we really need to not wait at all or do anything besides note the state
> of the HPD when this happens. The IRQ_HPD irq is different. I don't
> think we care to keep getting them if we're not done processing the
> previous irq. I view it as basically an "edge" irq that we see, 
> process,
> and then if another one comes in during the processing time we ignore
> it. There's only so much we can do, hence the suggestion to use a
> threaded irq.
> 
I do not think you can ignore irq_hpd.
For example, you connect an hdmi monitor to the dongle, then plug the dongle
into the DUT and unplug the hdmi monitor immediately.
The DP driver will see a plugin irq with sink_count = 1 followed by an
irq_hpd with sink_count = 0.
Then we may end up thinking it is in the connected state when it should
actually be in the disconnected state.
I do not think we can ignore irq_hpd, but we can combine multiple irq_hpd
into one and handle it.
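
A minimal user-space sketch of the "combine multiple irq_hpd into one" idea,
assuming a simple FIFO. struct evq, evq_add(), evq_del(), evq_pop() and
evq_service() are hypothetical names used only for illustration; this is not
the driver's event q code. When one irq_hpd is picked up, any further irq_hpd
entries still pending behind it are dropped, so a burst is serviced once.

```c
#include <stddef.h>

/* Hypothetical event ids, loosely modeled on the driver's EV_* names. */
enum ev_id { EV_HPD_PLUG, EV_HPD_UNPLUG, EV_IRQ_HPD };

#define EVQ_MAX 16

/* Hypothetical fixed-size FIFO standing in for the event q. */
struct evq {
    enum ev_id ev[EVQ_MAX];
    size_t len;
};

/* Append an event; returns 0 on success, -1 if the queue is full. */
static int evq_add(struct evq *q, enum ev_id id)
{
    if (q->len >= EVQ_MAX)
        return -1;
    q->ev[q->len++] = id;
    return 0;
}

/* Drop every pending event with this id, keeping the order of the rest. */
static size_t evq_del(struct evq *q, enum ev_id id)
{
    size_t in, out = 0, removed;

    for (in = 0; in < q->len; in++) {
        if (q->ev[in] != id)
            q->ev[out++] = q->ev[in];
    }
    removed = q->len - out;
    q->len = out;
    return removed;
}

/* Remove the oldest event; returns 0 and stores it, or -1 if empty. */
static int evq_pop(struct evq *q, enum ev_id *id)
{
    size_t i;

    if (q->len == 0)
        return -1;
    *id = q->ev[0];
    for (i = 1; i < q->len; i++)
        q->ev[i - 1] = q->ev[i];
    q->len--;
    return 0;
}

/* Drain the queue; returns how many irq_hpd events were really serviced. */
static unsigned int evq_service(struct evq *q)
{
    enum ev_id id;
    unsigned int serviced = 0;

    while (evq_pop(q, &id) == 0) {
        if (id == EV_IRQ_HPD) {
            /* Fold the duplicates that queued up behind this one. */
            evq_del(q, EV_IRQ_HPD);
            /* A real handler would read the sink's current state here. */
            serviced++;
        }
    }
    return serviced;
}
```

With one plug event followed by three queued irq_hpd events, evq_service()
services irq_hpd once. Because the handler reads the current sink state when
it runs, it acts on the latest state rather than on each individual pulse.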


> This is why IRQ_HPD is yanking the HPD line down to get the attention 
> of
> the source, but HPD high and HPD low for an extended period of time
> means the cable has been plugged or unplugged. We really do care if the
> line goes low for a long time, but if it only temporarily goes low for
> an IRQ_HPD then we either saw it or we didn't have time to process it
> yet.
> 
> It's like a person at your door ringing the doorbell. They're there 
> (HPD
> high), and they're ringing the doorbell over and over (IRQ_HPD) and
> eventually they go away when you don't answer (HPD low). We don't have
> to keep track of every single doorbell/IRQ_HPD event because it's 
> mostly
> a ping from the sink telling us we need to go do something, i.e. a
> transitory event. The IRQ_HPD should always work once HPD is there, but
> once HPD is gone we should mask it and ignore that irq until we see an
> HPD high again.

If the Amazon delivery person rings the doorbell 3 times and we answer the
door on the third ring, this means the first and second rings can be
ignored.
Also, if the delivery person rings the doorbell 3 times, leaves a package at
the door and then leaves, and you see them leave from the window, you still
need to answer the door to find out that a package was left there. If you
ignore the doorbell, then you will miss the package.

I believe both a threaded irq and the event q work.
But I think the event q gives us finer control.
We are trying to fix an extreme case which generates an unexpected number
of irq_hpd at an unexpected time.
I believe other dp drivers (not Qcom) will also fail on this particular
case.
Stephen Boyd April 30, 2021, 3:11 a.m. UTC | #8
Quoting khsieh@codeaurora.org (2021-04-29 10:23:31)
> On 2021-04-29 02:26, Stephen Boyd wrote:
> > Quoting khsieh@codeaurora.org (2021-04-28 10:38:11)
> >> On 2021-04-27 17:00, Stephen Boyd wrote:
> >> > Quoting aravindh@codeaurora.org (2021-04-21 11:55:21)
> >> >> On 2021-04-21 10:26, khsieh@codeaurora.org wrote:
> >> >> >>
> >> >> >>> +
> >> >> >>>         mutex_unlock(&dp->event_mutex);
> >> >> >>>
> >> >> >>>         return 0;
> >> >> >>> @@ -1496,6 +1502,9 @@ int msm_dp_display_disable(struct msm_dp *dp,
> >> >> >>> struct drm_encoder *encoder)
> >> >> >>>         /* stop sentinel checking */
> >> >> >>>         dp_del_event(dp_display, EV_DISCONNECT_PENDING_TIMEOUT);
> >> >> >>>
> >> >> >>> +       /* link is down, delete pending irq_hdps */
> >> >> >>> +       dp_del_event(dp_display, EV_IRQ_HPD_INT);
> >> >> >>> +
> >> >> >>
> >> >> >> I'm becoming convinced that the whole kthread design and event queue
> >> >> >> is
> >> >> >> broken. These sorts of patches are working around the larger problem
> >> >> >> that the kthread is running independently of the driver and irqs can
> >> >> >> come in at any time but the event queue is not checked from the irq
> >> >> >> handler to debounce the irq event. Is the event queue necessary at
> >> >> >> all?
> >> >> >> I wonder if it would be simpler to just use an irq thread and process
> >> >> >> the hpd signal from there. Then we're guaranteed to not get an irq
> >> >> >> again
> >> >> >> until the irq thread is done processing the event. This would
> >> >> >> naturally
> >> >> >> debounce the irq hpd event that way.
> >> >> > event q just like bottom half of irq handler. it turns irq into event
> >> >> > and handle them sequentially.
> >> >> > irq_hpd is asynchronous event from panel to bring up attention of host
> >> >> > during run time of operation.
> >> >> > Here, the dongle is unplugged and main link had teared down so that no
> >> >> > need to service pending irq_hpd if any.
> >> >> >
> >> >>
> >> >> As Kuogee mentioned, IRQ_HPD is a message received from the panel and
> >> >> is
> >> >> not like your typical HW generated IRQ. There is no guarantee that we
> >> >> will not receive an IRQ_HPD until we are finished with processing of
> >> >> an
> >> >> earlier HPD message or an IRQ_HPD message. For example - when you run
> >> >> the protocol compliance, when we get a HPD from the sink, we are
> >> >> expected to start reading DPCD, EDID and proceed with link training.
> >> >> As
> >> >> soon as link training is finished (which is marked by a specific DPCD
> >> >> register write), the sink is going to issue an IRQ_HPD. At this point,
> >> >> we may not be done with processing the HPD high as after link training we
> >> >> would typically notify the user mode of the newly connected display,
> >> >> etc.

I re-read this. I think you're saying that IRQ_HPD can come in after HPD
goes high and we finish link training? That sounds like we should enable
IRQ_HPD in the hardware once we finish link training, instead of having
it enabled all the time. Then we can finish the threaded irq handler and
the irq should be pending again once IRQ_HPD is sent over. Is there ever
a need to be processing some IRQ_HPD and then get another IRQ_HPD while
processing the first one?

> >> >
> >> > Given that the irq comes in and is then forked off to processing at a
> >> > later time implies that IRQ_HPD can come in at practically anytime.
> >> > Case
> >> > in point, this patch, which is trying to selectively search through the
> >> > "event queue" and then remove the event that is no longer relevant
> >> > because the display is being turned off either by userspace or because
> >> > HPD has gone away. If we got rid of the queue and kthread and processed
> >> > irqs in a threaded irq handler I suspect the code would be simpler and
> >> > not have to search through an event queue when we disable the display.
> >> > Instead while disabling the display we would make sure that the irq
> >> > thread isn't running anymore with synchronize_irq() or even disable the
> >> > irq entirely, but really it would be better to just disable the irq in
> >> > the hardware with a register write to some irq mask register.
> >> >
> >> > This pushes more of the logic for HPD and connect/disconnect into the
> >> > hardware and avoids reimplementing that in software: searching through
> >> > the queue, checking for duplicate events, etc.
> >>
> >> I wish we can implemented as you suggested. but it more complicate
> >> than
> >> that.
> >> Let me explain below,
> >> we have 3 transactions defined as below,
> >>
> >> plugin transaction: irq handle do host dp ctrl initialization and link
> >> training. If sink_count = 0 or link train failed, then transaction
> >> ended. otherwise send display up uevent to frame work and wait for
> >> frame
> >> work thread to do mode set, start pixel clock and start video to end
> >> transaction.
> >
> > Why do we need to wait for userspace to start video? HPD is indicating
> > that we have something connected, so shouldn't we merely signal to
> > userspace that something is ready to display and then enable the irq
> > for
> > IRQ_HPD?
> >
> yes, it is correct.
> The problem is unplug happen after signal user space.
> if unplug happen before user space start mode set and video, then it can
> just do nothing and return.
> but if unplugged happen at the middle of user space doing mode set and
> start video?

I expect the link training to fail, maybe slowly, but userspace should
still be notified that the state has changed to disconnected when the
irq comes in, around the same time that the cable is physically
disconnected.

>
> remember we had run into problem system show in connect state when
> dongle unplugged, vice versa.
>

These problems are still happening as far as I can tell. I've heard
reports that external panels are showing up as connected when no dongle
is there, implying that HPD handling is broken.

>
>
>
> >>
> >> unplugged transaction: irq handle send display off uevent to frame
> >> work and wait for frame work to disable pixel clock ,tear down main
> >> link and dp ctrl host de initialization.
> >
> > What do we do if userspace is slow and doesn't disable the display
> > before the cable is physically plugged in again?
> >
> plugin is not handle (re enter back into event q) until unplugged handle
> completed.
> >>
> >> irq_hpd transaction: This only happen after plugin transaction and
> >> before unplug transaction. irq handle read panel dpcd register and
> >> perform requesting action. Action including perform dp compliant
> >> phy/link testing.
> >>
> >> since dongle can be plugged/unplugged at any time, three conditions
> >> have
> >> to be met to avoid race condition,
> >> 1) no irq lost
> >> 2) irq happen timing order enforced at execution
> >> 3) no irq handle done in the middle transaction
> >>
> >> for example we do not want to see
> >> plugin --> unplug --> plugin --> unplug become plugin --> plugin-->
> >> unplug
> >>
> >> The purpose of this patch is to not handle pending irq_hpd after
> >> either
> >> dongle or monitor had been unplugged until next plug in.
> >>
> >
> > I'm not suggesting to block irq handling entirely for long running
> > actions. A plug irq due to HPD could still notify userspace that the
> > display is connected but when an IRQ_HPD comes in we process it in the
> > irq thread instead of trying to figure out what sort of action is
> > necessary to quickly fork it off to a kthread to process later.
> >
> > The problem seems to be that this quick forking off of the real IRQ_HPD
> > processing is letting the event come in, and then an unplug to come in
> > after that, and then a plug in to come in after that, leading to the
> > event queue getting full of events that are no longer relevant but
> > still
> > need to be processed. If this used a workqueue instead of an open-coded
> > one, I'd say we should cancel any work items on the queue if an unplug
> > irq came in. That way we would make sure that we're not trying to do
> > anything with the link when it isn't present anymore.
> >
> is this same as we delete irq_hpd from event q?
> What happen if the workqueue had been launched?

Yes. Workqueues are basically functions you run on a kthread, with various
ways to either make sure that the work has finished processing or to try
to cancel it, so that it either doesn't run at all because the kthread
hasn't picked it up yet or it runs to completion before continuing. The
event queue should be replaced with a workqueue design, but even better
would be to use a threaded irq if possible so that hardware can't raise
more irqs while one is being handled.

>
> > But even then it doesn't make much sense. Userspace could be heavily
> > delayed after the plug in irq, when HPD is asserted, and not display
> > anything. The user could physically unplug and plug during that time so
> > we really need to not wait at all or do anything besides note the state
> > of the HPD when this happens. The IRQ_HPD irq is different. I don't
> > think we care to keep getting them if we're not done processing the
> > previous irq. I view it as basically an "edge" irq that we see,
> > process,
> > and then if another one comes in during the processing time we ignore
> > it. There's only so much we can do, hence the suggestion to use a
> > threaded irq.
> >
> I do not think you can ignore irq_hpd.
> for example, you connect hdmi monitor to dongle then plug in dongle into
> DUT and unplug hdmi monitor immediately.
> DP driver will see plugin irq with sink_count=1 followed by irq_hpd with
> sink_count= 0.
> Then we may end up you think it is in connect state but actually it
> should be in disconnect state.

Yes, I'm saying that we should be able to use the hardware to coalesce
multiple IRQ_HPDs: we don't unmask IRQ_HPD until a connect irq tells us
a cable is connected, we mask IRQ_HPD when a disconnect irq happens, and
we ignore extra IRQ_HPDs by processing the IRQ_HPD in a threaded irq
handler.

Maybe this can't work because the same hardware irq is used for the HPD
high/low and IRQ_HPD? If that's true, we should be able to keep the
IRQ_HPD masked until the event is processed by calling
dp_catalog_hpd_config_intr() to disable DP_DP_IRQ_HPD_INT_MASK when we
see it in the irq handler and only enable the irq again once we've
processed it, which I guess would be the end of dp_irq_hpd_handle()?
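
A minimal user-space model of the mask-until-processed idea above. It is a
sketch under stated assumptions, not driver code: struct irq_src, sink_pulse(),
hard_irq() and thread_done() are hypothetical names, and the model assumes the
hardware keeps latching a single status bit while the source is masked, which
would need to be confirmed against the real hardware.

```c
#include <stdbool.h>

/* Hypothetical model of one maskable, latched interrupt source. */
struct irq_src {
    bool masked;          /* stands in for the IRQ_HPD bit in the irq mask */
    bool latched;         /* status bit: set by a pulse, cleared on ack */
    unsigned int handled; /* how many times the thread handler completed */
};

/* The sink pulses HPD. Returns true if the hard irq would fire now. */
static bool sink_pulse(struct irq_src *s)
{
    s->latched = true;    /* repeated pulses collapse into one bit */
    return !s->masked;
}

/* Hard irq handler: ack the status, mask the source, defer to the thread. */
static void hard_irq(struct irq_src *s)
{
    s->latched = false;
    s->masked = true;
}

/*
 * End of the threaded handler: count the run and unmask. Returns true if a
 * pulse arrived while masked, meaning the irq fires once more after unmask.
 */
static bool thread_done(struct irq_src *s)
{
    s->handled++;
    s->masked = false;
    return s->latched;
}
```

In this model, one pulse followed by three more while the handler is busy
results in two handler runs in total: one for the first pulse and one for the
coalesced remainder.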

> I do not think we can ignore irq_hpd but combine multiple irq_hpd into
> one and handle it.
>
>
> > This is why IRQ_HPD is yanking the HPD line down to get the attention
> > of
> > the source, but HPD high and HPD low for an extended period of time
> > means the cable has been plugged or unplugged. We really do care if the
> > line goes low for a long time, but if it only temporarily goes low for
> > an IRQ_HPD then we either saw it or we didn't have time to process it
> > yet.
> >
> > It's like a person at your door ringing the doorbell. They're there
> > (HPD
> > high), and they're ringing the doorbell over and over (IRQ_HPD) and
> > eventually they go away when you don't answer (HPD low). We don't have
> > to keep track of every single doorbell/IRQ_HPD event because it's
> > mostly
> > a ping from the sink telling us we need to go do something, i.e. a
> > transitory event. The IRQ_HPD should always work once HPD is there, but
> > once HPD is gone we should mask it and ignore that irq until we see an
> > HPD high again.
>
> if amazon deliver ring the door bell 3 times, then we answer the door at
> the third time. this mean the first and second door bell ring can be
> ignored.
> Also if door bell ring 3 times and left an package at door then deliver
> left, you saw deliver left from window then you still need to answer to
> find out there is package left at door. If you ignore doorbell, then you
> will missed the package.

There isn't a package being left at the door. When HPD goes away,
there's nothing to do anymore. Stop going to the door to look for
anything. Maybe a better analogy is that the entire door and doorbell is
gone when HPD goes away.

>
>
> I believe both thread_irq and event q works.
> But I think event q give us more finer controller.

What sort of finer control? Opinions need supporting facts or they're
just opinions.

> We are trying to fix an extreme case which generate un expected number
> of irq_hpd at an unexpected timing.
> I believe other dp driver (not Qcom) will also failed on this particular
> case.
>

I don't understand why that matters. This driver being just as bad as
other drivers isn't a good quality.
Kuogee Hsieh May 3, 2021, 7:23 p.m. UTC | #9
On 2021-04-29 20:11, Stephen Boyd wrote:
> Quoting khsieh@codeaurora.org (2021-04-29 10:23:31)
>> On 2021-04-29 02:26, Stephen Boyd wrote:
>> > Quoting khsieh@codeaurora.org (2021-04-28 10:38:11)
>> >> On 2021-04-27 17:00, Stephen Boyd wrote:
>> >> > Quoting aravindh@codeaurora.org (2021-04-21 11:55:21)
>> >> >> On 2021-04-21 10:26, khsieh@codeaurora.org wrote:
>> >> >> >>
>> >> >> >>> +
>> >> >> >>>         mutex_unlock(&dp->event_mutex);
>> >> >> >>>
>> >> >> >>>         return 0;
>> >> >> >>> @@ -1496,6 +1502,9 @@ int msm_dp_display_disable(struct msm_dp *dp,
>> >> >> >>> struct drm_encoder *encoder)
>> >> >> >>>         /* stop sentinel checking */
>> >> >> >>>         dp_del_event(dp_display, EV_DISCONNECT_PENDING_TIMEOUT);
>> >> >> >>>
>> >> >> >>> +       /* link is down, delete pending irq_hdps */
>> >> >> >>> +       dp_del_event(dp_display, EV_IRQ_HPD_INT);
>> >> >> >>> +
>> >> >> >>
>> >> >> >> I'm becoming convinced that the whole kthread design and event queue
>> >> >> >> is
>> >> >> >> broken. These sorts of patches are working around the larger problem
>> >> >> >> that the kthread is running independently of the driver and irqs can
>> >> >> >> come in at any time but the event queue is not checked from the irq
>> >> >> >> handler to debounce the irq event. Is the event queue necessary at
>> >> >> >> all?
>> >> >> >> I wonder if it would be simpler to just use an irq thread and process
>> >> >> >> the hpd signal from there. Then we're guaranteed to not get an irq
>> >> >> >> again
>> >> >> >> until the irq thread is done processing the event. This would
>> >> >> >> naturally
>> >> >> >> debounce the irq hpd event that way.
>> >> >> > event q just like bottom half of irq handler. it turns irq into event
>> >> >> > and handle them sequentially.
>> >> >> > irq_hpd is asynchronous event from panel to bring up attention of host
>> >> >> > during run time of operation.
>> >> >> > Here, the dongle is unplugged and main link had teared down so that no
>> >> >> > need to service pending irq_hpd if any.
>> >> >> >
>> >> >>
>> >> >> As Kuogee mentioned, IRQ_HPD is a message received from the panel and
>> >> >> is
>> >> >> not like your typical HW generated IRQ. There is no guarantee that we
>> >> >> will not receive an IRQ_HPD until we are finished with processing of
>> >> >> an
>> >> >> earlier HPD message or an IRQ_HPD message. For example - when you run
>> >> >> the protocol compliance, when we get a HPD from the sink, we are
>> >> >> expected to start reading DPCD, EDID and proceed with link training.
>> >> >> As
>> >> >> soon as link training is finished (which is marked by a specific DPCD
>> >> >> register write), the sink is going to issue an IRQ_HPD. At this point,
>> >> >> we may not be done with processing the HPD high as after link training we
>> >> >> would typically notify the user mode of the newly connected display,
>> >> >> etc.
> 
> I re-read this. I think you're saying that IRQ_HPD can come in after 
> HPD
> goes high and we finish link training? That sounds like we should 
> enable
> IRQ_HPD in the hardware once we finish link training, instead of having
> it enabled all the time. Then we can finish the threaded irq handler 
> and
> the irq should be pending again once IRQ_HPD is sent over. Is there 
> ever
> a need to be processing some IRQ_HPD and then get another IRQ_HPD while
> processing the first one?
Yes, for example:
1) plug in the dongle only
2) plug the hdmi monitor into the dongle (generates irq_hpd with sink_count = 1)
3) unplug the hdmi monitor from the dongle (generates irq_hpd with
sink_count = 0)
4) go back to 2) n times
5) unplug the dongle

This patch does not fix this problem either.
The existing code has a major issue, which is that it handles irq_hpd with
sink_count = 0 the same way as it handles the dongle being unplugged.
I think this causes the external dp display to fail to work and causes the
crash in the suspend/resume test case.
I will drop this patch.
I am working on handling irq_hpd with sink_count = 0 as the opposite of
irq_hpd with sink_count = 1.
This means the irq_hpd sink_count = 0 handling only tears down the main
link but keeps phy/aux intact.
I will resubmit the patch for review.
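
A rough user-space sketch of the planned behaviour, to make the difference
between a monitor unplug and a dongle unplug concrete. struct dp_model and the
three functions are hypothetical and exist only for illustration. That an
irq_hpd with sink_count = 1 brings the main link back up, and that a dongle
plugged in with no monitor leaves phy/aux on, are assumptions of this model
and are not taken from the driver.

```c
#include <stdbool.h>

/* Hypothetical two-level state: phy/aux below, main link on top. */
struct dp_model {
    bool phy_aux_on;   /* phy/aux is up, so the sink can be queried */
    bool main_link_on; /* main link is up and can carry video */
};

/* Dongle plugged in; sink_count is what the sink reports at that time. */
static void dongle_plug(struct dp_model *dp, unsigned int sink_count)
{
    dp->phy_aux_on = true;
    dp->main_link_on = sink_count > 0;
}

/* Dongle removed: everything is torn down. */
static void dongle_unplug(struct dp_model *dp)
{
    dp->main_link_on = false;
    dp->phy_aux_on = false;
}

/* irq_hpd: only the main link follows sink_count, phy/aux is untouched. */
static void irq_hpd(struct dp_model *dp, unsigned int sink_count)
{
    if (!dp->phy_aux_on)
        return; /* no dongle present, nothing to service */
    dp->main_link_on = sink_count > 0;
}
```

Walking the five steps listed above through this model, phy_aux_on stays
true through every monitor plug and unplug and only drops at step 5.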


> 
>> >> >
>> >> > Given that the irq comes in and is then forked off to processing at a
>> >> > later time implies that IRQ_HPD can come in at practically anytime.
>> >> > Case
>> >> > in point, this patch, which is trying to selectively search through the
>> >> > "event queue" and then remove the event that is no longer relevant
>> >> > because the display is being turned off either by userspace or because
>> >> > HPD has gone away. If we got rid of the queue and kthread and processed
>> >> > irqs in a threaded irq handler I suspect the code would be simpler and
>> >> > not have to search through an event queue when we disable the display.
>> >> > Instead while disabling the display we would make sure that the irq
>> >> > thread isn't running anymore with synchronize_irq() or even disable the
>> >> > irq entirely, but really it would be better to just disable the irq in
>> >> > the hardware with a register write to some irq mask register.
>> >> >
>> >> > This pushes more of the logic for HPD and connect/disconnect into the
>> >> > hardware and avoids reimplementing that in software: searching through
>> >> > the queue, checking for duplicate events, etc.
>> >>
>> >> I wish we can implemented as you suggested. but it more complicate
>> >> than
>> >> that.
>> >> Let me explain below,
>> >> we have 3 transactions defined as below,
>> >>
>> >> plugin transaction: irq handle do host dp ctrl initialization and link
>> >> training. If sink_count = 0 or link train failed, then transaction
>> >> ended. otherwise send display up uevent to frame work and wait for
>> >> frame
>> >> work thread to do mode set, start pixel clock and start video to end
>> >> transaction.
>> >
>> > Why do we need to wait for userspace to start video? HPD is indicating
>> > that we have something connected, so shouldn't we merely signal to
>> > userspace that something is ready to display and then enable the irq
>> > for
>> > IRQ_HPD?
>> >
>> yes, it is correct.
>> The problem is unplug happen after signal user space.
>> if unplug happen before user space start mode set and video, then it 
>> can
>> just do nothing and return.
>> but if unplugged happen at the middle of user space doing mode set and
>> start video?
> 
> I expect the link training to fail, maybe slowly, but userspace should
> still be notified that the state has changed to disconnected when the
> irq comes in, around the same time that the cable is physically
> disconnected.
> 
>> 
>> remember we had run into problem system show in connect state when
>> dongle unplugged, vice versa.
>> 
> 
> These problems are still happening as far as I can tell. I've heard
> reports that external panels are showing up as connected when no dongle
> is there, implying that HPD handling is broken.
> 
>> 
>> 
>> 
>> >>
>> >> unplugged transaction: irq handle send display off uevent to frame
>> >> work and wait for frame work to disable pixel clock ,tear down main
>> >> link and dp ctrl host de initialization.
>> >
>> > What do we do if userspace is slow and doesn't disable the display
>> > before the cable is physically plugged in again?
>> >
>> plugin is not handle (re enter back into event q) until unplugged 
>> handle
>> completed.
>> >>
>> >> irq_hpd transaction: This only happen after plugin transaction and
>> >> before unplug transaction. irq handle read panel dpcd register and
>> >> perform requesting action. Action including perform dp compliant
>> >> phy/link testing.
>> >>
>> >> since dongle can be plugged/unplugged at any time, three conditions
>> >> have
>> >> to be met to avoid race condition,
>> >> 1) no irq lost
>> >> 2) irq happen timing order enforced at execution
>> >> 3) no irq handle done in the middle transaction
>> >>
>> >> for example we do not want to see
>> >> plugin --> unplug --> plugin --> unplug become plugin --> plugin-->
>> >> unplug
>> >>
>> >> The purpose of this patch is to not handle pending irq_hpd after
>> >> either
>> >> dongle or monitor had been unplugged until next plug in.
>> >>
>> >
>> > I'm not suggesting to block irq handling entirely for long running
>> > actions. A plug irq due to HPD could still notify userspace that the
>> > display is connected but when an IRQ_HPD comes in we process it in the
>> > irq thread instead of trying to figure out what sort of action is
>> > necessary to quickly fork it off to a kthread to process later.
>> >
>> > The problem seems to be that this quick forking off of the real IRQ_HPD
>> > processing is letting the event come in, and then an unplug to come in
>> > after that, and then a plug in to come in after that, leading to the
>> > event queue getting full of events that are no longer relevant but
>> > still
>> > need to be processed. If this used a workqueue instead of an open-coded
>> > one, I'd say we should cancel any work items on the queue if an unplug
>> > irq came in. That way we would make sure that we're not trying to do
>> > anything with the link when it isn't present anymore.
>> >
>> is this same as we delete irq_hpd from event q?
>> What happen if the workqueue had been launched?
> 
> Yes workqueues are basically functions you run on a kthread with 
> various
> ways to either make sure that the work has finished processing or to 
> try
> to cancel it out so that it either doesn't run at all because the
> kthread hasn't picked it up or that it runs to completion before
> continuing. The event queue should be replaced with a workqueue design,
> but even better would be to use a threaded irq if possible so that
> hardware can't raise more irqs while one is being handled.
> 
>> 
>> > But even then it doesn't make much sense. Userspace could be heavily
>> > delayed after the plug in irq, when HPD is asserted, and not display
>> > anything. The user could physically unplug and plug during that time so
>> > we really need to not wait at all or do anything besides note the state
>> > of the HPD when this happens. The IRQ_HPD irq is different. I don't
>> > think we care to keep getting them if we're not done processing the
>> > previous irq. I view it as basically an "edge" irq that we see,
>> > process,
>> > and then if another one comes in during the processing time we ignore
>> > it. There's only so much we can do, hence the suggestion to use a
>> > threaded irq.
>> >
>> I do not think you can ignore irq_hpd.
>> for example, you connect hdmi monitor to dongle then plug in dongle 
>> into
>> DUT and unplug hdmi monitor immediately.
>> DP driver will see plugin irq with sink_count=1 followed by irq_hpd 
>> with
>> sink_count= 0.
>> Then we may end up you think it is in connect state but actually it
>> should be in disconnect state.
> 
> Yes I'm saying that we should be able to use the hardware to coalesce
> multiple IRQ_HPDs so that we don't unmask the IRQ_HPD until a connect
> irq tells us a cable is connected, and then we mask IRQ_HPD when a
> disconnect irq happens, and ignore extra IRQ_HPDs by processing the
> IRQ_HPD in a threaded irq handler.
> 
> Maybe this can't work because the same hardware irq is used for the HPD
> high/low and IRQ_HPD? If that's true, we should be able to keep the
> IRQ_HPD masked until the event is processed by calling
> dp_catalog_hpd_config_intr() to disable DP_DP_IRQ_HPD_INT_MASK when we
> see it in the irq handler and only enable the irq again once we've
> processed it, which I guess would be the end of dp_irq_hpd_handle()?
> 
>> I do not think we can ignore irq_hpd but combine multiple irq_hpd into
>> one and handle it.
>> 
>> 
>> > This is why IRQ_HPD is yanking the HPD line down to get the attention
>> > of
>> > the source, but HPD high and HPD low for an extended period of time
>> > means the cable has been plugged or unplugged. We really do care if the
>> > line goes low for a long time, but if it only temporarily goes low for
>> > an IRQ_HPD then we either saw it or we didn't have time to process it
>> > yet.
>> >
>> > It's like a person at your door ringing the doorbell. They're there
>> > (HPD
>> > high), and they're ringing the doorbell over and over (IRQ_HPD) and
>> > eventually they go away when you don't answer (HPD low). We don't have
>> > to keep track of every single doorbell/IRQ_HPD event because it's
>> > mostly
>> > a ping from the sink telling us we need to go do something, i.e. a
>> > transitory event. The IRQ_HPD should always work once HPD is there, but
>> > once HPD is gone we should mask it and ignore that irq until we see an
>> > HPD high again.
>> 
>> if amazon deliver ring the door bell 3 times, then we answer the door 
>> at
>> the third time. this mean the first and second door bell ring can be
>> ignored.
>> Also if door bell ring 3 times and left an package at door then 
>> deliver
>> left, you saw deliver left from window then you still need to answer
>> to
>> find out there is package left at door. If you ignore doorbell, then 
>> you
>> will missed the package.
> 
> There isn't a package being left at the door. When HPD goes away,
> there's nothing to do anymore. Stop going to the door to look for
> anything. Maybe a better analogy is that the entire door and doorbell 
> is
> gone when HPD goes away.
> 
>> 
>> 
>> I believe both thread_irq and event q works.
>> But I think event q give us more finer controller.
> 
> What sort of finer control? Opinions need supporting facts or they're
> just opinions.
> 
>> We are trying to fix an extreme case which generate un expected number
>> of irq_hpd at an unexpected timing.
>> I believe other dp driver (not Qcom) will also failed on this 
>> particular
>> case.
>> 
> 
> I don't understand why that matters. This driver being just as bad as
> other drivers isn't a good quality.
Stephen Boyd May 4, 2021, 4:28 a.m. UTC | #10
Quoting khsieh@codeaurora.org (2021-05-03 12:23:31)
> On 2021-04-29 20:11, Stephen Boyd wrote:
> > Quoting khsieh@codeaurora.org (2021-04-29 10:23:31)
> >> On 2021-04-29 02:26, Stephen Boyd wrote:
> >> > Quoting khsieh@codeaurora.org (2021-04-28 10:38:11)
> >> >> On 2021-04-27 17:00, Stephen Boyd wrote:
> >> >> > Quoting aravindh@codeaurora.org (2021-04-21 11:55:21)
> >> >> >> On 2021-04-21 10:26, khsieh@codeaurora.org wrote:
> >> >> >> >>
> >> >> >> >>> +
> >> >> >> >>>         mutex_unlock(&dp->event_mutex);
> >> >> >> >>>
> >> >> >> >>>         return 0;
> >> >> >> >>> @@ -1496,6 +1502,9 @@ int msm_dp_display_disable(struct msm_dp *dp,
> >> >> >> >>> struct drm_encoder *encoder)
> >> >> >> >>>         /* stop sentinel checking */
> >> >> >> >>>         dp_del_event(dp_display, EV_DISCONNECT_PENDING_TIMEOUT);
> >> >> >> >>>
> >> >> >> >>> +       /* link is down, delete pending irq_hpds */
> >> >> >> >>> +       dp_del_event(dp_display, EV_IRQ_HPD_INT);
> >> >> >> >>> +
> >> >> >> >>
> >> >> >> >> I'm becoming convinced that the whole kthread design and event queue
> >> >> >> >> is broken. These sorts of patches are working around the larger
> >> >> >> >> problem that the kthread is running independently of the driver and
> >> >> >> >> irqs can come in at any time but the event queue is not checked from
> >> >> >> >> the irq handler to debounce the irq event. Is the event queue
> >> >> >> >> necessary at all?
> >> >> >> >> I wonder if it would be simpler to just use an irq thread and process
> >> >> >> >> the hpd signal from there. Then we're guaranteed to not get an irq
> >> >> >> >> again until the irq thread is done processing the event. This would
> >> >> >> >> naturally debounce the irq hpd event that way.
> >> >> >> > The event queue is just like the bottom half of an irq handler. It
> >> >> >> > turns irqs into events and handles them sequentially.
> >> >> >> > irq_hpd is an asynchronous event from the panel to get the attention
> >> >> >> > of the host at run time.
> >> >> >> > Here, the dongle is unplugged and the main link has been torn down, so
> >> >> >> > there is no need to service any pending irq_hpd.
> >> >> >> >
> >> >> >>
> >> >> >> As Kuogee mentioned, IRQ_HPD is a message received from the panel and
> >> >> >> is not like your typical HW generated IRQ. There is no guarantee that
> >> >> >> we will not receive an IRQ_HPD until we are finished with processing
> >> >> >> of an earlier HPD message or an IRQ_HPD message. For example - when
> >> >> >> you run the protocol compliance, when we get a HPD from the sink, we
> >> >> >> are expected to start reading DPCD, EDID and proceed with link
> >> >> >> training. As soon as link training is finished (which is marked by a
> >> >> >> specific DPCD register write), the sink is going to issue an IRQ_HPD.
> >> >> >> At this point, we may not be done with processing the HPD high as
> >> >> >> after link training we would typically notify the user mode of the
> >> >> >> newly connected display, etc.
> >
> > I re-read this. I think you're saying that IRQ_HPD can come in after
> > HPD goes high and we finish link training? That sounds like we should
> > enable IRQ_HPD in the hardware once we finish link training, instead of
> > having it enabled all the time. Then we can finish the threaded irq
> > handler and the irq should be pending again once IRQ_HPD is sent over.
> > Is there ever a need to be processing some IRQ_HPD and then get another
> > IRQ_HPD while processing the first one?
> yes, for example
> 1) plug dongle only
> 2) plug hdmi monitor into dongle (generates irq_hpd with sink_count = 1)
> 3) unplug hdmi monitor from the dongle (generates irq_hpd with
> sink_count = 0)
> 4) go back to 2) for n times
> 5) unplug dongle
>
> This patch does not fix this problem either.
> The existing code has a major issue: it handles irq_hpd with
> sink_count = 0 the same way as a dongle unplug.
> I think this causes the external dp display to fail and causes the
> crash in the suspend/resume test case.
> I will drop this patch.
> I am working on handling irq_hpd with sink_count = 0 asymmetrically,
> as the opposite of irq_hpd with sink_count = 1.
> This means the irq_hpd sink_count = 0 handler only tears down the main
> link but keeps phy/aux intact.
> I will resubmit the patch for review.
>

Ok makes sense. I'll look out for the next revision of this patch.
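
A minimal sketch of the distinction drawn in the quoted message, where
irq_hpd with sink_count = 0 tears down only the main link while a dongle
unplug tears down everything. This is a toy state model, not driver code:
struct dp_model and the model_*() helpers are hypothetical names that do
not exist in drivers/gpu/drm/msm/dp, and the real handlers would do far
more than flip flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state discussed in this thread.
 * None of these names come from the msm DP driver. */
struct dp_model {
	bool phy_on;	/* phy powered */
	bool aux_on;	/* aux channel usable */
	bool link_up;	/* main link trained */
};

/* HPD high: dongle plugged in, bring up phy/aux only. */
static void model_plug(struct dp_model *dp)
{
	assert(dp);
	dp->phy_on = true;
	dp->aux_on = true;
}

/* irq_hpd: the sink count behind the dongle changed. */
static void model_irq_hpd(struct dp_model *dp, unsigned int sink_count)
{
	assert(dp);
	if (!dp->phy_on || !dp->aux_on)
		return;	/* no dongle present: nothing to service */
	/* sink_count = 0 only drops the main link; phy/aux stay up */
	dp->link_up = sink_count > 0;
}

/* HPD low: dongle unplugged, tear everything down. */
static void model_unplug(struct dp_model *dp)
{
	assert(dp);
	dp->link_up = false;
	dp->aux_on = false;
	dp->phy_on = false;
}
```

Walking the five steps from the quoted example through this model: steps
2) and 3) toggle only link_up via model_irq_hpd(), however many times
they repeat, and only step 5) via model_unplug() drops phy/aux. An
irq_hpd that arrives after the unplug is ignored.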
Patch

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
index 5a39da6..0a7d383 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -707,6 +707,9 @@  static int dp_irq_hpd_handle(struct dp_display_private *dp, u32 data)
 		return 0;
 	}
 
+	/* only handle first irq_hpd in case of multiple irq_hpd pending */
+	dp_del_event(dp, EV_IRQ_HPD_INT);
+
 	ret = dp_display_usbpd_attention_cb(&dp->pdev->dev);
 	if (ret == -ECONNRESET) { /* cable unplugged */
 		dp->core_initialized = false;
@@ -1300,6 +1303,9 @@  static int dp_pm_suspend(struct device *dev)
 	/* host_init will be called at pm_resume */
 	dp->core_initialized = false;
 
+	/* system suspended, delete pending irq_hpds */
+	dp_del_event(dp, EV_IRQ_HPD_INT);
+
 	mutex_unlock(&dp->event_mutex);
 
 	return 0;
@@ -1496,6 +1502,9 @@  int msm_dp_display_disable(struct msm_dp *dp, struct drm_encoder *encoder)
 	/* stop sentinel checking */
 	dp_del_event(dp_display, EV_DISCONNECT_PENDING_TIMEOUT);
 
+	/* link is down, delete pending irq_hpds */
+	dp_del_event(dp_display, EV_IRQ_HPD_INT);
+
 	dp_display_disable(dp_display, 0);
 
 	rc = dp_display_unprepare(dp);