mbox series

[v1,0/3] Bring back driver_deferred_probe_check_state() for now

Message ID 20220727185012.3255200-1-saravanak@google.com (mailing list archive)
Headers show
Series Bring back driver_deferred_probe_check_state() for now | expand

Message

Saravana Kannan July 27, 2022, 6:50 p.m. UTC
More fixes/changes are needed before driver_deferred_probe_check_state()
can be deleted. So, bring it back for now.

Greg,

Can we get this into 5.19? If not, it might not be worth picking up this
series. I could just do the other/more fixes in time for 5.20.

-Saravana

Cc: naresh.kamboju@linaro.org

Saravana Kannan (3):
  Revert "driver core: Delete driver_deferred_probe_check_state()"
  Revert "net: mdio: Delete usage of
    driver_deferred_probe_check_state()"
  Revert "PM: domains: Delete usage of
    driver_deferred_probe_check_state()"

 drivers/base/dd.c              | 30 ++++++++++++++++++++++++++++++
 drivers/base/power/domain.c    |  2 +-
 drivers/net/mdio/fwnode_mdio.c |  4 +++-
 include/linux/device/driver.h  |  1 +
 4 files changed, 35 insertions(+), 2 deletions(-)

Comments

Greg KH July 28, 2022, 7:28 a.m. UTC | #1
On Wed, Jul 27, 2022 at 11:50:08AM -0700, Saravana Kannan wrote:
> More fixes/changes are needed before driver_deferred_probe_check_state()
> can be deleted. So, bring it back for now.
> 
> Greg,
> 
> Can we get this into 5.19? If not, it might not be worth picking up this
> series. I could just do the other/more fixes in time for 5.20.

Wow, no, it is _WAY_ too late for 5.19 to make a change like this,
sorry.

What is so broken that we need to revert these now?  I could do so for
5.20-rc1, and then backport to 5.19.y if that release is really broken,
but this feels odd so late in the cycle.

thanks,

greg k-h
Doug Anderson Aug. 9, 2022, 11:46 p.m. UTC | #2
Hi,

On Thu, Jul 28, 2022 at 12:29 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Wed, Jul 27, 2022 at 11:50:08AM -0700, Saravana Kannan wrote:
> > More fixes/changes are needed before driver_deferred_probe_check_state()
> > can be deleted. So, bring it back for now.
> >
> > Greg,
> >
> > Can we get this into 5.19? If not, it might not be worth picking up this
> > series. I could just do the other/more fixes in time for 5.20.
>
> Wow, no, it is _WAY_ too late for 5.19 to make a change like this,
> sorry.
>
> What is so broken that we need to revert these now?  I could do so for
> 5.20-rc1, and then backport to 5.19.y if that release is really broken,
> but this feels odd so late in the cycle.

I spent a bunch of time bisecting mainline today on my
sc7180-trogdor-lazor board. When building the top of Linus's tree
today the display doesn't come up. I can make it come up by turning
fw_devlink off (after fixing a regulator bug that I just posted a fix
for).

I found that the first bad commit was commit 5a46079a9645 ("PM:
domains: Delete usage of driver_deferred_probe_check_state()")

...but only when applied to mainline. When I cherry-pick that back to
v5.19-rc1 (and pick another bugfix needed to boot my board against
v5.19-rc1) then it works OK. After yet more bisecting, I found that on
trogdor there's a bad interaction with the commit e511a760 ("arm64:
dts: qcom: sm7180: remove assigned-clock-rate property for mdp clk").
That commit is perfectly legit but I guess it somehow changed how
fw_devlink was interpreting things?

Sure enough, picking this revert series fixes things on Linus's tree.
Any chance we can still get the revert in for v5.20-rc1? ;-)


-Doug
Saravana Kannan Aug. 10, 2022, 12:24 a.m. UTC | #3
On Tue, Aug 9, 2022 at 4:47 PM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Thu, Jul 28, 2022 at 12:29 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Wed, Jul 27, 2022 at 11:50:08AM -0700, Saravana Kannan wrote:
> > > More fixes/changes are needed before driver_deferred_probe_check_state()
> > > can be deleted. So, bring it back for now.
> > >
> > > Greg,
> > >
> > > Can we get this into 5.19? If not, it might not be worth picking up this
> > > series. I could just do the other/more fixes in time for 5.20.
> >
> > Wow, no, it is _WAY_ too late for 5.19 to make a change like this,
> > sorry.
> >
> > What is so broken that we need to revert these now?  I could do so for
> > 5.20-rc1, and then backport to 5.19.y if that release is really broken,
> > but this feels odd so late in the cycle.

Greg,

I didn't realize the patches I'm trying to revert never landed on
5.19. So you can ignore this thread.

>
> I spent a bunch of time bisecting mainline today on my
> sc7180-trogdor-lazor board. When building the top of Linus's tree
> today the display doesn't come up. I can make it come up by turning
> fw_devlink off (after fixing a regulator bug that I just posted a fix
> for).
>
> I found that the first bad commit was commit 5a46079a9645 ("PM:
> domains: Delete usage of driver_deferred_probe_check_state()")
>
> ...but only when applied to mainline. When I cherry-pick that back to
> v5.19-rc1 (and pick another bugfix needed to boot my board against
> v5.19-rc1) then it works OK. After yet more bisecting, I found that on
> trogdor there's a bad interaction with the commit e511a760 ("arm64:
> dts: qcom: sm7180: remove assigned-clock-rate property for mdp clk").
> That commit is perfectly legit but I guess it somehow changed how
> fw_devlink was interpreting things?
>
> Sure enough, picking this revert series fixes things on Linus's tree.
> Any chance we can still get the revert in for v5.20-rc1? ;-)

I guess it's 6.0 now. But I'm almost done with my actual fixes that
rewrite some parts of fw_devlink to make it a lot more robust. So, I'm
not sure if all these reverts need to land anymore.

I'm hoping to send out the proper fixes by the end of this week. Maybe
you can try that out and let me know if it solves your issues (I
expect it to).

I'm surprised that specific clock patch has an impact though. It's
just touching properties that fw_devlink doesn't parse.

-Saravana
Tony Lindgren Aug. 15, 2022, 11:01 a.m. UTC | #4
* Saravana Kannan <saravanak@google.com> [700101 02:00]:
> More fixes/changes are needed before driver_deferred_probe_check_state()
> can be deleted. So, bring it back for now.
> 
> Greg,
> 
> Can we get this into 5.19? If not, it might not be worth picking up this
> series. I could just do the other/more fixes in time for 5.20.

Yes please pick this as fixes for v6.0-rc series, it fixes booting for
me. I've replied with fixes tags for the two patches that were causing
regressions for me.

Regards,

Tony
Luca Weiss Aug. 15, 2022, 4:57 p.m. UTC | #5
On Mon Aug 15, 2022 at 1:01 PM CEST, Tony Lindgren wrote:
> * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > More fixes/changes are needed before driver_deferred_probe_check_state()
> > can be deleted. So, bring it back for now.
> > 
> > Greg,
> > 
> > Can we get this into 5.19? If not, it might not be worth picking up this
> > series. I could just do the other/more fixes in time for 5.20.
>
> Yes please pick this as fixes for v6.0-rc series, it fixes booting for
> me. I've replied with fixes tags for the two patches that were causing
> regressions for me.
>

Hi,

for me Patch 1+3 fix display probe on Qualcomm SM6350 (although display
for this SoC isn't upstream yet, there are lots of other SoCs with very
similar setup).

Probe for DPU silently fails, with CONFIG_DEBUG_DRIVER=y we get this:

msm-mdss ae00000.mdss: __genpd_dev_pm_attach() failed to find PM domain: -2

While I'm not familiar with the specifics of fw_devlink, the dtsi has
power-domains = <&dispcc MDSS_GDSC> for this node but it doesn't pick
that up for some reason.

We can also see that a bit later dispcc finally probes.

Regards
Luca

> Regards,
>
> Tony
Saravana Kannan Aug. 15, 2022, 11:36 p.m. UTC | #6
On Mon, Aug 15, 2022 at 9:57 AM Luca Weiss <luca.weiss@fairphone.com> wrote:
>
> On Mon Aug 15, 2022 at 1:01 PM CEST, Tony Lindgren wrote:
> > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > More fixes/changes are needed before driver_deferred_probe_check_state()
> > > can be deleted. So, bring it back for now.
> > >
> > > Greg,
> > >
> > > Can we get this into 5.19? If not, it might not be worth picking up this
> > > series. I could just do the other/more fixes in time for 5.20.
> >
> > Yes please pick this as fixes for v6.0-rc series, it fixes booting for
> > me. I've replied with fixes tags for the two patches that were causing
> > regressions for me.
> >
>
> Hi,
>
> for me Patch 1+3 fix display probe on Qualcomm SM6350 (although display
> for this SoC isn't upstream yet, there are lots of other SoCs with very
> similar setup).
>
> Probe for DPU silently fails, with CONFIG_DEBUG_DRIVER=y we get this:
>
> msm-mdss ae00000.mdss: __genpd_dev_pm_attach() failed to find PM domain: -2
>
> While I'm not familiar with the specifics of fw_devlink, the dtsi has
> power-domains = <&dispcc MDSS_GDSC> for this node but it doesn't pick
> that up for some reason.
>
> We can also see that a bit later dispcc finally probes.

Luca,

Can you test with this series instead and see if it fixes this issue?
https://lore.kernel.org/lkml/20220810060040.321697-1-saravanak@google.com/

You might also need to add this delta on top of the series if the
series itself isn't sufficient.
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 2f012e826986..866755d8ad95 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2068,7 +2068,11 @@ static int fw_devlink_create_devlink(struct device *con,
                device_links_write_unlock();
        }

-       sup_dev = get_dev_from_fwnode(sup_handle);
+       if (sup_handle->flags & FWNODE_FLAG_NOT_DEVICE)
+               sup_dev = fwnode_get_next_parent_dev(sup_handle);
+       else
+               sup_dev = get_dev_from_fwnode(sup_handle);
+
        if (sup_dev) {
                /*
                 * If it's one of those drivers that don't actually bind to

-Saravana
Luca Weiss Aug. 16, 2022, 1:30 p.m. UTC | #7
Hi Saravana,

On Tue Aug 16, 2022 at 1:36 AM CEST, Saravana Kannan wrote:
> On Mon, Aug 15, 2022 at 9:57 AM Luca Weiss <luca.weiss@fairphone.com> wrote:
> >
> > On Mon Aug 15, 2022 at 1:01 PM CEST, Tony Lindgren wrote:
> > > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > > More fixes/changes are needed before driver_deferred_probe_check_state()
> > > > can be deleted. So, bring it back for now.
> > > >
> > > > Greg,
> > > >
> > > > Can we get this into 5.19? If not, it might not be worth picking up this
> > > > series. I could just do the other/more fixes in time for 5.20.
> > >
> > > Yes please pick this as fixes for v6.0-rc series, it fixes booting for
> > > me. I've replied with fixes tags for the two patches that were causing
> > > regressions for me.
> > >
> >
> > Hi,
> >
> > for me Patch 1+3 fix display probe on Qualcomm SM6350 (although display
> > for this SoC isn't upstream yet, there are lots of other SoCs with very
> > similar setup).
> >
> > Probe for DPU silently fails, with CONFIG_DEBUG_DRIVER=y we get this:
> >
> > msm-mdss ae00000.mdss: __genpd_dev_pm_attach() failed to find PM domain: -2
> >
> > While I'm not familiar with the specifics of fw_devlink, the dtsi has
> > power-domains = <&dispcc MDSS_GDSC> for this node but it doesn't pick
> > that up for some reason.
> >
> > We can also see that a bit later dispcc finally probes.
>
> Luca,
>
> Can you test with this series instead and see if it fixes this issue?
> https://lore.kernel.org/lkml/20220810060040.321697-1-saravanak@google.com/
>

Unfortunately it doesn't seem to work with the 9 patches, and the
attached diff also doesn't seem to make a difference. I do see this in
dmesg which I haven't seen in the past:

[    0.056554] platform 1d87000.phy: Fixed dependency cycle(s) with /soc@0/ufs@1d84000
[    0.060070] platform ae00000.mdss: Fixed dependency cycle(s) with /soc@0/clock-controller@af00000
[    0.060150] platform ae00000.mdss: Failed to create device link with ae00000.mdss
[    0.060188] platform ae00000.mdss: Failed to create device link with ae00000.mdss
[    0.061135] platform c440000.spmi: Failed to create device link with c440000.spmi
[    0.061157] platform c440000.spmi: Failed to create device link with c440000.spmi
[    0.061180] platform c440000.spmi: Failed to create device link with c440000.spmi
[    0.061198] platform c440000.spmi: Failed to create device link with c440000.spmi
[    0.061215] platform c440000.spmi: Failed to create device link with c440000.spmi
[    0.061231] platform c440000.spmi: Failed to create device link with c440000.spmi
[    0.061252] platform c440000.spmi: Failed to create device link with c440000.spmi

Also I'm going to be on holiday from today for about 2 weeks so I won't
be able to test anything in that time.

And in case it's interesting, here's the full dmesg to initramfs:
https://pastebin.com/raw/Fc8W4MVi

Regards
Luca

> You might also need to add this delta on top of the series if the
> series itself isn't sufficient.
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 2f012e826986..866755d8ad95 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -2068,7 +2068,11 @@ static int fw_devlink_create_devlink(struct device *con,
>                 device_links_write_unlock();
>         }
>
> -       sup_dev = get_dev_from_fwnode(sup_handle);
> +       if (sup_handle->flags & FWNODE_FLAG_NOT_DEVICE)
> +               sup_dev = fwnode_get_next_parent_dev(sup_handle);
> +       else
> +               sup_dev = get_dev_from_fwnode(sup_handle);
> +
>         if (sup_dev) {
>                 /*
>                  * If it's one of those drivers that don't actually bind to
>
> -Saravana
Doug Anderson Aug. 18, 2022, 11:30 p.m. UTC | #8
Hi,

On Tue, Aug 16, 2022 at 6:31 AM Luca Weiss <luca.weiss@fairphone.com> wrote:
>
> Hi Saravana,
>
> On Tue Aug 16, 2022 at 1:36 AM CEST, Saravana Kannan wrote:
> > On Mon, Aug 15, 2022 at 9:57 AM Luca Weiss <luca.weiss@fairphone.com> wrote:
> > >
> > > On Mon Aug 15, 2022 at 1:01 PM CEST, Tony Lindgren wrote:
> > > > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > > > More fixes/changes are needed before driver_deferred_probe_check_state()
> > > > > can be deleted. So, bring it back for now.
> > > > >
> > > > > Greg,
> > > > >
> > > > > Can we get this into 5.19? If not, it might not be worth picking up this
> > > > > series. I could just do the other/more fixes in time for 5.20.
> > > >
> > > > Yes please pick this as fixes for v6.0-rc series, it fixes booting for
> > > > me. I've replied with fixes tags for the two patches that were causing
> > > > regressions for me.
> > > >
> > >
> > > Hi,
> > >
> > > for me Patch 1+3 fix display probe on Qualcomm SM6350 (although display
> > > for this SoC isn't upstream yet, there are lots of other SoCs with very
> > > similar setup).
> > >
> > > Probe for DPU silently fails, with CONFIG_DEBUG_DRIVER=y we get this:
> > >
> > > msm-mdss ae00000.mdss: __genpd_dev_pm_attach() failed to find PM domain: -2
> > >
> > > While I'm not familiar with the specifics of fw_devlink, the dtsi has
> > > power-domains = <&dispcc MDSS_GDSC> for this node but it doesn't pick
> > > that up for some reason.
> > >
> > > We can also see that a bit later dispcc finally probes.
> >
> > Luca,
> >
> > Can you test with this series instead and see if it fixes this issue?
> > https://lore.kernel.org/lkml/20220810060040.321697-1-saravanak@google.com/
> >
>
> Unfortunately it doesn't seem to work with the 9 patches

I also tried with the 9 patches, plus an extra fix that Saravana
provided. They didn't fix my use case either. Do we want to land the
reverts as a stopgap so that people aren't broken?

For reference, the device tree for the device I'm testing on is
`arch/arm64/boot/dts/qcom/sc7180-trogdor-lazor-r3-lte.dts`. The device
that's not probing is the bridge chip, AKA 2-002d. Presumably
something about the bridge chip cycles is confusing things since the
remote endpoint that sn65dsi86 needs is actually a child of sn65dsi86
(the panel).

-Doug