Message ID | 20220727185012.3255200-1-saravanak@google.com (mailing list archive) |
---|---|
Headers | show |
Series | Bring back driver_deferred_probe_check_state() for now | expand |
On Wed, Jul 27, 2022 at 11:50:08AM -0700, Saravana Kannan wrote: > More fixes/changes are needed before driver_deferred_probe_check_state() > can be deleted. So, bring it back for now. > > Greg, > > Can we get this into 5.19? If not, it might not be worth picking up this > series. I could just do the other/more fixes in time for 5.20. Wow, no, it is _WAY_ too late for 5.19 to make a change like this, sorry. What is so broken that we need to revert these now? I could do so for 5.20-rc1, and then backport to 5.19.y if that release is really broken, but this feels odd so late in the cycle. thanks, greg k-h
Hi, On Thu, Jul 28, 2022 at 12:29 AM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Wed, Jul 27, 2022 at 11:50:08AM -0700, Saravana Kannan wrote: > > More fixes/changes are needed before driver_deferred_probe_check_state() > > can be deleted. So, bring it back for now. > > > > Greg, > > > > Can we get this into 5.19? If not, it might not be worth picking up this > > series. I could just do the other/more fixes in time for 5.20. > > Wow, no, it is _WAY_ too late for 5.19 to make a change like this, > sorry. > > What is so broken that we need to revert these now? I could do so for > 5.20-rc1, and then backport to 5.19.y if that release is really broken, > but this feels odd so late in the cycle. I spent a bunch of time bisecting mainline today on my sc7180-trogdor-lazor board. When building the top of Linus's tree today the display doesn't come up. I can make it come up by turning fw_devlink off (after fixing a regulator bug that I just posted a fix for). I found that the first bad commit was commit 5a46079a9645 ("PM: domains: Delete usage of driver_deferred_probe_check_state()") ...but only when applied to mainline. When I cherry-pick that back to v5.19-rc1 (and pick another bugfix needed to boot my board against v5.19-rc1) then it works OK. After yet more bisecting, I found that on trogdor there's a bad interaction with the commit e511a760 ("arm64: dts: qcom: sm7180: remove assigned-clock-rate property for mdp clk"). That commit is perfectly legit but I guess it somehow changed how fw_devlink was interpreting things? Sure enough, picking this revert series fixes things on Linus's tree. Any chance we can still get the revert in for v5.20-rc1? ;-) -Doug
On Tue, Aug 9, 2022 at 4:47 PM Doug Anderson <dianders@chromium.org> wrote: > > Hi, > > On Thu, Jul 28, 2022 at 12:29 AM Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > > > On Wed, Jul 27, 2022 at 11:50:08AM -0700, Saravana Kannan wrote: > > > More fixes/changes are needed before driver_deferred_probe_check_state() > > > can be deleted. So, bring it back for now. > > > > > > Greg, > > > > > > Can we get this into 5.19? If not, it might not be worth picking up this > > > series. I could just do the other/more fixes in time for 5.20. > > > > Wow, no, it is _WAY_ too late for 5.19 to make a change like this, > > sorry. > > > > What is so broken that we need to revert these now? I could do so for > > 5.20-rc1, and then backport to 5.19.y if that release is really broken, > > but this feels odd so late in the cycle. Greg, I didn't realize the patches I'm trying to revert never landed on 5.19. So you can ignore this thread. > > I spent a bunch of time bisecting mainline today on my > sc7180-trogdor-lazor board. When building the top of Linus's tree > today the display doesn't come up. I can make it come up by turning > fw_devlink off (after fixing a regulator bug that I just posted a fix > for). > > I found that the first bad commit was commit 5a46079a9645 ("PM: > domains: Delete usage of driver_deferred_probe_check_state()") > > ...but only when applied to mainline. When I cherry-pick that back to > v5.19-rc1 (and pick another bugfix needed to boot my board against > v5.19-rc1) then it works OK. After yet more bisecting, I found that on > trogdor there's a bad interaction with the commit e511a760 ("arm64: > dts: qcom: sm7180: remove assigned-clock-rate property for mdp clk"). > That commit is perfectly legit but I guess it somehow changed how > fw_devlink was interpreting things? > > Sure enough, picking this revert series fixes things on Linus's tree. > Any chance we can still get the revert in for v5.20-rc1? ;-) I guess it's 6.0 now. But I'm almost done with my actual fixes that rewrite some parts of fw_devlink to make it a lot more robust. So, I'm not sure if all these reverts need to land anymore. I'm hoping to send out the proper fixes by the end of this week. Maybe you can try that out and let me know if it solves your issues (I expect it to). I'm surprised that specific clock patch has an impact though. It's just touching properties that fw_devlink doesn't parse. -Saravana
* Saravana Kannan <saravanak@google.com> [700101 02:00]: > More fixes/changes are needed before driver_deferred_probe_check_state() > can be deleted. So, bring it back for now. > > Greg, > > Can we get this into 5.19? If not, it might not be worth picking up this > series. I could just do the other/more fixes in time for 5.20. Yes please pick this as fixes for v6.0-rc series, it fixes booting for me. I've replied with fixes tags for the two patches that were causing regressions for me. Regards, Tony
On Mon Aug 15, 2022 at 1:01 PM CEST, Tony Lindgren wrote: > * Saravana Kannan <saravanak@google.com> [700101 02:00]: > > More fixes/changes are needed before driver_deferred_probe_check_state() > > can be deleted. So, bring it back for now. > > > > Greg, > > > > Can we get this into 5.19? If not, it might not be worth picking up this > > series. I could just do the other/more fixes in time for 5.20. > > Yes please pick this as fixes for v6.0-rc series, it fixes booting for > me. I've replied with fixes tags for the two patches that were causing > regressions for me. > Hi, for me Patch 1+3 fix display probe on Qualcomm SM6350 (although display for this SoC isn't upstream yet, there are lots of other SoCs with very similar setup). Probe for DPU silently fails, with CONFIG_DEBUG_DRIVER=y we get this: msm-mdss ae00000.mdss: __genpd_dev_pm_attach() failed to find PM domain: -2 While I'm not familiar with the specifics of fw_devlink, the dtsi has power-domains = <&dispcc MDSS_GDSC> for this node but it doesn't pick that up for some reason. We can also see that a bit later dispcc finally probes. Regards Luca > Regards, > > Tony
On Mon, Aug 15, 2022 at 9:57 AM Luca Weiss <luca.weiss@fairphone.com> wrote: > > On Mon Aug 15, 2022 at 1:01 PM CEST, Tony Lindgren wrote: > > * Saravana Kannan <saravanak@google.com> [700101 02:00]: > > > More fixes/changes are needed before driver_deferred_probe_check_state() > > > can be deleted. So, bring it back for now. > > > > > > Greg, > > > > > > Can we get this into 5.19? If not, it might not be worth picking up this > > > series. I could just do the other/more fixes in time for 5.20. > > > > Yes please pick this as fixes for v6.0-rc series, it fixes booting for > > me. I've replied with fixes tags for the two patches that were causing > > regressions for me. > > > > Hi, > > for me Patch 1+3 fix display probe on Qualcomm SM6350 (although display > for this SoC isn't upstream yet, there are lots of other SoCs with very > similar setup). > > Probe for DPU silently fails, with CONFIG_DEBUG_DRIVER=y we get this: > > msm-mdss ae00000.mdss: __genpd_dev_pm_attach() failed to find PM domain: -2 > > While I'm not familiar with the specifics of fw_devlink, the dtsi has > power-domains = <&dispcc MDSS_GDSC> for this node but it doesn't pick > that up for some reason. > > We can also see that a bit later dispcc finally probes. Luca, Can you test with this series instead and see if it fixes this issue? https://lore.kernel.org/lkml/20220810060040.321697-1-saravanak@google.com/ You might also need to add this delta on top of the series if the series itself isn't sufficient. diff --git a/drivers/base/core.c b/drivers/base/core.c index 2f012e826986..866755d8ad95 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -2068,7 +2068,11 @@ static int fw_devlink_create_devlink(struct device *con, device_links_write_unlock(); } - sup_dev = get_dev_from_fwnode(sup_handle); + if (sup_handle->flags & FWNODE_FLAG_NOT_DEVICE) + sup_dev = fwnode_get_next_parent_dev(sup_handle); + else + sup_dev = get_dev_from_fwnode(sup_handle); + if (sup_dev) { /* * If it's one of those drivers that don't actually bind to -Saravana
Hi Saravana, On Tue Aug 16, 2022 at 1:36 AM CEST, Saravana Kannan wrote: > On Mon, Aug 15, 2022 at 9:57 AM Luca Weiss <luca.weiss@fairphone.com> wrote: > > > > On Mon Aug 15, 2022 at 1:01 PM CEST, Tony Lindgren wrote: > > > * Saravana Kannan <saravanak@google.com> [700101 02:00]: > > > > More fixes/changes are needed before driver_deferred_probe_check_state() > > > > can be deleted. So, bring it back for now. > > > > > > > > Greg, > > > > > > > > Can we get this into 5.19? If not, it might not be worth picking up this > > > > series. I could just do the other/more fixes in time for 5.20. > > > > > > Yes please pick this as fixes for v6.0-rc series, it fixes booting for > > > me. I've replied with fixes tags for the two patches that were causing > > > regressions for me. > > > > > > > Hi, > > > > for me Patch 1+3 fix display probe on Qualcomm SM6350 (although display > > for this SoC isn't upstream yet, there are lots of other SoCs with very > > similar setup). > > > > Probe for DPU silently fails, with CONFIG_DEBUG_DRIVER=y we get this: > > > > msm-mdss ae00000.mdss: __genpd_dev_pm_attach() failed to find PM domain: -2 > > > > While I'm not familiar with the specifics of fw_devlink, the dtsi has > > power-domains = <&dispcc MDSS_GDSC> for this node but it doesn't pick > > that up for some reason. > > > > We can also see that a bit later dispcc finally probes. > > Luca, > > Can you test with this series instead and see if it fixes this issue? > https://lore.kernel.org/lkml/20220810060040.321697-1-saravanak@google.com/ > Unfortunately it doesn't seem to work with the 9 patches, and the attached diff also doesn't seem to make a difference. I do see this in dmesg which I haven't seen in the past: [ 0.056554] platform 1d87000.phy: Fixed dependency cycle(s) with /soc@0/ufs@1d84000 [ 0.060070] platform ae00000.mdss: Fixed dependency cycle(s) with /soc@0/clock-controller@af00000 [ 0.060150] platform ae00000.mdss: Failed to create device link with ae00000.mdss [ 0.060188] platform ae00000.mdss: Failed to create device link with ae00000.mdss [ 0.061135] platform c440000.spmi: Failed to create device link with c440000.spmi [ 0.061157] platform c440000.spmi: Failed to create device link with c440000.spmi [ 0.061180] platform c440000.spmi: Failed to create device link with c440000.spmi [ 0.061198] platform c440000.spmi: Failed to create device link with c440000.spmi [ 0.061215] platform c440000.spmi: Failed to create device link with c440000.spmi [ 0.061231] platform c440000.spmi: Failed to create device link with c440000.spmi [ 0.061252] platform c440000.spmi: Failed to create device link with c440000.spmi Also I'm going to be on holiday from today for about 2 weeks so I won't be able to test anything in that time. And in case it's interesting, here's the full dmesg to initramfs: https://pastebin.com/raw/Fc8W4MVi Regards Luca > You might also need to add this delta on top of the series if the > series itself isn't sufficient. > diff --git a/drivers/base/core.c b/drivers/base/core.c > index 2f012e826986..866755d8ad95 100644 > --- a/drivers/base/core.c > +++ b/drivers/base/core.c > @@ -2068,7 +2068,11 @@ static int fw_devlink_create_devlink(struct device *con, > device_links_write_unlock(); > } > > - sup_dev = get_dev_from_fwnode(sup_handle); > + if (sup_handle->flags & FWNODE_FLAG_NOT_DEVICE) > + sup_dev = fwnode_get_next_parent_dev(sup_handle); > + else > + sup_dev = get_dev_from_fwnode(sup_handle); > + > if (sup_dev) { > /* > * If it's one of those drivers that don't actually bind to > > -Saravana
Hi, On Tue, Aug 16, 2022 at 6:31 AM Luca Weiss <luca.weiss@fairphone.com> wrote: > > Hi Saravana, > > On Tue Aug 16, 2022 at 1:36 AM CEST, Saravana Kannan wrote: > > On Mon, Aug 15, 2022 at 9:57 AM Luca Weiss <luca.weiss@fairphone.com> wrote: > > > > > > On Mon Aug 15, 2022 at 1:01 PM CEST, Tony Lindgren wrote: > > > > * Saravana Kannan <saravanak@google.com> [700101 02:00]: > > > > > More fixes/changes are needed before driver_deferred_probe_check_state() > > > > > can be deleted. So, bring it back for now. > > > > > > > > > > Greg, > > > > > > > > > > Can we get this into 5.19? If not, it might not be worth picking up this > > > > > series. I could just do the other/more fixes in time for 5.20. > > > > > > > > Yes please pick this as fixes for v6.0-rc series, it fixes booting for > > > > me. I've replied with fixes tags for the two patches that were causing > > > > regressions for me. > > > > > > > > > > Hi, > > > > > > for me Patch 1+3 fix display probe on Qualcomm SM6350 (although display > > > for this SoC isn't upstream yet, there are lots of other SoCs with very > > > similar setup). > > > > > > Probe for DPU silently fails, with CONFIG_DEBUG_DRIVER=y we get this: > > > > > > msm-mdss ae00000.mdss: __genpd_dev_pm_attach() failed to find PM domain: -2 > > > > > > While I'm not familiar with the specifics of fw_devlink, the dtsi has > > > power-domains = <&dispcc MDSS_GDSC> for this node but it doesn't pick > > > that up for some reason. > > > > > > We can also see that a bit later dispcc finally probes. > > > > Luca, > > > > Can you test with this series instead and see if it fixes this issue? > > https://lore.kernel.org/lkml/20220810060040.321697-1-saravanak@google.com/ > > > > Unfortunately it doesn't seem to work with the 9 patches I also tried with the 9 patches, plus an extra fix that Saravana provided. They didn't fix my use case either. Do we want to land the reverts as a stopgap so that people aren't broken? For reference, the device tree for the device I'm testing on is `arch/arm64/boot/dts/qcom/sc7180-trogdor-lazor-r3-lte.dts`. The device that's not probing is the bridge chip, AKA 2-002d. Presumably something about the bridge chip cycles is confusing things since the remote endpoint that sn65dsi86 needs is actually a child of sn65dsi86 (the panel). -Doug