diff mbox series

mmc: tmio: remove workaround for NON_REMOVABLE

Message ID 20190917183652.7310-1-wsa+renesas@sang-engineering.com (mailing list archive)
State New, archived
Headers show
Series mmc: tmio: remove workaround for NON_REMOVABLE | expand

Commit Message

Wolfram Sang Sept. 17, 2019, 6:36 p.m. UTC
PM has been reworked, so eMMC gets now detected on R-Car H3 ES1.0 and
2.0 as well as M3-N without the workaround. Card detect and write
protect also still work. Remove the workaround.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 drivers/mmc/host/tmio_mmc_core.c | 9 ---------
 1 file changed, 9 deletions(-)

Comments

Ulf Hansson Oct. 3, 2019, 10:01 a.m. UTC | #1
On Tue, 17 Sep 2019 at 20:36, Wolfram Sang
<wsa+renesas@sang-engineering.com> wrote:
>
> PM has been reworked, so eMMC gets now detected on R-Car H3 ES1.0 and
> 2.0 as well as M3-N without the workaround. Card detect and write
> protect also still work. Remove the workaround.
>
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

Applied for next, thanks!

Kind regards
Uffe


> ---
>  drivers/mmc/host/tmio_mmc_core.c | 9 ---------
>  1 file changed, 9 deletions(-)
>
> diff --git a/drivers/mmc/host/tmio_mmc_core.c b/drivers/mmc/host/tmio_mmc_core.c
> index 9b6e1001e77c..63dc37481fba 100644
> --- a/drivers/mmc/host/tmio_mmc_core.c
> +++ b/drivers/mmc/host/tmio_mmc_core.c
> @@ -1208,15 +1208,6 @@ int tmio_mmc_host_probe(struct tmio_mmc_host *_host)
>         if (!_host->reset)
>                 _host->reset = tmio_mmc_reset;
>
> -       /*
> -        * On Gen2+, eMMC with NONREMOVABLE currently fails because native
> -        * hotplug gets disabled. It seems RuntimePM related yet we need further
> -        * research. Since we are planning a PM overhaul anyway, let's enforce
> -        * for now the device being active by enabling native hotplug always.
> -        */
> -       if (pdata->flags & TMIO_MMC_MIN_RCAR2)
> -               _host->native_hotplug = true;
> -
>         /*
>          * While using internal tmio hardware logic for card detection, we need
>          * to ensure it stays powered for it to work.
> --
> 2.20.1
>
Geert Uytterhoeven Nov. 19, 2019, 3:51 p.m. UTC | #2
Hi Wolfram,

On Tue, Sep 17, 2019 at 9:46 PM Wolfram Sang
<wsa+renesas@sang-engineering.com> wrote:
> PM has been reworked, so eMMC gets now detected on R-Car H3 ES1.0 and
> 2.0 as well as M3-N without the workaround. Card detect and write
> protect also still work. Remove the workaround.
>
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

> --- a/drivers/mmc/host/tmio_mmc_core.c
> +++ b/drivers/mmc/host/tmio_mmc_core.c
> @@ -1208,15 +1208,6 @@ int tmio_mmc_host_probe(struct tmio_mmc_host *_host)
>         if (!_host->reset)
>                 _host->reset = tmio_mmc_reset;
>
> -       /*
> -        * On Gen2+, eMMC with NONREMOVABLE currently fails because native
> -        * hotplug gets disabled. It seems RuntimePM related yet we need further
> -        * research. Since we are planning a PM overhaul anyway, let's enforce
> -        * for now the device being active by enabling native hotplug always.
> -        */
> -       if (pdata->flags & TMIO_MMC_MIN_RCAR2)
> -               _host->native_hotplug = true;
> -
>         /*
>          * While using internal tmio hardware logic for card detection, we need
>          * to ensure it stays powered for it to work.

On Salvator-XS with R-Car M3-N, this causes lock ups during early userspace
boot up (Debian 9 nfsroot), usually after:

    [  OK  ] Started D-Bus System Message Bus.

At first I suspected systemd.  Then I noticed the kernel had locked up
completely, and no longer responded to ping requests.

Interestingly, this patch has been part of renesas-drivers since the
2019-10-15-v5.4-rc3 release, without real issues.
The first time I saw the issue was last week. Then it happened only after I
had disabled a debug print, which probably caused a timing difference in
Runtime PM disablement (and was fully reproducible).

Today, it was fairly reproducible, so I managed to bisect it to commit
7a7dab237027939c ("mmc: tmio: remove workaround for NON_REMOVABLE") in
mmc/next.  Reverting this commit fixes the issue.

The issue can also be fixed by:
  1. enabling the hs400_4taps and/or hs400_disabled quirks in
     sdhi_quirks_match[], OR
  2. forcing use_4tap = true in renesas_sdhi_check_scc_error().

Salvator-X(S) with R-Car H3 ES1.0 & ES2.0, or M3-W ES1.0, the issue
does not show up (probably because of sdhi_quirks_match[]).

Do you have a clue?
Thanks!

Gr{oetje,eeting}s,

                        Geert
Wolfram Sang Nov. 19, 2019, 8:47 p.m. UTC | #3
Hi Geert,

thanks for the report!

> Interestingly, this patch has been part of renesas-drivers since the
> 2019-10-15-v5.4-rc3 release, without real issues.

Huh, interesting. With which branch does this appear then? linux-next?

> Today, it was fairly reproducible, so I managed to bisect it to commit
> 7a7dab237027939c ("mmc: tmio: remove workaround for NON_REMOVABLE") in
> mmc/next.  Reverting this commit fixes the issue.

Hmm, probably we should do the revert despite our discusstion here. And
then resend the original patch after we figured the cause of this hang.

> The issue can also be fixed by:
>   1. enabling the hs400_4taps and/or hs400_disabled quirks in
>      sdhi_quirks_match[], OR
>   2. forcing use_4tap = true in renesas_sdhi_check_scc_error().
> 
> Salvator-X(S) with R-Car H3 ES1.0 & ES2.0, or M3-W ES1.0, the issue
> does not show up (probably because of sdhi_quirks_match[]).
> 
> Do you have a clue?

Not very clear. M3-N is not a 4tap-device, so this can't be a fix.
However, both disabling HS400 as well as using 4tap will prevent the SCC
error checking in renesas_sdhi_check_scc_error(). I'd assume the SCC
hangs.

I am working on an issue where the SCC hangs, but this has to do with
always providing the SCC clock (SDnH). I don't really see the connection
of that to RuntimePM yet, though :/

Can you test this simple workaround patch instead of the revert just so
we get an idea if these issues are related?

Thanks,

   Wolfram

From: Wolfram Sang <wsa+renesas@sang-engineering.com>
Date: Thu, 27 Jun 2019 11:05:06 +0200
Subject: [PATCH] WIP: clk: renesas: rcar-gen3: enable SDnH clk for HS modes

When switching to HS400, we shortly need to switch back to plain HS
mode, but we still need the SDnH clock, so the SCC of SDHI can work.
So, make sure SDnH is still active, then.

FIXME: needs verification from the BSP/HW team!

Not-yet-Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 drivers/clk/renesas/rcar-gen3-cpg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/renesas/rcar-gen3-cpg.c b/drivers/clk/renesas/rcar-gen3-cpg.c
index d25c8ba00a65..043ab6ed9d55 100644
--- a/drivers/clk/renesas/rcar-gen3-cpg.c
+++ b/drivers/clk/renesas/rcar-gen3-cpg.c
@@ -263,7 +263,7 @@ static const struct sd_div_table cpg_sd_div_table[] = {
 /*	CPG_SD_DIV_TABLE_DATA(stp_hck,  stp_ck,   sd_srcfc,   sd_fc,  sd_div) */
 	CPG_SD_DIV_TABLE_DATA(0,        0,        0,          1,        4),
 	CPG_SD_DIV_TABLE_DATA(0,        0,        1,          1,        8),
-	CPG_SD_DIV_TABLE_DATA(1,        0,        2,          1,       16),
+	CPG_SD_DIV_TABLE_DATA(0,        0,        2,          1,       16),
 	CPG_SD_DIV_TABLE_DATA(1,        0,        3,          1,       32),
 	CPG_SD_DIV_TABLE_DATA(1,        0,        4,          1,       64),
 	CPG_SD_DIV_TABLE_DATA(0,        0,        0,          0,        2),
Geert Uytterhoeven Nov. 20, 2019, 7:46 a.m. UTC | #4
Hi Wolfram,

On Tue, Nov 19, 2019 at 9:47 PM Wolfram Sang <wsa@the-dreams.de> wrote:
> > Interestingly, this patch has been part of renesas-drivers since the
> > 2019-10-15-v5.4-rc3 release, without real issues.
>
> Huh, interesting. With which branch does this appear then? linux-next?

renesas-drivers

I can reproduce this with renesas-drivers-2019-11-19-v5.4-rc8 and
renesas_defconfig by booting a ramdisk, and reading from /dev/mmcblk1.

Also with renesas-drivers-2019-10-15-v5.4-rc3.
So some of my local code on top must have impacted the behavior.

> > Today, it was fairly reproducible, so I managed to bisect it to commit
> > 7a7dab237027939c ("mmc: tmio: remove workaround for NON_REMOVABLE") in
> > mmc/next.  Reverting this commit fixes the issue.
>
> Hmm, probably we should do the revert despite our discusstion here. And
> then resend the original patch after we figured the cause of this hang.
>
> > The issue can also be fixed by:
> >   1. enabling the hs400_4taps and/or hs400_disabled quirks in
> >      sdhi_quirks_match[], OR
> >   2. forcing use_4tap = true in renesas_sdhi_check_scc_error().
> >
> > Salvator-X(S) with R-Car H3 ES1.0 & ES2.0, or M3-W ES1.0, the issue
> > does not show up (probably because of sdhi_quirks_match[]).
> >
> > Do you have a clue?
>
> Not very clear. M3-N is not a 4tap-device, so this can't be a fix.
> However, both disabling HS400 as well as using 4tap will prevent the SCC
> error checking in renesas_sdhi_check_scc_error(). I'd assume the SCC
> hangs.

Makes sense.

> I am working on an issue where the SCC hangs, but this has to do with
> always providing the SCC clock (SDnH). I don't really see the connection
> of that to RuntimePM yet, though :/

Makes sense: this is consistent with the behavior when accessing
registers without enabling the corresponding module clock: it hangs.
So this can happen with other clocks, too.
One more reason not to delegate clock handling to a guest, as doing it
wrong can take down the host, too...

> Can you test this simple workaround patch instead of the revert just so
> we get an idea if these issues are related?

Thanks, applying your workaround on top of
renesas-drivers-2019-11-19-v5.4-rc8 fixes the issue.

This fix is part of renesas/topic/sdhi-manual-calib, right?
And thus has been present in some renesas-drivers release, but was
dropped _before_ the 2019-10-15-v5.4-rc3 release.


Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Wolfram Sang Nov. 21, 2019, 8:57 a.m. UTC | #5
Hi Geert,

> So some of my local code on top must have impacted the behavior.

Any change in temperature? Niklas and I wonder if it is thermal related.

> > I am working on an issue where the SCC hangs, but this has to do with
> > always providing the SCC clock (SDnH). I don't really see the connection
> > of that to RuntimePM yet, though :/
> 
> Makes sense: this is consistent with the behavior when accessing
> registers without enabling the corresponding module clock: it hangs.
> So this can happen with other clocks, too.
> One more reason not to delegate clock handling to a guest, as doing it
> wrong can take down the host, too...

You mean when it comes to virtualization?

> > Can you test this simple workaround patch instead of the revert just so
> > we get an idea if these issues are related?
> 
> Thanks, applying your workaround on top of
> renesas-drivers-2019-11-19-v5.4-rc8 fixes the issue.

Ok, good to know thanks for testing. Currently, I wonder why reverting
the NON_REMOVABLE workaround makes a difference. Maybe it is not
temperature related but a some race with RPM? I am debugging in this
direction now. But the lockup is still hard to trigger for me. Tried
v5.4-rc8 + NON_REMOVABLE patch with no luck. Will try renesas-drivers
next.

> This fix is part of renesas/topic/sdhi-manual-calib, right?

Yes.

> And thus has been present in some renesas-drivers release, but was
> dropped _before_ the 2019-10-15-v5.4-rc3 release.

That would explain why it didn't show up before, right? And don't you
have a Ebisu in your board farm, too? Luckily, I have one, too, now. It
should be affected.

Thanks for the pointers,

   Wolfram
Geert Uytterhoeven Nov. 21, 2019, 9:35 a.m. UTC | #6
Hi Wolfram,

On Thu, Nov 21, 2019 at 9:57 AM Wolfram Sang <wsa@the-dreams.de> wrote:
> > So some of my local code on top must have impacted the behavior.
>
> Any change in temperature? Niklas and I wonder if it is thermal related.

Nope. I tried an old "known" good kernel again yesterday, and it worked.
That was BTW the one which had the additional debug prints:

--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -199,6 +199,7 @@ static struct generic_pm_domain
*dev_to_genpd(struct device *dev)
 static int genpd_stop_dev(const struct generic_pm_domain *genpd,
                          struct device *dev)
 {
+pr_info("==== %s/%s: stop\n", genpd->name, dev_name(dev));
        WARN(device_may_wakeup(dev),
             "Domain %s must be active_wakeup for wakeup source %s\n",
             genpd->name, dev_name(dev));
@@ -208,6 +209,7 @@ static int genpd_stop_dev(const struct
generic_pm_domain *genpd,
 static int genpd_start_dev(const struct generic_pm_domain *genpd,
                           struct device *dev)
 {
+pr_info("==== %s/%s: start\n", genpd->name, dev_name(dev));
        return GENPD_DEV_CALLBACK(genpd, int, start, dev);
 }

Removing those prints made the old kernel fail, too, so this is why I think
it is a race with Runtime PM.

With a tree based on latest renesas-drivers, it happens regardless of those
debug prints.

> > > I am working on an issue where the SCC hangs, but this has to do with
> > > always providing the SCC clock (SDnH). I don't really see the connection
> > > of that to RuntimePM yet, though :/
> >
> > Makes sense: this is consistent with the behavior when accessing
> > registers without enabling the corresponding module clock: it hangs.
> > So this can happen with other clocks, too.
> > One more reason not to delegate clock handling to a guest, as doing it
> > wrong can take down the host, too...
>
> You mean when it comes to virtualization?

Exactly.

> > > Can you test this simple workaround patch instead of the revert just so
> > > we get an idea if these issues are related?
> >
> > Thanks, applying your workaround on top of
> > renesas-drivers-2019-11-19-v5.4-rc8 fixes the issue.
>
> Ok, good to know thanks for testing. Currently, I wonder why reverting
> the NON_REMOVABLE workaround makes a difference. Maybe it is not
> temperature related but a some race with RPM? I am debugging in this
> direction now. But the lockup is still hard to trigger for me. Tried
> v5.4-rc8 + NON_REMOVABLE patch with no luck. Will try renesas-drivers
> next.

As I managed to bisect it, it was fairly reproducible for me. Just checkout
commit 7a7dab237027939c ("mmc: tmio: remove workaround for NON_REMOVABLE"),
or use renesas-drivers.

Oh, if it's a race, it may be affected by the compiler, too.
gcc version 7.4.0 (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1)

> > This fix is part of renesas/topic/sdhi-manual-calib, right?
>
> Yes.
>
> > And thus has been present in some renesas-drivers release, but was
> > dropped _before_ the 2019-10-15-v5.4-rc3 release.
>
> That would explain why it didn't show up before, right? And don't you

Not exactly. That branch was dropped before Ulf reverted the
NON_REMOVABLE workaround.

> have a Ebisu in your board farm, too? Luckily, I have one, too, now. It
> should be affected.

Haven't seen the issue on Ebisu (yet?).
To be sure, I have just retried again with the exact same kernel image
and userland: m3n-salvator-xs hangs, ebisu boots fine (and I can read
/dev/mmcblk2).

But as it looks to be timing-related, and E3 has different/less CPU cores,
it may still be affected.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Ulf Hansson Nov. 21, 2019, 10:29 a.m. UTC | #7
Geert, Wolfram

On Thu, 21 Nov 2019 at 10:35, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Wolfram,
>
> On Thu, Nov 21, 2019 at 9:57 AM Wolfram Sang <wsa@the-dreams.de> wrote:
> > > So some of my local code on top must have impacted the behavior.
> >
> > Any change in temperature? Niklas and I wonder if it is thermal related.
>
> Nope. I tried an old "known" good kernel again yesterday, and it worked.
> That was BTW the one which had the additional debug prints:
>
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -199,6 +199,7 @@ static struct generic_pm_domain
> *dev_to_genpd(struct device *dev)
>  static int genpd_stop_dev(const struct generic_pm_domain *genpd,
>                           struct device *dev)
>  {
> +pr_info("==== %s/%s: stop\n", genpd->name, dev_name(dev));
>         WARN(device_may_wakeup(dev),
>              "Domain %s must be active_wakeup for wakeup source %s\n",
>              genpd->name, dev_name(dev));
> @@ -208,6 +209,7 @@ static int genpd_stop_dev(const struct
> generic_pm_domain *genpd,
>  static int genpd_start_dev(const struct generic_pm_domain *genpd,
>                            struct device *dev)
>  {
> +pr_info("==== %s/%s: start\n", genpd->name, dev_name(dev));
>         return GENPD_DEV_CALLBACK(genpd, int, start, dev);
>  }
>
> Removing those prints made the old kernel fail, too, so this is why I think
> it is a race with Runtime PM.
>
> With a tree based on latest renesas-drivers, it happens regardless of those
> debug prints.
>
> > > > I am working on an issue where the SCC hangs, but this has to do with
> > > > always providing the SCC clock (SDnH). I don't really see the connection
> > > > of that to RuntimePM yet, though :/
> > >
> > > Makes sense: this is consistent with the behavior when accessing
> > > registers without enabling the corresponding module clock: it hangs.
> > > So this can happen with other clocks, too.
> > > One more reason not to delegate clock handling to a guest, as doing it
> > > wrong can take down the host, too...
> >
> > You mean when it comes to virtualization?
>
> Exactly.
>
> > > > Can you test this simple workaround patch instead of the revert just so
> > > > we get an idea if these issues are related?
> > >
> > > Thanks, applying your workaround on top of
> > > renesas-drivers-2019-11-19-v5.4-rc8 fixes the issue.
> >
> > Ok, good to know thanks for testing. Currently, I wonder why reverting
> > the NON_REMOVABLE workaround makes a difference. Maybe it is not
> > temperature related but a some race with RPM? I am debugging in this
> > direction now. But the lockup is still hard to trigger for me. Tried
> > v5.4-rc8 + NON_REMOVABLE patch with no luck. Will try renesas-drivers
> > next.
>
> As I managed to bisect it, it was fairly reproducible for me. Just checkout
> commit 7a7dab237027939c ("mmc: tmio: remove workaround for NON_REMOVABLE"),
> or use renesas-drivers.
>
> Oh, if it's a race, it may be affected by the compiler, too.
> gcc version 7.4.0 (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1)
>
> > > This fix is part of renesas/topic/sdhi-manual-calib, right?
> >
> > Yes.
> >
> > > And thus has been present in some renesas-drivers release, but was
> > > dropped _before_ the 2019-10-15-v5.4-rc3 release.
> >
> > That would explain why it didn't show up before, right? And don't you
>
> Not exactly. That branch was dropped before Ulf reverted the
> NON_REMOVABLE workaround.
>
> > have a Ebisu in your board farm, too? Luckily, I have one, too, now. It
> > should be affected.
>
> Haven't seen the issue on Ebisu (yet?).
> To be sure, I have just retried again with the exact same kernel image
> and userland: m3n-salvator-xs hangs, ebisu boots fine (and I can read
> /dev/mmcblk2).
>
> But as it looks to be timing-related, and E3 has different/less CPU cores,
> it may still be affected.

A few comments around your runtime PM concerns, not sure if this
matters to you issues.

So, only by looking at the code for probing of the tmio variant
drivers, it seems like there are accesses (both reads and writes) for
the device being done, without first making sure that the clock
managed by the PM domain has been enabled. Is that really okay? Note,
this isn't a new thing, but it has been there as long as can remember.

For example, in renesas_sdhi_probe() there are calls made to
sd_ctrl_write16|read16() before calling tmio_mmc_host_probe().

Additionally in tmio_mmc_host_probe(), there are calls made to
sd_ctrl_write16|read16(), before calling pm_runtime_get_sync().

If you haven't noticed, we have also managed to replace the call with
pm_runtime_get_sync() with a call to dev_pm_domain_start(), to
simplify the code. The point is, these changes are queued via Rafael's
pm-tree for v5.5.

So, perhaps at this point we should simply drop the offending commit
and revisit this once v5.5-rc1 is out?

Kind regards
Uffe
Geert Uytterhoeven Nov. 21, 2019, 10:52 a.m. UTC | #8
Hi Ulf,

On Thu, Nov 21, 2019 at 11:30 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On Thu, 21 Nov 2019 at 10:35, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Thu, Nov 21, 2019 at 9:57 AM Wolfram Sang <wsa@the-dreams.de> wrote:
> > > > So some of my local code on top must have impacted the behavior.
> > >
> > > Any change in temperature? Niklas and I wonder if it is thermal related.
> >
> > Nope. I tried an old "known" good kernel again yesterday, and it worked.
> > That was BTW the one which had the additional debug prints:
> >
> > --- a/drivers/base/power/domain.c
> > +++ b/drivers/base/power/domain.c
> > @@ -199,6 +199,7 @@ static struct generic_pm_domain
> > *dev_to_genpd(struct device *dev)
> >  static int genpd_stop_dev(const struct generic_pm_domain *genpd,
> >                           struct device *dev)
> >  {
> > +pr_info("==== %s/%s: stop\n", genpd->name, dev_name(dev));
> >         WARN(device_may_wakeup(dev),
> >              "Domain %s must be active_wakeup for wakeup source %s\n",
> >              genpd->name, dev_name(dev));
> > @@ -208,6 +209,7 @@ static int genpd_stop_dev(const struct
> > generic_pm_domain *genpd,
> >  static int genpd_start_dev(const struct generic_pm_domain *genpd,
> >                            struct device *dev)
> >  {
> > +pr_info("==== %s/%s: start\n", genpd->name, dev_name(dev));
> >         return GENPD_DEV_CALLBACK(genpd, int, start, dev);
> >  }
> >
> > Removing those prints made the old kernel fail, too, so this is why I think
> > it is a race with Runtime PM.
> >
> > With a tree based on latest renesas-drivers, it happens regardless of those
> > debug prints.
> >
> > > > > I am working on an issue where the SCC hangs, but this has to do with
> > > > > always providing the SCC clock (SDnH). I don't really see the connection
> > > > > of that to RuntimePM yet, though :/

> A few comments around your runtime PM concerns, not sure if this
> matters to you issues.
>
> So, only by looking at the code for probing of the tmio variant
> drivers, it seems like there are accesses (both reads and writes) for
> the device being done, without first making sure that the clock
> managed by the PM domain has been enabled. Is that really okay? Note,
> this isn't a new thing, but it has been there as long as can remember.

No, that is not OK.
On R-Car Gen2, my debug code that disables all non-critical module
clocks during early boot should have caught them.
On R-Car Gen3, it's different, unfortunately, as apparently not all
module clocks can be disabled (protected by secure code?).
So my debug code has limited use on those platforms.

Note that the SCC seems to be clocked by a normal clock (SDnH), not by a
module clock, so it's not controlled by Runtime PM.
In fact, without "[PATCH] WIP: clk: renesas: rcar-gen3: enable SDnH clk
for HS modes", Linux doesn't enable it at all.

BTW, perhaps U-Boot leaves the SCC in a different state on R-Car E3
than on M3-N?

> For example, in renesas_sdhi_probe() there are calls made to
> sd_ctrl_write16|read16() before calling tmio_mmc_host_probe().
>
> Additionally in tmio_mmc_host_probe(), there are calls made to
> sd_ctrl_write16|read16(), before calling pm_runtime_get_sync().

Ugh.

> If you haven't noticed, we have also managed to replace the call with
> pm_runtime_get_sync() with a call to dev_pm_domain_start(), to
> simplify the code. The point is, these changes are queued via Rafael's
> pm-tree for v5.5.

Thanks, I hadn't noticed that.
I do have pm/linux-next in renesas-drivers, so that code has been exercised.

> So, perhaps at this point we should simply drop the offending commit
> and revisit this once v5.5-rc1 is out?

Yes, that looks like the best short-term solution.
Thanks!

Gr{oetje,eeting}s,

                        Geert
Wolfram Sang Nov. 21, 2019, 11:10 a.m. UTC | #9
> As I managed to bisect it, it was fairly reproducible for me. Just checkout
> commit 7a7dab237027939c ("mmc: tmio: remove workaround for NON_REMOVABLE"),
> or use renesas-drivers.

That didn't work for me until I used renesas_defconfig with
renesas-drivers. Obviously (by now) the issue depends on the kernel
config. I don't know what I changed so I couldn't trigger the bug
anymore. However, I know have the issue highly reproducible with
renesas_defconfig and renesas-drivers. Good!

> But as it looks to be timing-related, and E3 has different/less CPU cores,
> it may still be affected.

I'll check my Ebisu later, too. Couldn't get the kernel to run on the
first try. Probably PEBKAC.
Wolfram Sang Nov. 21, 2019, 11:12 a.m. UTC | #10
> If you haven't noticed, we have also managed to replace the call with
> pm_runtime_get_sync() with a call to dev_pm_domain_start(), to
> simplify the code. The point is, these changes are queued via Rafael's
> pm-tree for v5.5.

I'll have a look at those already. I had them in mind, too, once we
discovered that it is still RPM related.

> So, perhaps at this point we should simply drop the offending commit
> and revisit this once v5.5-rc1 is out?

Yes, this was my suggestion, too. I'll send a patch.
Wolfram Sang Dec. 2, 2019, 8:20 a.m. UTC | #11
> anymore. However, I know have the issue highly reproducible with
> renesas_defconfig and renesas-drivers. Good!

Bummer, it is not that reproducible :(

Yesterday, I tried latest linus tree which includes Ulf's changes to
genpd and it worked, even with the NON_REMOVABLE workaround removed
again. Then I reverted Ulf's changes to double check it made a
difference, but the SCC still worked. So, I switched back to the
renesas-drivers tree which used to fail last week, and it sadly works,
too. Sigh...

I'll move over now to upport the manual calibration mechanism and will
keep an eye on if/when the SCC fails again...
Geert Uytterhoeven Dec. 2, 2019, 8:31 a.m. UTC | #12
Hi Wolfram,

On Mon, Dec 2, 2019 at 9:20 AM Wolfram Sang <wsa@the-dreams.de> wrote:
> > anymore. However, I know have the issue highly reproducible with
> > renesas_defconfig and renesas-drivers. Good!
>
> Bummer, it is not that reproducible :(
>
> Yesterday, I tried latest linus tree which includes Ulf's changes to
> genpd and it worked, even with the NON_REMOVABLE workaround removed
> again. Then I reverted Ulf's changes to double check it made a
> difference, but the SCC still worked. So, I switched back to the
> renesas-drivers tree which used to fail last week, and it sadly works,
> too. Sigh...

How do you reboot in between tests?
I usually use /sbin/reboot if the target booted fine, and the (remote
controlled)
reset button if the target locked up.

Gr{oetje,eeting}s,

                        Geert
Wolfram Sang Dec. 2, 2019, 8:50 a.m. UTC | #13
> How do you reboot in between tests?
> I usually use /sbin/reboot if the target booted fine, and the (remote
> controlled)
> reset button if the target locked up.

I use mostly the reset button. As I recall, last week the issue happened
even after a cold boot... but I can retry using 'reboot'.
diff mbox series

Patch

diff --git a/drivers/mmc/host/tmio_mmc_core.c b/drivers/mmc/host/tmio_mmc_core.c
index 9b6e1001e77c..63dc37481fba 100644
--- a/drivers/mmc/host/tmio_mmc_core.c
+++ b/drivers/mmc/host/tmio_mmc_core.c
@@ -1208,15 +1208,6 @@  int tmio_mmc_host_probe(struct tmio_mmc_host *_host)
 	if (!_host->reset)
 		_host->reset = tmio_mmc_reset;
 
-	/*
-	 * On Gen2+, eMMC with NONREMOVABLE currently fails because native
-	 * hotplug gets disabled. It seems RuntimePM related yet we need further
-	 * research. Since we are planning a PM overhaul anyway, let's enforce
-	 * for now the device being active by enabling native hotplug always.
-	 */
-	if (pdata->flags & TMIO_MMC_MIN_RCAR2)
-		_host->native_hotplug = true;
-
 	/*
 	 * While using internal tmio hardware logic for card detection, we need
 	 * to ensure it stays powered for it to work.