[V2,3/8] soc: qcom-geni-se: Add interconnect support to fix earlycon crash
diff mbox series

Message ID 1584105134-13583-4-git-send-email-akashast@codeaurora.org
State Superseded
Headers show
Series
  • Add interconnect support to QSPI and QUP drivers
Related show

Commit Message

Akash Asthana March 13, 2020, 1:12 p.m. UTC
V1 patch@https://patchwork.kernel.org/patch/11386469/ caused SC7180 system
to reset at boot time.

As QUP core clock is shared among all the SE drivers present on particular
QUP wrapper, the reset seen is due to earlycon usage after QUP core clock
is put to 0 from other SE drivers before real console comes up.

As earlycon can't vote for it's QUP core need, to fix this add ICC
support to common/QUP wrapper driver and put vote for QUP core from
probe on behalf of earlycon and remove vote during sys suspend.

Signed-off-by: Akash Asthana <akashast@codeaurora.org>
Reported-by: Matthias Kaehlcke <mka@chromium.org>
---
 drivers/soc/qcom/qcom-geni-se.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

Comments

Matthias Kaehlcke March 13, 2020, 8:44 p.m. UTC | #1
Hi Akash,

On Fri, Mar 13, 2020 at 06:42:09PM +0530, Akash Asthana wrote:
> V1 patch@https://patchwork.kernel.org/patch/11386469/ caused SC7180 system
> to reset at boot time.

The v1 patch isn't relevant in the commit message, please just describe the
problem. Also the crash only occurs when earlycon is used.

> As QUP core clock is shared among all the SE drivers present on particular
> QUP wrapper, the reset seen is due to earlycon usage after QUP core clock
> is put to 0 from other SE drivers before real console comes up.
> 
> As earlycon can't vote for it's QUP core need, to fix this add ICC
> support to common/QUP wrapper driver and put vote for QUP core from
> probe on behalf of earlycon and remove vote during sys suspend.

Only removing the vote on suspend isn't ideal, the system might never get
suspended. That said I don't have a really good alternative suggestion.

One thing you could possibly do is to launch a delayed work, check
console_device() every second or so and remove the vote when it returns
non-NULL. Not claiming this would be a great solution ...

The cleanest solution might be a notifier when the early console is
unregistered, it seems somewhat over-engineered though ... Then again
other (future) uart drivers with interconnect support might run into
the same problem.

> Signed-off-by: Akash Asthana <akashast@codeaurora.org>
> Reported-by: Matthias Kaehlcke <mka@chromium.org>
> ---
>  drivers/soc/qcom/qcom-geni-se.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/drivers/soc/qcom/qcom-geni-se.c b/drivers/soc/qcom/qcom-geni-se.c
> index 7d622ea..d244dfc 100644
> --- a/drivers/soc/qcom/qcom-geni-se.c
> +++ b/drivers/soc/qcom/qcom-geni-se.c
> @@ -90,6 +90,7 @@ struct geni_wrapper {
>  	struct device *dev;
>  	void __iomem *base;
>  	struct clk_bulk_data ahb_clks[NUM_AHB_CLKS];
> +	struct icc_path *icc_path_geni_to_core;
>  };
>  
>  #define QUP_HW_VER_REG			0x4
> @@ -747,11 +748,50 @@ static int geni_se_probe(struct platform_device *pdev)
>  		}
>  	}
>  
> +#ifdef CONFIG_SERIAL_EARLYCON
> +	wrapper->icc_path_geni_to_core = devm_of_icc_get(dev, "qup-core");
> +	if (IS_ERR(wrapper->icc_path_geni_to_core))
> +		return PTR_ERR(wrapper->icc_path_geni_to_core);
> +	/*
> +	 * Put minmal BW request on core clocks on behalf of early console.
> +	 * The vote will be removed in suspend call.
> +	 */
> +	ret = icc_set_bw(wrapper->icc_path_geni_to_core, Bps_to_icc(1000),
> +			Bps_to_icc(1000));
> +	if (ret) {
> +		dev_err(&pdev->dev, "%s: ICC BW voting failed for core\n",
> +			__func__);
> +		return ret;
> +	}

What is ugly about this is that it's done for every QUP, not only the one
with the early console. Again, I don't have a good solution for it, maybe
it's a limitation we have to live with :(

> +#endif
> +
>  	dev_set_drvdata(dev, wrapper);
>  	dev_dbg(dev, "GENI SE Driver probed\n");
>  	return devm_of_platform_populate(dev);
>  }
>  
> +static int __maybe_unused geni_se_sys_suspend(struct device *dev)
> +{
> +	struct geni_wrapper *wrapper = dev_get_drvdata(dev);
> +	int ret;
> +
> +#ifdef CONFIG_SERIAL_EARLYCON
> +	ret = icc_set_bw(wrapper->icc_path_geni_to_core, 0, 0);

I think you only want to do this on the first suspend.

Do we need to handle the case where no 'real' console is configured?
In this case the early console would be active forever and setting
the bandwidths to 0 might cause a similar crash than the one you are
trying to fix. Not sure if that's a real world use case, but wanted to
mention it. Maybe this is an argument of the notifier approach?

> +	if (ret) {
> +		dev_err(dev, "%s: ICC BW remove failed for core\n",
> +			__func__);
> +		return ret;

Aborting suspend seems too harsh since the QUP should still be fully
functional unless there is a general problem with the interconnects.

I would suggest to change the log to dev_warn() and return 0.
Akash Asthana March 17, 2020, 10:57 a.m. UTC | #2
Hi Matthias,

On 3/14/2020 2:14 AM, Matthias Kaehlcke wrote:
> Hi Akash,
>
> On Fri, Mar 13, 2020 at 06:42:09PM +0530, Akash Asthana wrote:
>> V1 patch@https://patchwork.kernel.org/patch/11386469/ caused SC7180 system
>> to reset at boot time.
> The v1 patch isn't relevant in the commit message, please just describe the
> problem. Also the crash only occurs when earlycon is used.
ok
>
>> As QUP core clock is shared among all the SE drivers present on particular
>> QUP wrapper, the reset seen is due to earlycon usage after QUP core clock
>> is put to 0 from other SE drivers before real console comes up.
>>
>> As earlycon can't vote for it's QUP core need, to fix this add ICC
>> support to common/QUP wrapper driver and put vote for QUP core from
>> probe on behalf of earlycon and remove vote during sys suspend.
> Only removing the vote on suspend isn't ideal, the system might never get
> suspended. That said I don't have a really good alternative suggestion.
>
> One thing you could possibly do is to launch a delayed work, check
> console_device() every second or so and remove the vote when it returns
> non-NULL. Not claiming this would be a great solution ...
>
> The cleanest solution might be a notifier when the early console is
> unregistered, it seems somewhat over-engineered though ... Then again
> other (future) uart drivers with interconnect support might run into
> the same problem.

We are hitting this problem because QUP core clocks are shared among all 
the SE driver present in particular QUP wrapper, if other HW controllers 
has similar architecture we will hit this issue.

How about if we expose an API from common driver(geni-se) for putting 
QUP core BW vote to 0.

We call this from console probe just after uart_add_one_port call 
(console resources are enabled as part of this call) to put core quota 
to 0 on behalf of earlyconsole?

>
>> Signed-off-by: Akash Asthana <akashast@codeaurora.org>
>> Reported-by: Matthias Kaehlcke <mka@chromium.org>
>> ---
>>   drivers/soc/qcom/qcom-geni-se.c | 41 +++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 41 insertions(+)
>>
>> diff --git a/drivers/soc/qcom/qcom-geni-se.c b/drivers/soc/qcom/qcom-geni-se.c
>> index 7d622ea..d244dfc 100644
>> --- a/drivers/soc/qcom/qcom-geni-se.c
>> +++ b/drivers/soc/qcom/qcom-geni-se.c
>> @@ -90,6 +90,7 @@ struct geni_wrapper {
>>   	struct device *dev;
>>   	void __iomem *base;
>>   	struct clk_bulk_data ahb_clks[NUM_AHB_CLKS];
>> +	struct icc_path *icc_path_geni_to_core;
>>   };
>>   
>>   #define QUP_HW_VER_REG			0x4
>> @@ -747,11 +748,50 @@ static int geni_se_probe(struct platform_device *pdev)
>>   		}
>>   	}
>>   
>> +#ifdef CONFIG_SERIAL_EARLYCON
>> +	wrapper->icc_path_geni_to_core = devm_of_icc_get(dev, "qup-core");
>> +	if (IS_ERR(wrapper->icc_path_geni_to_core))
>> +		return PTR_ERR(wrapper->icc_path_geni_to_core);
>> +	/*
>> +	 * Put minmal BW request on core clocks on behalf of early console.
>> +	 * The vote will be removed in suspend call.
>> +	 */
>> +	ret = icc_set_bw(wrapper->icc_path_geni_to_core, Bps_to_icc(1000),
>> +			Bps_to_icc(1000));
>> +	if (ret) {
>> +		dev_err(&pdev->dev, "%s: ICC BW voting failed for core\n",
>> +			__func__);
>> +		return ret;
>> +	}
> What is ugly about this is that it's done for every QUP, not only the one
> with the early console. Again, I don't have a good solution for it, maybe
> it's a limitation we have to live with :(

There is one more limitation from QUP core side. Core clocks for both 
the QUP wrapper runs at same speed.

core2x_1 = core2x_2 = max(core2x_1, core2x_2);

So with above limitation and if we are removing early con vote from Core 
when real console comes up. It doesn't matter whether it's done for 
every QUP or the only with early console.

>
>> +#endif
>> +
>>   	dev_set_drvdata(dev, wrapper);
>>   	dev_dbg(dev, "GENI SE Driver probed\n");
>>   	return devm_of_platform_populate(dev);
>>   }
>>   
>> +static int __maybe_unused geni_se_sys_suspend(struct device *dev)
>> +{
>> +	struct geni_wrapper *wrapper = dev_get_drvdata(dev);
>> +	int ret;
>> +
>> +#ifdef CONFIG_SERIAL_EARLYCON
>> +	ret = icc_set_bw(wrapper->icc_path_geni_to_core, 0, 0);
> I think you only want to do this on the first suspend.
Ok, I can add that logic using global variable.
>
> Do we need to handle the case where no 'real' console is configured?
> In this case the early console would be active forever and setting
> the bandwidths to 0 might cause a similar crash than the one you are
> trying to fix. Not sure if that's a real world use case, but wanted to
> mention it. Maybe this is an argument of the notifier approach?
We can't support earlycon without real console.

As earlyconsole doesn't do any kind of resource enablement(SE clocks, 
pinctrl, etc) it assumes that resources are already enabled from 
previous stages.

So if real console doesn't come up no one will vote for that SE clock, 
and it will be disabled from clk late_init call which will result into 
un-clocked access.


>
>> +	if (ret) {
>> +		dev_err(dev, "%s: ICC BW remove failed for core\n",
>> +			__func__);
>> +		return ret;
> Aborting suspend seems too harsh since the QUP should still be fully
> functional unless there is a general problem with the interconnects.
>
> I would suggest to change the log to dev_warn() and return 0.

Ok

Thanks for reviewing the patch.

regards,

Akash
Matthias Kaehlcke March 17, 2020, 6:29 p.m. UTC | #3
Hi Akash,

On Tue, Mar 17, 2020 at 04:27:47PM +0530, Akash Asthana wrote:
> Hi Matthias,
> 
> On 3/14/2020 2:14 AM, Matthias Kaehlcke wrote:
> > Hi Akash,
> > 
> > On Fri, Mar 13, 2020 at 06:42:09PM +0530, Akash Asthana wrote:
> > > V1 patch@https://patchwork.kernel.org/patch/11386469/ caused SC7180 system
> > > to reset at boot time.
> > The v1 patch isn't relevant in the commit message, please just describe the
> > problem. Also the crash only occurs when earlycon is used.
> ok
> > 
> > > As QUP core clock is shared among all the SE drivers present on particular
> > > QUP wrapper, the reset seen is due to earlycon usage after QUP core clock
> > > is put to 0 from other SE drivers before real console comes up.
> > > 
> > > As earlycon can't vote for it's QUP core need, to fix this add ICC
> > > support to common/QUP wrapper driver and put vote for QUP core from
> > > probe on behalf of earlycon and remove vote during sys suspend.
> > Only removing the vote on suspend isn't ideal, the system might never get
> > suspended. That said I don't have a really good alternative suggestion.
> > 
> > One thing you could possibly do is to launch a delayed work, check
> > console_device() every second or so and remove the vote when it returns
> > non-NULL. Not claiming this would be a great solution ...
> > 
> > The cleanest solution might be a notifier when the early console is
> > unregistered, it seems somewhat over-engineered though ... Then again
> > other (future) uart drivers with interconnect support might run into
> > the same problem.
> 
> We are hitting this problem because QUP core clocks are shared among all the
> SE driver present in particular QUP wrapper, if other HW controllers has
> similar architecture we will hit this issue.
> 
> How about if we expose an API from common driver(geni-se) for putting QUP
> core BW vote to 0.
> 
> We call this from console probe just after uart_add_one_port call (console
> resources are enabled as part of this call) to put core quota to 0 on behalf
> of earlyconsole?

From my notes from earlier debugging I have doubts this would work:

  There is a short window where the early console and the 'real' console coexist:

  [    3.858122] printk: console [ttyMSM0] enabled
  [    3.875692] printk: bootconsole [qcom_geni0] disabled

  The reset probably occurs when the early console tries to write, but the ICC
  is effectively disabled because ttyMSM0 and the other geni ports are runtime
  suspended.

> > > Signed-off-by: Akash Asthana <akashast@codeaurora.org>
> > > Reported-by: Matthias Kaehlcke <mka@chromium.org>
> > > ---
> > >   drivers/soc/qcom/qcom-geni-se.c | 41 +++++++++++++++++++++++++++++++++++++++++
> > >   1 file changed, 41 insertions(+)
> > > 
> > > diff --git a/drivers/soc/qcom/qcom-geni-se.c b/drivers/soc/qcom/qcom-geni-se.c
> > > index 7d622ea..d244dfc 100644
> > > --- a/drivers/soc/qcom/qcom-geni-se.c
> > > +++ b/drivers/soc/qcom/qcom-geni-se.c
> > > @@ -90,6 +90,7 @@ struct geni_wrapper {
> > >   	struct device *dev;
> > >   	void __iomem *base;
> > >   	struct clk_bulk_data ahb_clks[NUM_AHB_CLKS];
> > > +	struct icc_path *icc_path_geni_to_core;
> > >   };
> > >   #define QUP_HW_VER_REG			0x4
> > > @@ -747,11 +748,50 @@ static int geni_se_probe(struct platform_device *pdev)
> > >   		}
> > >   	}
> > > +#ifdef CONFIG_SERIAL_EARLYCON
> > > +	wrapper->icc_path_geni_to_core = devm_of_icc_get(dev, "qup-core");
> > > +	if (IS_ERR(wrapper->icc_path_geni_to_core))
> > > +		return PTR_ERR(wrapper->icc_path_geni_to_core);
> > > +	/*
> > > +	 * Put minmal BW request on core clocks on behalf of early console.
> > > +	 * The vote will be removed in suspend call.
> > > +	 */
> > > +	ret = icc_set_bw(wrapper->icc_path_geni_to_core, Bps_to_icc(1000),
> > > +			Bps_to_icc(1000));
> > > +	if (ret) {
> > > +		dev_err(&pdev->dev, "%s: ICC BW voting failed for core\n",
> > > +			__func__);
> > > +		return ret;
> > > +	}
> > What is ugly about this is that it's done for every QUP, not only the one
> > with the early console. Again, I don't have a good solution for it, maybe
> > it's a limitation we have to live with :(
> 
> There is one more limitation from QUP core side. Core clocks for both the
> QUP wrapper runs at same speed.
> 
> core2x_1 = core2x_2 = max(core2x_1, core2x_2);
> 
> So with above limitation and if we are removing early con vote from Core
> when real console comes up. It doesn't matter whether it's done for every
> QUP or the only with early console.

it's still sorta ugly at an abstraction level, but it seem we have to be
pragmatic here.

> > > +#endif
> > > +
> > >   	dev_set_drvdata(dev, wrapper);
> > >   	dev_dbg(dev, "GENI SE Driver probed\n");
> > >   	return devm_of_platform_populate(dev);
> > >   }
> > > +static int __maybe_unused geni_se_sys_suspend(struct device *dev)
> > > +{
> > > +	struct geni_wrapper *wrapper = dev_get_drvdata(dev);
> > > +	int ret;
> > > +
> > > +#ifdef CONFIG_SERIAL_EARLYCON
> > > +	ret = icc_set_bw(wrapper->icc_path_geni_to_core, 0, 0);
> > I think you only want to do this on the first suspend.
> Ok, I can add that logic using global variable.
> > 
> > Do we need to handle the case where no 'real' console is configured?
> > In this case the early console would be active forever and setting
> > the bandwidths to 0 might cause a similar crash than the one you are
> > trying to fix. Not sure if that's a real world use case, but wanted to
> > mention it. Maybe this is an argument of the notifier approach?
> We can't support earlycon without real console.
> 
> As earlyconsole doesn't do any kind of resource enablement(SE clocks,
> pinctrl, etc) it assumes that resources are already enabled from previous
> stages.
> 
> So if real console doesn't come up no one will vote for that SE clock, and
> it will be disabled from clk late_init call which will result into
> un-clocked access.

Ok, IIUC what you are saying is that earlycon can't work on its own after geni
initialization. Because it clearly can work before (otherwise what would be
its purpose?), supposedly because the bootloader configures the necessary
bits.

In any case the bottom line is that earlycon requires a real console to be
configured and there is no need to handle the corner case I brought up.
Evan Green March 17, 2020, 7:08 p.m. UTC | #4
On Tue, Mar 17, 2020 at 3:58 AM Akash Asthana <akashast@codeaurora.org> wrote:
>
> Hi Matthias,
>
> On 3/14/2020 2:14 AM, Matthias Kaehlcke wrote:
> > Hi Akash,
> >
> > On Fri, Mar 13, 2020 at 06:42:09PM +0530, Akash Asthana wrote:
> >> V1 patch@https://patchwork.kernel.org/patch/11386469/ caused SC7180 system
> >> to reset at boot time.
> > The v1 patch isn't relevant in the commit message, please just describe the
> > problem. Also the crash only occurs when earlycon is used.
> ok
> >
> >> As QUP core clock is shared among all the SE drivers present on particular
> >> QUP wrapper, the reset seen is due to earlycon usage after QUP core clock
> >> is put to 0 from other SE drivers before real console comes up.
> >>
> >> As earlycon can't vote for it's QUP core need, to fix this add ICC
> >> support to common/QUP wrapper driver and put vote for QUP core from
> >> probe on behalf of earlycon and remove vote during sys suspend.
> > Only removing the vote on suspend isn't ideal, the system might never get
> > suspended. That said I don't have a really good alternative suggestion.
> >
> > One thing you could possibly do is to launch a delayed work, check
> > console_device() every second or so and remove the vote when it returns
> > non-NULL. Not claiming this would be a great solution ...
> >
> > The cleanest solution might be a notifier when the early console is
> > unregistered, it seems somewhat over-engineered though ... Then again
> > other (future) uart drivers with interconnect support might run into
> > the same problem.
>
> We are hitting this problem because QUP core clocks are shared among all
> the SE driver present in particular QUP wrapper, if other HW controllers
> has similar architecture we will hit this issue.
>
> How about if we expose an API from common driver(geni-se) for putting
> QUP core BW vote to 0.
>
> We call this from console probe just after uart_add_one_port call
> (console resources are enabled as part of this call) to put core quota
> to 0 on behalf of earlyconsole?

+Georgi

Hm, these boot proxy votes are annoying, since the whole house of
cards comes down if you replace these votes in the wrong order.

I believe consensus in the other patches was to consolidate most of
the interconnect support into the common SE code, right? Would that
help you with these boot proxy votes? What I'm thinking is something
along the lines of:
 * SPI, I2C, UART all call into the new common geni_se_icc_on/off()
(or whatever it's called)
 * If geni_se_icc_off() sees that console UART hasn't voted yet, save
the votes but don't actually call icc_set(0) now.
 * Once uart votes for the first time, call icc_set() on all of SPI,
I2C, UART to get things back in sync.

That's a sort of roll-your-own solution for GENI, but we do have this
problem elsewhere as well. A more general solution would be to have
the interconnect providers prop things up (ie ignore votes to lower
bandwidth) until some "go" moment where we feel we've enumerated all
devices. I was originally thinking to model this off of something like
clk_disable_unused(), but after chatting with Stephen it's clear
late_initcall's aren't really indicative of all devices having
actually come up. So I'm not sure where the appropriate "go" moment
is.

-Evan


>
> >
> >> Signed-off-by: Akash Asthana <akashast@codeaurora.org>
> >> Reported-by: Matthias Kaehlcke <mka@chromium.org>
> >> ---
> >>   drivers/soc/qcom/qcom-geni-se.c | 41 +++++++++++++++++++++++++++++++++++++++++
> >>   1 file changed, 41 insertions(+)
> >>
> >> diff --git a/drivers/soc/qcom/qcom-geni-se.c b/drivers/soc/qcom/qcom-geni-se.c
> >> index 7d622ea..d244dfc 100644
> >> --- a/drivers/soc/qcom/qcom-geni-se.c
> >> +++ b/drivers/soc/qcom/qcom-geni-se.c
> >> @@ -90,6 +90,7 @@ struct geni_wrapper {
> >>      struct device *dev;
> >>      void __iomem *base;
> >>      struct clk_bulk_data ahb_clks[NUM_AHB_CLKS];
> >> +    struct icc_path *icc_path_geni_to_core;
> >>   };
> >>
> >>   #define QUP_HW_VER_REG                     0x4
> >> @@ -747,11 +748,50 @@ static int geni_se_probe(struct platform_device *pdev)
> >>              }
> >>      }
> >>
> >> +#ifdef CONFIG_SERIAL_EARLYCON
> >> +    wrapper->icc_path_geni_to_core = devm_of_icc_get(dev, "qup-core");
> >> +    if (IS_ERR(wrapper->icc_path_geni_to_core))
> >> +            return PTR_ERR(wrapper->icc_path_geni_to_core);
> >> +    /*
> >> +     * Put minmal BW request on core clocks on behalf of early console.
> >> +     * The vote will be removed in suspend call.
> >> +     */
> >> +    ret = icc_set_bw(wrapper->icc_path_geni_to_core, Bps_to_icc(1000),
> >> +                    Bps_to_icc(1000));
> >> +    if (ret) {
> >> +            dev_err(&pdev->dev, "%s: ICC BW voting failed for core\n",
> >> +                    __func__);
> >> +            return ret;
> >> +    }
> > What is ugly about this is that it's done for every QUP, not only the one
> > with the early console. Again, I don't have a good solution for it, maybe
> > it's a limitation we have to live with :(
>
> There is one more limitation from QUP core side. Core clocks for both
> the QUP wrapper runs at same speed.
>
> core2x_1 = core2x_2 = max(core2x_1, core2x_2);
>
> So with above limitation and if we are removing early con vote from Core
> when real console comes up. It doesn't matter whether it's done for
> every QUP or the only with early console.
>
> >
> >> +#endif
> >> +
> >>      dev_set_drvdata(dev, wrapper);
> >>      dev_dbg(dev, "GENI SE Driver probed\n");
> >>      return devm_of_platform_populate(dev);
> >>   }
> >>
> >> +static int __maybe_unused geni_se_sys_suspend(struct device *dev)
> >> +{
> >> +    struct geni_wrapper *wrapper = dev_get_drvdata(dev);
> >> +    int ret;
> >> +
> >> +#ifdef CONFIG_SERIAL_EARLYCON
> >> +    ret = icc_set_bw(wrapper->icc_path_geni_to_core, 0, 0);
> > I think you only want to do this on the first suspend.
> Ok, I can add that logic using global variable.
> >
> > Do we need to handle the case where no 'real' console is configured?
> > In this case the early console would be active forever and setting
> > the bandwidths to 0 might cause a similar crash than the one you are
> > trying to fix. Not sure if that's a real world use case, but wanted to
> > mention it. Maybe this is an argument of the notifier approach?
> We can't support earlycon without real console.
>
> As earlyconsole doesn't do any kind of resource enablement(SE clocks,
> pinctrl, etc) it assumes that resources are already enabled from
> previous stages.
>
> So if real console doesn't come up no one will vote for that SE clock,
> and it will be disabled from clk late_init call which will result into
> un-clocked access.
>
>
> >
> >> +    if (ret) {
> >> +            dev_err(dev, "%s: ICC BW remove failed for core\n",
> >> +                    __func__);
> >> +            return ret;
> > Aborting suspend seems too harsh since the QUP should still be fully
> > functional unless there is a general problem with the interconnects.
> >
> > I would suggest to change the log to dev_warn() and return 0.
>
> Ok
>
> Thanks for reviewing the patch.
>
> regards,
>
> Akash
>
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,\na Linux Foundation Collaborative Project
Douglas Anderson March 17, 2020, 7:46 p.m. UTC | #5
Hi,

On Tue, Mar 17, 2020 at 12:08 PM Evan Green <evgreen@chromium.org> wrote:
>
> On Tue, Mar 17, 2020 at 3:58 AM Akash Asthana <akashast@codeaurora.org> wrote:
> >
> > Hi Matthias,
> >
> > On 3/14/2020 2:14 AM, Matthias Kaehlcke wrote:
> > > Hi Akash,
> > >
> > > On Fri, Mar 13, 2020 at 06:42:09PM +0530, Akash Asthana wrote:
> > >> V1 patch@https://patchwork.kernel.org/patch/11386469/ caused SC7180 system
> > >> to reset at boot time.
> > > The v1 patch isn't relevant in the commit message, please just describe the
> > > problem. Also the crash only occurs when earlycon is used.
> > ok
> > >
> > >> As QUP core clock is shared among all the SE drivers present on particular
> > >> QUP wrapper, the reset seen is due to earlycon usage after QUP core clock
> > >> is put to 0 from other SE drivers before real console comes up.
> > >>
> > >> As earlycon can't vote for it's QUP core need, to fix this add ICC
> > >> support to common/QUP wrapper driver and put vote for QUP core from
> > >> probe on behalf of earlycon and remove vote during sys suspend.
> > > Only removing the vote on suspend isn't ideal, the system might never get
> > > suspended. That said I don't have a really good alternative suggestion.
> > >
> > > One thing you could possibly do is to launch a delayed work, check
> > > console_device() every second or so and remove the vote when it returns
> > > non-NULL. Not claiming this would be a great solution ...
> > >
> > > The cleanest solution might be a notifier when the early console is
> > > unregistered, it seems somewhat over-engineered though ... Then again
> > > other (future) uart drivers with interconnect support might run into
> > > the same problem.
> >
> > We are hitting this problem because QUP core clocks are shared among all
> > the SE driver present in particular QUP wrapper, if other HW controllers
> > has similar architecture we will hit this issue.
> >
> > How about if we expose an API from common driver(geni-se) for putting
> > QUP core BW vote to 0.
> >
> > We call this from console probe just after uart_add_one_port call
> > (console resources are enabled as part of this call) to put core quota
> > to 0 on behalf of earlyconsole?
>
> +Georgi
>
> Hm, these boot proxy votes are annoying, since the whole house of
> cards comes down if you replace these votes in the wrong order.
>
> I believe consensus in the other patches was to consolidate most of
> the interconnect support into the common SE code, right? Would that
> help you with these boot proxy votes? What I'm thinking is something
> along the lines of:
>  * SPI, I2C, UART all call into the new common geni_se_icc_on/off()
> (or whatever it's called)
>  * If geni_se_icc_off() sees that console UART hasn't voted yet, save
> the votes but don't actually call icc_set(0) now.
>  * Once uart votes for the first time, call icc_set() on all of SPI,
> I2C, UART to get things back in sync.
>
> That's a sort of roll-your-own solution for GENI, but we do have this
> problem elsewhere as well. A more general solution would be to have
> the interconnect providers prop things up (ie ignore votes to lower
> bandwidth) until some "go" moment where we feel we've enumerated all
> devices. I was originally thinking to model this off of something like
> clk_disable_unused(), but after chatting with Stephen it's clear
> late_initcall's aren't really indicative of all devices having
> actually come up. So I'm not sure where the appropriate "go" moment
> is.

I ran across this gem the other day, which explains why I get a bunch
of regulator yells 30 seconds after bootup:

/*
 * We punt completion for an arbitrary amount of time since
 * systems like distros will load many drivers from userspace
 * so consumers might not always be ready yet, this is
 * particularly an issue with laptops where this might bounce
 * the display off then on.  Ideally we'd get a notification
 * from userspace when this happens but we don't so just wait
 * a bit and hope we waited long enough.  It'd be better if
 * we'd only do this on systems that need it, and a kernel
 * command line option might be useful.
 */
schedule_delayed_work(&regulator_init_complete_work,
      msecs_to_jiffies(30000));

...but that also means that this is basically an unsolved problem.  I
suppose one thing you could do would be to centralize this "30 seconds
after bootup" for several subsystems (regulator, clock, interconnect,
...) and then at least it would leave a nice place for someone to do
better...  ;-)

-Doug
Akash Asthana March 18, 2020, 8:54 a.m. UTC | #6
Hi Matthias,

On 3/17/2020 11:59 PM, Matthias Kaehlcke wrote:
> Hi Akash,
>
> On Tue, Mar 17, 2020 at 04:27:47PM +0530, Akash Asthana wrote:
>> Hi Matthias,
>>
>> On 3/14/2020 2:14 AM, Matthias Kaehlcke wrote:
>>> Hi Akash,
>>>
>>> On Fri, Mar 13, 2020 at 06:42:09PM +0530, Akash Asthana wrote:
>>>> V1 patch@https://patchwork.kernel.org/patch/11386469/ caused SC7180 system
>>>> to reset at boot time.
>>> The v1 patch isn't relevant in the commit message, please just describe the
>>> problem. Also the crash only occurs when earlycon is used.
>> ok
>>>> As QUP core clock is shared among all the SE drivers present on particular
>>>> QUP wrapper, the reset seen is due to earlycon usage after QUP core clock
>>>> is put to 0 from other SE drivers before real console comes up.
>>>>
>>>> As earlycon can't vote for it's QUP core need, to fix this add ICC
>>>> support to common/QUP wrapper driver and put vote for QUP core from
>>>> probe on behalf of earlycon and remove vote during sys suspend.
>>> Only removing the vote on suspend isn't ideal, the system might never get
>>> suspended. That said I don't have a really good alternative suggestion.
>>>
>>> One thing you could possibly do is to launch a delayed work, check
>>> console_device() every second or so and remove the vote when it returns
>>> non-NULL. Not claiming this would be a great solution ...
>>>
>>> The cleanest solution might be a notifier when the early console is
>>> unregistered, it seems somewhat over-engineered though ... Then again
>>> other (future) uart drivers with interconnect support might run into
>>> the same problem.
>> We are hitting this problem because QUP core clocks are shared among all the
>> SE driver present in particular QUP wrapper, if other HW controllers has
>> similar architecture we will hit this issue.
>>
>> How about if we expose an API from common driver(geni-se) for putting QUP
>> core BW vote to 0.
>>
>> We call this from console probe just after uart_add_one_port call (console
>> resources are enabled as part of this call) to put core quota to 0 on behalf
>> of earlyconsole?
>  From my notes from earlier debugging I have doubts this would work:
>
>    There is a short window where the early console and the 'real' console coexist:
>
>    [    3.858122] printk: console [ttyMSM0] enabled
>    [    3.875692] printk: bootconsole [qcom_geni0] disabled
>
>    The reset probably occurs when the early console tries to write, but the ICC
>    is effectively disabled because ttyMSM0 and the other geni ports are runtime
>    suspended.

Code flow from console driver probe(qcom_geni_serial.c)

uart_add_one_port--->uart_configure_port--->{ 1) uart_change_pm(enable 
console resources)  2)register_console(boot to real console switch 
happens here)}

Console resources are not disabled from anywhere before the switch 
happens completely. I meant to say until we saw below logs.

[    3.875692] printk: bootconsole [qcom_geni0] disabled

I think the board reset issue cannot occur during the window where early 
console and 'real' console coexist.

I have validated proposed solution by me, it is working fine.

Currently voting is done for every QUP and not only to which earlycon is 
connect, with the above approach we can't remove vote from other QUPs.

However we can limit voting only to earlycon QUP by removing 
interconnect from DT node of other QUPs.

I am not sure how clean is this solution.

>
>>>> Signed-off-by: Akash Asthana <akashast@codeaurora.org>
>>>> Reported-by: Matthias Kaehlcke <mka@chromium.org>
>>>> ---
>>>>    drivers/soc/qcom/qcom-geni-se.c | 41 +++++++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 41 insertions(+)
>>>>
>>>> diff --git a/drivers/soc/qcom/qcom-geni-se.c b/drivers/soc/qcom/qcom-geni-se.c
>>>> index 7d622ea..d244dfc 100644
>>>> --- a/drivers/soc/qcom/qcom-geni-se.c
>>>> +++ b/drivers/soc/qcom/qcom-geni-se.c
>>>> @@ -90,6 +90,7 @@ struct geni_wrapper {
>>>>    	struct device *dev;
>>>>    	void __iomem *base;
>>>>    	struct clk_bulk_data ahb_clks[NUM_AHB_CLKS];
>>>> +	struct icc_path *icc_path_geni_to_core;
>>>>    };
>>>>    #define QUP_HW_VER_REG			0x4
>>>> @@ -747,11 +748,50 @@ static int geni_se_probe(struct platform_device *pdev)
>>>>    		}
>>>>    	}
>>>> +#ifdef CONFIG_SERIAL_EARLYCON
>>>> +	wrapper->icc_path_geni_to_core = devm_of_icc_get(dev, "qup-core");
>>>> +	if (IS_ERR(wrapper->icc_path_geni_to_core))
>>>> +		return PTR_ERR(wrapper->icc_path_geni_to_core);
>>>> +	/*
>>>> +	 * Put minmal BW request on core clocks on behalf of early console.
>>>> +	 * The vote will be removed in suspend call.
>>>> +	 */
>>>> +	ret = icc_set_bw(wrapper->icc_path_geni_to_core, Bps_to_icc(1000),
>>>> +			Bps_to_icc(1000));
>>>> +	if (ret) {
>>>> +		dev_err(&pdev->dev, "%s: ICC BW voting failed for core\n",
>>>> +			__func__);
>>>> +		return ret;
>>>> +	}
>>> What is ugly about this is that it's done for every QUP, not only the one
>>> with the early console. Again, I don't have a good solution for it, maybe
>>> it's a limitation we have to live with :(
>> There is one more limitation from QUP core side. Core clocks for both the
>> QUP wrapper runs at same speed.
>>
>> core2x_1 = core2x_2 = max(core2x_1, core2x_2);
>>
>> So with above limitation and if we are removing early con vote from Core
>> when real console comes up. It doesn't matter whether it's done for every
>> QUP or the only with early console.
> it's still sorta ugly at an abstraction level, but it seem we have to be
> pragmatic here.

How about if we limit voting only to earlycon QUP by removing 
interconnect from DT node of other QUPs.?

>
>>>> +#endif
>>>> +
>>>>    	dev_set_drvdata(dev, wrapper);
>>>>    	dev_dbg(dev, "GENI SE Driver probed\n");
>>>>    	return devm_of_platform_populate(dev);
>>>>    }
>>>> +static int __maybe_unused geni_se_sys_suspend(struct device *dev)
>>>> +{
>>>> +	struct geni_wrapper *wrapper = dev_get_drvdata(dev);
>>>> +	int ret;
>>>> +
>>>> +#ifdef CONFIG_SERIAL_EARLYCON
>>>> +	ret = icc_set_bw(wrapper->icc_path_geni_to_core, 0, 0);
>>> I think you only want to do this on the first suspend.
>> Ok, I can add that logic using global variable.
>>> Do we need to handle the case where no 'real' console is configured?
>>> In this case the early console would be active forever and setting
>>> the bandwidths to 0 might cause a similar crash than the one you are
>>> trying to fix. Not sure if that's a real world use case, but wanted to
>>> mention it. Maybe this is an argument of the notifier approach?
>> We can't support earlycon without real console.
>>
>> As earlyconsole doesn't do any kind of resource enablement(SE clocks,
>> pinctrl, etc) it assumes that resources are already enabled from previous
>> stages.
>>
>> So if real console doesn't come up no one will vote for that SE clock, and
>> it will be disabled from clk late_init call which will result into
>> un-clocked access.
> Ok, IIUC what you are saying is that earlycon can't work on its own after geni
> initialization. Because it clearly can work before (otherwise what would be
> its purpose?), supposedly because the bootloader configures the necessary
> bits.

If there is no real console then earlyconsole should work until clk 
driver disables unvoted SE clocks.

Yes, early console depends on bootloader to enable neccessary resources 
for it. I guess currently Coreboot console driver is doing that.

>
> In any case the bottom line is that earlycon requires a real console to be
> configured and there is no need to handle the corner case I brought up.

Yes that is correct. We can't support earlyconsole for long without 
having a real console.


Thanks,

Akash
Akash Asthana March 18, 2020, 10:57 a.m. UTC | #7
Hi Evan

On 3/18/2020 12:38 AM, Evan Green wrote:
> On Tue, Mar 17, 2020 at 3:58 AM Akash Asthana <akashast@codeaurora.org> wrote:
>> Hi Matthias,
>>
>> On 3/14/2020 2:14 AM, Matthias Kaehlcke wrote:
>>> Hi Akash,
>>>
>>> On Fri, Mar 13, 2020 at 06:42:09PM +0530, Akash Asthana wrote:
>>>> V1 patch@https://patchwork.kernel.org/patch/11386469/ caused SC7180 system
>>>> to reset at boot time.
>>> The v1 patch isn't relevant in the commit message, please just describe the
>>> problem. Also the crash only occurs when earlycon is used.
>> ok
>>>> As QUP core clock is shared among all the SE drivers present on particular
>>>> QUP wrapper, the reset seen is due to earlycon usage after QUP core clock
>>>> is put to 0 from other SE drivers before real console comes up.
>>>>
>>>> As earlycon can't vote for it's QUP core need, to fix this add ICC
>>>> support to common/QUP wrapper driver and put vote for QUP core from
>>>> probe on behalf of earlycon and remove vote during sys suspend.
>>> Only removing the vote on suspend isn't ideal, the system might never get
>>> suspended. That said I don't have a really good alternative suggestion.
>>>
>>> One thing you could possibly do is to launch a delayed work, check
>>> console_device() every second or so and remove the vote when it returns
>>> non-NULL. Not claiming this would be a great solution ...
>>>
>>> The cleanest solution might be a notifier when the early console is
>>> unregistered, it seems somewhat over-engineered though ... Then again
>>> other (future) uart drivers with interconnect support might run into
>>> the same problem.
>> We are hitting this problem because QUP core clocks are shared among all
>> the SE driver present in particular QUP wrapper, if other HW controllers
>> has similar architecture we will hit this issue.
>>
>> How about if we expose an API from common driver(geni-se) for putting
>> QUP core BW vote to 0.
>>
>> We call this from console probe just after uart_add_one_port call
>> (console resources are enabled as part of this call) to put core quota
>> to 0 on behalf of earlyconsole?
> +Georgi
>
> Hm, these boot proxy votes are annoying, since the whole house of
> cards comes down if you replace these votes in the wrong order.
>
> I believe consensus in the other patches was to consolidate most of
> the interconnect support into the common SE code, right?

I think what Matthias suggested is to maintain ICC functions defined 
across I2C, SPI and UART as a library in common SE code.

Still every SE driver will interact with ICC framework individually 
rather than using common SE driver as a bridge.

>   Would that
> help you with these boot proxy votes? What I'm thinking is something
> along the lines of:
>   * SPI, I2C, UART all call into the new common geni_se_icc_on/off()
> (or whatever it's called)
>   * If geni_se_icc_off() sees that console UART hasn't voted yet, save
> the votes but don't actually call icc_set(0) now.
>   * Once uart votes for the first time, call icc_set() on all of SPI,
> I2C, UART to get things back in sync.

IIUC, you are suggesting to enhancing ICC 
design@https://patchwork.kernel.org/patch/10774897/ [The very first ICC 
patch posted during sdm845 timeframe].

Where common SE driver aggregate real time BW requirement from all the 
SE driver and put net request to ICC framework.

We received comments on that version of ICC to move voting to individual 
SE driver from common driver. Hence we updated the design accordingly.

Thanks for reviewing

regards,

Akash

> That's a sort of roll-your-own solution for GENI, but we do have this
> problem elsewhere as well. A more general solution would be to have
> the interconnect providers prop things up (ie ignore votes to lower
> bandwidth) until some "go" moment where we feel we've enumerated all
> devices. I was originally thinking to model this off of something like
> clk_disable_unused(), but after chatting with Stephen it's clear
> late_initcall's aren't really indicative of all devices having
> actually come up. So I'm not sure where the appropriate "go" moment
> is.
>
> -Evan
>
>
>>>> Signed-off-by: Akash Asthana <akashast@codeaurora.org>
>>>> Reported-by: Matthias Kaehlcke <mka@chromium.org>
>>>> ---
>>>>    drivers/soc/qcom/qcom-geni-se.c | 41 +++++++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 41 insertions(+)
>>>>
>>>> diff --git a/drivers/soc/qcom/qcom-geni-se.c b/drivers/soc/qcom/qcom-geni-se.c
>>>> index 7d622ea..d244dfc 100644
>>>> --- a/drivers/soc/qcom/qcom-geni-se.c
>>>> +++ b/drivers/soc/qcom/qcom-geni-se.c
>>>> @@ -90,6 +90,7 @@ struct geni_wrapper {
>>>>       struct device *dev;
>>>>       void __iomem *base;
>>>>       struct clk_bulk_data ahb_clks[NUM_AHB_CLKS];
>>>> +    struct icc_path *icc_path_geni_to_core;
>>>>    };
>>>>
>>>>    #define QUP_HW_VER_REG                     0x4
>>>> @@ -747,11 +748,50 @@ static int geni_se_probe(struct platform_device *pdev)
>>>>               }
>>>>       }
>>>>
>>>> +#ifdef CONFIG_SERIAL_EARLYCON
>>>> +    wrapper->icc_path_geni_to_core = devm_of_icc_get(dev, "qup-core");
>>>> +    if (IS_ERR(wrapper->icc_path_geni_to_core))
>>>> +            return PTR_ERR(wrapper->icc_path_geni_to_core);
>>>> +    /*
>>>> +     * Put minmal BW request on core clocks on behalf of early console.
>>>> +     * The vote will be removed in suspend call.
>>>> +     */
>>>> +    ret = icc_set_bw(wrapper->icc_path_geni_to_core, Bps_to_icc(1000),
>>>> +                    Bps_to_icc(1000));
>>>> +    if (ret) {
>>>> +            dev_err(&pdev->dev, "%s: ICC BW voting failed for core\n",
>>>> +                    __func__);
>>>> +            return ret;
>>>> +    }
>>> What is ugly about this is that it's done for every QUP, not only the one
>>> with the early console. Again, I don't have a good solution for it, maybe
>>> it's a limitation we have to live with :(
>> There is one more limitation from QUP core side. Core clocks for both
>> the QUP wrapper runs at same speed.
>>
>> core2x_1 = core2x_2 = max(core2x_1, core2x_2);
>>
>> So with above limitation and if we are removing early con vote from Core
>> when real console comes up. It doesn't matter whether it's done for
>> every QUP or the only with early console.
>>
>>>> +#endif
>>>> +
>>>>       dev_set_drvdata(dev, wrapper);
>>>>       dev_dbg(dev, "GENI SE Driver probed\n");
>>>>       return devm_of_platform_populate(dev);
>>>>    }
>>>>
>>>> +static int __maybe_unused geni_se_sys_suspend(struct device *dev)
>>>> +{
>>>> +    struct geni_wrapper *wrapper = dev_get_drvdata(dev);
>>>> +    int ret;
>>>> +
>>>> +#ifdef CONFIG_SERIAL_EARLYCON
>>>> +    ret = icc_set_bw(wrapper->icc_path_geni_to_core, 0, 0);
>>> I think you only want to do this on the first suspend.
>> Ok, I can add that logic using global variable.
>>> Do we need to handle the case where no 'real' console is configured?
>>> In this case the early console would be active forever and setting
>>> the bandwidths to 0 might cause a similar crash than the one you are
>>> trying to fix. Not sure if that's a real world use case, but wanted to
>>> mention it. Maybe this is an argument of the notifier approach?
>> We can't support earlycon without real console.
>>
>> As earlyconsole doesn't do any kind of resource enablement(SE clocks,
>> pinctrl, etc) it assumes that resources are already enabled from
>> previous stages.
>>
>> So if real console doesn't come up no one will vote for that SE clock,
>> and it will be disabled from clk late_init call which will result into
>> un-clocked access.
>>
>>
>>>> +    if (ret) {
>>>> +            dev_err(dev, "%s: ICC BW remove failed for core\n",
>>>> +                    __func__);
>>>> +            return ret;
>>> Aborting suspend seems too harsh since the QUP should still be fully
>>> functional unless there is a general problem with the interconnects.
>>>
>>> I would suggest to change the log to dev_warn() and return 0.
>> Ok
>>
>> Thanks for reviewing the patch.
>>
>> regards,
>>
>> Akash
>>
>> --
>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,\na Linux Foundation Collaborative Project
Evan Green March 18, 2020, 4:22 p.m. UTC | #8
On Wed, Mar 18, 2020 at 3:57 AM Akash Asthana <akashast@codeaurora.org> wrote:
>
> Hi Evan
>
> On 3/18/2020 12:38 AM, Evan Green wrote:
> > On Tue, Mar 17, 2020 at 3:58 AM Akash Asthana <akashast@codeaurora.org> wrote:
> >> Hi Matthias,
> >>
> >> On 3/14/2020 2:14 AM, Matthias Kaehlcke wrote:
> >>> Hi Akash,
> >>>
> >>> On Fri, Mar 13, 2020 at 06:42:09PM +0530, Akash Asthana wrote:
> >>>> V1 patch@https://patchwork.kernel.org/patch/11386469/ caused SC7180 system
> >>>> to reset at boot time.
> >>> The v1 patch isn't relevant in the commit message, please just describe the
> >>> problem. Also the crash only occurs when earlycon is used.
> >> ok
> >>>> As QUP core clock is shared among all the SE drivers present on particular
> >>>> QUP wrapper, the reset seen is due to earlycon usage after QUP core clock
> >>>> is put to 0 from other SE drivers before real console comes up.
> >>>>
> >>>> As earlycon can't vote for it's QUP core need, to fix this add ICC
> >>>> support to common/QUP wrapper driver and put vote for QUP core from
> >>>> probe on behalf of earlycon and remove vote during sys suspend.
> >>> Only removing the vote on suspend isn't ideal, the system might never get
> >>> suspended. That said I don't have a really good alternative suggestion.
> >>>
> >>> One thing you could possibly do is to launch a delayed work, check
> >>> console_device() every second or so and remove the vote when it returns
> >>> non-NULL. Not claiming this would be a great solution ...
> >>>
> >>> The cleanest solution might be a notifier when the early console is
> >>> unregistered, it seems somewhat over-engineered though ... Then again
> >>> other (future) uart drivers with interconnect support might run into
> >>> the same problem.
> >> We are hitting this problem because QUP core clocks are shared among all
> >> the SE driver present in particular QUP wrapper, if other HW controllers
> >> has similar architecture we will hit this issue.
> >>
> >> How about if we expose an API from common driver(geni-se) for putting
> >> QUP core BW vote to 0.
> >>
> >> We call this from console probe just after uart_add_one_port call
> >> (console resources are enabled as part of this call) to put core quota
> >> to 0 on behalf of earlyconsole?
> > +Georgi
> >
> > Hm, these boot proxy votes are annoying, since the whole house of
> > cards comes down if you replace these votes in the wrong order.
> >
> > I believe consensus in the other patches was to consolidate most of
> > the interconnect support into the common SE code, right?
>
> I think what Matthias suggested is to maintain ICC functions defined
> across I2C, SPI and UART as a library in common SE code.
>
> Still every SE driver will interact with ICC framework individually
> rather than using common SE driver as a bridge.

Right, I'm sort of proposing a blend here, where the individual
drivers pass through the SE library, which looks at some shared state,
and may defer sending the votes during boot time. I was thinking
consolidating this into SE engine library code may make it easier for
you to peek at that shared state.

>
> >   Would that
> > help you with these boot proxy votes? What I'm thinking is something
> > along the lines of:
> >   * SPI, I2C, UART all call into the new common geni_se_icc_on/off()
> > (or whatever it's called)
> >   * If geni_se_icc_off() sees that console UART hasn't voted yet, save
> > the votes but don't actually call icc_set(0) now.
> >   * Once uart votes for the first time, call icc_set() on all of SPI,
> > I2C, UART to get things back in sync.
>
> IIUC, you are suggesting to enhancing ICC
> design@https://patchwork.kernel.org/patch/10774897/ [The very first ICC
> patch posted during sdm845 timeframe].
>
> Where common SE driver aggregate real time BW requirement from all the
> SE driver and put net request to ICC framework.
>
> We received comments on that version of ICC to move voting to individual
> SE driver from common driver. Hence we updated the design accordingly.

I think most of the reaction to that original series came from the
fact that the common SE code was doing aggregation work, which is
something the interconnect core was designed to do. In the solution
I'm proposing, the SE library either passes through votes as-is, or
delays them until the console UART has voted, at which time it passes
them all down as they were.

You could still make the case this is something the interconnect core
should help us with, which is why I was brainstorming about the
provider propping up votes until some probe-finished deadline, maybe
just a 30 second timer :)
-Evan
Matthias Kaehlcke March 19, 2020, 7:43 p.m. UTC | #9
On Wed, Mar 18, 2020 at 02:24:35PM +0530, Akash Asthana wrote:
> Hi Matthias,
> 
> On 3/17/2020 11:59 PM, Matthias Kaehlcke wrote:
> > Hi Akash,
> > 
> > On Tue, Mar 17, 2020 at 04:27:47PM +0530, Akash Asthana wrote:
> > > Hi Matthias,
> > > 
> > > On 3/14/2020 2:14 AM, Matthias Kaehlcke wrote:
> > > > Hi Akash,
> > > > 
> > > > On Fri, Mar 13, 2020 at 06:42:09PM +0530, Akash Asthana wrote:
> > > > > V1 patch@https://patchwork.kernel.org/patch/11386469/ caused SC7180 system
> > > > > to reset at boot time.
> > > > The v1 patch isn't relevant in the commit message, please just describe the
> > > > problem. Also the crash only occurs when earlycon is used.
> > > ok
> > > > > As QUP core clock is shared among all the SE drivers present on particular
> > > > > QUP wrapper, the reset seen is due to earlycon usage after QUP core clock
> > > > > is put to 0 from other SE drivers before real console comes up.
> > > > > 
> > > > > As earlycon can't vote for it's QUP core need, to fix this add ICC
> > > > > support to common/QUP wrapper driver and put vote for QUP core from
> > > > > probe on behalf of earlycon and remove vote during sys suspend.
> > > > Only removing the vote on suspend isn't ideal, the system might never get
> > > > suspended. That said I don't have a really good alternative suggestion.
> > > > 
> > > > One thing you could possibly do is to launch a delayed work, check
> > > > console_device() every second or so and remove the vote when it returns
> > > > non-NULL. Not claiming this would be a great solution ...
> > > > 
> > > > The cleanest solution might be a notifier when the early console is
> > > > unregistered, it seems somewhat over-engineered though ... Then again
> > > > other (future) uart drivers with interconnect support might run into
> > > > the same problem.
> > > We are hitting this problem because QUP core clocks are shared among all the
> > > SE driver present in particular QUP wrapper, if other HW controllers has
> > > similar architecture we will hit this issue.
> > > 
> > > How about if we expose an API from common driver(geni-se) for putting QUP
> > > core BW vote to 0.
> > > 
> > > We call this from console probe just after uart_add_one_port call (console
> > > resources are enabled as part of this call) to put core quota to 0 on behalf
> > > of earlyconsole?
> >  From my notes from earlier debugging I have doubts this would work:
> > 
> >    There is a short window where the early console and the 'real' console coexist:
> > 
> >    [    3.858122] printk: console [ttyMSM0] enabled
> >    [    3.875692] printk: bootconsole [qcom_geni0] disabled
> > 
> >    The reset probably occurs when the early console tries to write, but the ICC
> >    is effectively disabled because ttyMSM0 and the other geni ports are runtime
> >    suspended.
> 
> Code flow from console driver probe(qcom_geni_serial.c)
> 
> uart_add_one_port--->uart_configure_port--->{ 1) uart_change_pm(enable
> console resources)  2)register_console(boot to real console switch happens
> here)}
> 
> Console resources are not disabled from anywhere before the switch happens
> completely. I meant to say until we saw below logs.
> 
> [    3.875692] printk: bootconsole [qcom_geni0] disabled
> 
> I think the board reset issue cannot occur during the window where early
> console and 'real' console coexist.

Thanks for the clarification! Indeed my notes were only a hypothesis, I
don't see evidence that there is an actual downvote shortly after console
registration.

> I have validated proposed solution by me, it is working fine.
> 
> Currently voting is done for every QUP and not only to which earlycon is
> connect, with the above approach we can't remove vote from other QUPs.
> 
> However we can limit voting only to earlycon QUP by removing interconnect
> from DT node of other QUPs.
> 
> I am not sure how clean is this solution.

I'm more inclined towards a solution along the lines of what Evan
proposed, i.e. delaying the votes (either in geni or ICC) until we
are ready.
Akash Asthana March 20, 2020, 10:22 a.m. UTC | #10
Hi Evan, Matthias,

On 3/20/2020 1:13 AM, Matthias Kaehlcke wrote:
> On Wed, Mar 18, 2020 at 02:24:35PM +0530, Akash Asthana wrote:
>> Hi Matthias,
>>
>> On 3/17/2020 11:59 PM, Matthias Kaehlcke wrote:
>>> Hi Akash,
>>>
>>> On Tue, Mar 17, 2020 at 04:27:47PM +0530, Akash Asthana wrote:
>>>> Hi Matthias,
>>>>
>>>> On 3/14/2020 2:14 AM, Matthias Kaehlcke wrote:
>>>>> Hi Akash,
>>>>>
>>>>> On Fri, Mar 13, 2020 at 06:42:09PM +0530, Akash Asthana wrote:
>>>>>> V1 patch@https://patchwork.kernel.org/patch/11386469/ caused SC7180 system
>>>>>> to reset at boot time.
>>>>> The v1 patch isn't relevant in the commit message, please just describe the
>>>>> problem. Also the crash only occurs when earlycon is used.
>>>> ok
>>>>>> As QUP core clock is shared among all the SE drivers present on particular
>>>>>> QUP wrapper, the reset seen is due to earlycon usage after QUP core clock
>>>>>> is put to 0 from other SE drivers before real console comes up.
>>>>>>
>>>>>> As earlycon can't vote for it's QUP core need, to fix this add ICC
>>>>>> support to common/QUP wrapper driver and put vote for QUP core from
>>>>>> probe on behalf of earlycon and remove vote during sys suspend.
>>>>> Only removing the vote on suspend isn't ideal, the system might never get
>>>>> suspended. That said I don't have a really good alternative suggestion.
>>>>>
>>>>> One thing you could possibly do is to launch a delayed work, check
>>>>> console_device() every second or so and remove the vote when it returns
>>>>> non-NULL. Not claiming this would be a great solution ...
>>>>>
>>>>> The cleanest solution might be a notifier when the early console is
>>>>> unregistered, it seems somewhat over-engineered though ... Then again
>>>>> other (future) uart drivers with interconnect support might run into
>>>>> the same problem.
>>>> We are hitting this problem because QUP core clocks are shared among all the
>>>> SE driver present in particular QUP wrapper, if other HW controllers has
>>>> similar architecture we will hit this issue.
>>>>
>>>> How about if we expose an API from common driver(geni-se) for putting QUP
>>>> core BW vote to 0.
>>>>
>>>> We call this from console probe just after uart_add_one_port call (console
>>>> resources are enabled as part of this call) to put core quota to 0 on behalf
>>>> of earlyconsole?
>>>   From my notes from earlier debugging I have doubts this would work:
>>>
>>>     There is a short window where the early console and the 'real' console coexist:
>>>
>>>     [    3.858122] printk: console [ttyMSM0] enabled
>>>     [    3.875692] printk: bootconsole [qcom_geni0] disabled
>>>
>>>     The reset probably occurs when the early console tries to write, but the ICC
>>>     is effectively disabled because ttyMSM0 and the other geni ports are runtime
>>>     suspended.
>> Code flow from console driver probe(qcom_geni_serial.c)
>>
>> uart_add_one_port--->uart_configure_port--->{ 1) uart_change_pm(enable
>> console resources)  2)register_console(boot to real console switch happens
>> here)}
>>
>> Console resources are not disabled from anywhere before the switch happens
>> completely. I meant to say until we saw below logs.
>>
>> [    3.875692] printk: bootconsole [qcom_geni0] disabled
>>
>> I think the board reset issue cannot occur during the window where early
>> console and 'real' console coexist.
> Thanks for the clarification! Indeed my notes were only a hypothesis, I
> don't see evidence that there is an actual downvote shortly after console
> registration.
>
>> I have validated proposed solution by me, it is working fine.
>>
>> Currently voting is done for every QUP and not only to which earlycon is
>> connect, with the above approach we can't remove vote from other QUPs.
>>
>> However we can limit voting only to earlycon QUP by removing interconnect
>> from DT node of other QUPs.
>>
>> I am not sure how clean is this solution.
> I'm more inclined towards a solution along the lines of what Evan
> proposed, i.e. delaying the votes (either in geni or ICC) until we
> are ready.

Based on discussion I think the delayed solution is most suited if 
implemented in ICC core because other ICC client might face the similar 
problem.

However for geni case I am more inclined towards below proposed solution.

-----------------------------------------------------------------------------------------------------

How about if we expose an API from common driver(geni-se) for putting QUP
core BW vote to 0.

We call this from console probe just after uart_add_one_port call (console
resources are enabled as part of this call) to put core quota to 0 on behalf
of earlyconsole?
---------------------------------------------------------------------------

I think below are the pros and cons for above solution.

Pros:

1) Not only it'll solve the current issue also it'll lay foundation to 
support earlyconsole without having a real console. In future if needed 
we can enable rest of the earlyconsole resources(SE clocks) from common 
driver to make it independent of real console.

2) Inorder to solve bug:120934049(Add runtime PM support to common 
driver and move enablement/disablement of AHB clocks there) reported by 
Stephen Boyd, we'll end up adding ICC support to this driver because as 
per HPG the order of enabling the QUP clocks should be 
(Core-->AHB-->SE(per engine)). So we need to have the minimal possible 
vote on QUP core before enabling AHB clocks.

I agree still we have keep earlycon fix as it is(vote from common driver 
probe and remove once real console is up) .

I mentioned above bug in case it seems silly to add ICC support to 
common driver just for earlyconsole fix.

Cons:

1) As Evan mentioned and I agree that the whole house of cards comes 
down if we replace these votes in the wrong order. But I think this is 
very unlikely to happen once fixed.

2) As Matthias mentioned we are voting for all the QUPs regardless 
earlycon is present or not, I think this is HW limitation and we have to 
live with it.

Even if I try something like below the QUP1 can come up before 
QUP2(console) and child of QUP1 can remove vote from share QUP clocks.

if (of_get_compatible_child(pdev->dev.of_node, "qcom,geni-debug-uart")) {

     ret = icc_set_bw(wrapper->icc_path_geni_to_core, Bps_to_icc(1000),

         Bps_to_icc(1000));

}


Thanks,

Akash
Evan Green March 20, 2020, 4:30 p.m. UTC | #11
On Fri, Mar 20, 2020 at 3:22 AM Akash Asthana <akashast@codeaurora.org> wrote:
>
> Hi Evan, Matthias,
>
> On 3/20/2020 1:13 AM, Matthias Kaehlcke wrote:
> > On Wed, Mar 18, 2020 at 02:24:35PM +0530, Akash Asthana wrote:
> >> Hi Matthias,
> >>
> >> On 3/17/2020 11:59 PM, Matthias Kaehlcke wrote:
> >>> Hi Akash,
> >>>
> >>> On Tue, Mar 17, 2020 at 04:27:47PM +0530, Akash Asthana wrote:
> >>>> Hi Matthias,
> >>>>
> >>>> On 3/14/2020 2:14 AM, Matthias Kaehlcke wrote:
> >>>>> Hi Akash,
> >>>>>
> >>>>> On Fri, Mar 13, 2020 at 06:42:09PM +0530, Akash Asthana wrote:
> >>>>>> V1 patch@https://patchwork.kernel.org/patch/11386469/ caused SC7180 system
> >>>>>> to reset at boot time.
> >>>>> The v1 patch isn't relevant in the commit message, please just describe the
> >>>>> problem. Also the crash only occurs when earlycon is used.
> >>>> ok
> >>>>>> As QUP core clock is shared among all the SE drivers present on particular
> >>>>>> QUP wrapper, the reset seen is due to earlycon usage after QUP core clock
> >>>>>> is put to 0 from other SE drivers before real console comes up.
> >>>>>>
> >>>>>> As earlycon can't vote for it's QUP core need, to fix this add ICC
> >>>>>> support to common/QUP wrapper driver and put vote for QUP core from
> >>>>>> probe on behalf of earlycon and remove vote during sys suspend.
> >>>>> Only removing the vote on suspend isn't ideal, the system might never get
> >>>>> suspended. That said I don't have a really good alternative suggestion.
> >>>>>
> >>>>> One thing you could possibly do is to launch a delayed work, check
> >>>>> console_device() every second or so and remove the vote when it returns
> >>>>> non-NULL. Not claiming this would be a great solution ...
> >>>>>
> >>>>> The cleanest solution might be a notifier when the early console is
> >>>>> unregistered, it seems somewhat over-engineered though ... Then again
> >>>>> other (future) uart drivers with interconnect support might run into
> >>>>> the same problem.
> >>>> We are hitting this problem because QUP core clocks are shared among all the
> >>>> SE driver present in particular QUP wrapper, if other HW controllers has
> >>>> similar architecture we will hit this issue.
> >>>>
> >>>> How about if we expose an API from common driver(geni-se) for putting QUP
> >>>> core BW vote to 0.
> >>>>
> >>>> We call this from console probe just after uart_add_one_port call (console
> >>>> resources are enabled as part of this call) to put core quota to 0 on behalf
> >>>> of earlyconsole?
> >>>   From my notes from earlier debugging I have doubts this would work:
> >>>
> >>>     There is a short window where the early console and the 'real' console coexist:
> >>>
> >>>     [    3.858122] printk: console [ttyMSM0] enabled
> >>>     [    3.875692] printk: bootconsole [qcom_geni0] disabled
> >>>
> >>>     The reset probably occurs when the early console tries to write, but the ICC
> >>>     is effectively disabled because ttyMSM0 and the other geni ports are runtime
> >>>     suspended.
> >> Code flow from console driver probe(qcom_geni_serial.c)
> >>
> >> uart_add_one_port--->uart_configure_port--->{ 1) uart_change_pm(enable
> >> console resources)  2)register_console(boot to real console switch happens
> >> here)}
> >>
> >> Console resources are not disabled from anywhere before the switch happens
> >> completely. I meant to say until we saw below logs.
> >>
> >> [    3.875692] printk: bootconsole [qcom_geni0] disabled
> >>
> >> I think the board reset issue cannot occur during the window where early
> >> console and 'real' console coexist.
> > Thanks for the clarification! Indeed my notes were only a hypothesis, I
> > don't see evidence that there is an actual downvote shortly after console
> > registration.
> >
> >> I have validated proposed solution by me, it is working fine.
> >>
> >> Currently voting is done for every QUP and not only to which earlycon is
> >> connect, with the above approach we can't remove vote from other QUPs.
> >>
> >> However we can limit voting only to earlycon QUP by removing interconnect
> >> from DT node of other QUPs.
> >>
> >> I am not sure how clean is this solution.
> > I'm more inclined towards a solution along the lines of what Evan
> > proposed, i.e. delaying the votes (either in geni or ICC) until we
> > are ready.
>
> Based on discussion I think the delayed solution is most suited if
> implemented in ICC core because other ICC client might face the similar
> problem.
>
> However for geni case I am more inclined towards below proposed solution.
>
> -----------------------------------------------------------------------------------------------------
>
> How about if we expose an API from common driver(geni-se) for putting QUP
> core BW vote to 0.
>
> We call this from console probe just after uart_add_one_port call (console
> resources are enabled as part of this call) to put core quota to 0 on behalf
> of earlyconsole?

This seems ok to me. Earlycon sets up a vote, and then real probe
tears it down. As long as in the shuffle of all of these things into
SE library helpers you still have a way of differentiating the
earlycon vote from the real vote. In other words, don't reuse this
early icc_path for the real UART vote. You should probably also
destroy the path once you've voted zero on it.
-Evan
Akash Asthana March 27, 2020, 5:04 a.m. UTC | #12
Hi Evan,

On 3/20/2020 10:00 PM, Evan Green wrote:
> On Fri, Mar 20, 2020 at 3:22 AM Akash Asthana <akashast@codeaurora.org> wrote:
>> Hi Evan, Matthias,
>>
>> On 3/20/2020 1:13 AM, Matthias Kaehlcke wrote:
>>> On Wed, Mar 18, 2020 at 02:24:35PM +0530, Akash Asthana wrote:
>>>> Hi Matthias,
>>>>
>>>> On 3/17/2020 11:59 PM, Matthias Kaehlcke wrote:
>>>>> Hi Akash,
>>>>>
>>>>> On Tue, Mar 17, 2020 at 04:27:47PM +0530, Akash Asthana wrote:
>>>>>> Hi Matthias,
>>>>>>
>>>>>> On 3/14/2020 2:14 AM, Matthias Kaehlcke wrote:
>>>>>>> Hi Akash,
>>>>>>>
>>>>>>> On Fri, Mar 13, 2020 at 06:42:09PM +0530, Akash Asthana wrote:
>>>>>>>> V1 patch@https://patchwork.kernel.org/patch/11386469/ caused SC7180 system
>>>>>>>> to reset at boot time.
>>>>>>> The v1 patch isn't relevant in the commit message, please just describe the
>>>>>>> problem. Also the crash only occurs when earlycon is used.
>>>>>> ok
>>>>>>>> As QUP core clock is shared among all the SE drivers present on particular
>>>>>>>> QUP wrapper, the reset seen is due to earlycon usage after QUP core clock
>>>>>>>> is put to 0 from other SE drivers before real console comes up.
>>>>>>>>
>>>>>>>> As earlycon can't vote for it's QUP core need, to fix this add ICC
>>>>>>>> support to common/QUP wrapper driver and put vote for QUP core from
>>>>>>>> probe on behalf of earlycon and remove vote during sys suspend.
>>>>>>> Only removing the vote on suspend isn't ideal, the system might never get
>>>>>>> suspended. That said I don't have a really good alternative suggestion.
>>>>>>>
>>>>>>> One thing you could possibly do is to launch a delayed work, check
>>>>>>> console_device() every second or so and remove the vote when it returns
>>>>>>> non-NULL. Not claiming this would be a great solution ...
>>>>>>>
>>>>>>> The cleanest solution might be a notifier when the early console is
>>>>>>> unregistered, it seems somewhat over-engineered though ... Then again
>>>>>>> other (future) uart drivers with interconnect support might run into
>>>>>>> the same problem.
>>>>>> We are hitting this problem because QUP core clocks are shared among all the
>>>>>> SE driver present in particular QUP wrapper, if other HW controllers has
>>>>>> similar architecture we will hit this issue.
>>>>>>
>>>>>> How about if we expose an API from common driver(geni-se) for putting QUP
>>>>>> core BW vote to 0.
>>>>>>
>>>>>> We call this from console probe just after uart_add_one_port call (console
>>>>>> resources are enabled as part of this call) to put core quota to 0 on behalf
>>>>>> of earlyconsole?
>>>>>    From my notes from earlier debugging I have doubts this would work:
>>>>>
>>>>>      There is a short window where the early console and the 'real' console coexist:
>>>>>
>>>>>      [    3.858122] printk: console [ttyMSM0] enabled
>>>>>      [    3.875692] printk: bootconsole [qcom_geni0] disabled
>>>>>
>>>>>      The reset probably occurs when the early console tries to write, but the ICC
>>>>>      is effectively disabled because ttyMSM0 and the other geni ports are runtime
>>>>>      suspended.
>>>> Code flow from console driver probe(qcom_geni_serial.c)
>>>>
>>>> uart_add_one_port--->uart_configure_port--->{ 1) uart_change_pm(enable
>>>> console resources)  2)register_console(boot to real console switch happens
>>>> here)}
>>>>
>>>> Console resources are not disabled from anywhere before the switch happens
>>>> completely. I meant to say until we saw below logs.
>>>>
>>>> [    3.875692] printk: bootconsole [qcom_geni0] disabled
>>>>
>>>> I think the board reset issue cannot occur during the window where early
>>>> console and 'real' console coexist.
>>> Thanks for the clarification! Indeed my notes were only a hypothesis, I
>>> don't see evidence that there is an actual downvote shortly after console
>>> registration.
>>>
>>>> I have validated proposed solution by me, it is working fine.
>>>>
>>>> Currently voting is done for every QUP and not only to which earlycon is
>>>> connect, with the above approach we can't remove vote from other QUPs.
>>>>
>>>> However we can limit voting only to earlycon QUP by removing interconnect
>>>> from DT node of other QUPs.
>>>>
>>>> I am not sure how clean is this solution.
>>> I'm more inclined towards a solution along the lines of what Evan
>>> proposed, i.e. delaying the votes (either in geni or ICC) until we
>>> are ready.
>> Based on discussion I think the delayed solution is most suited if
>> implemented in ICC core because other ICC client might face the similar
>> problem.
>>
>> However for geni case I am more inclined towards below proposed solution.
>>
>> -----------------------------------------------------------------------------------------------------
>>
>> How about if we expose an API from common driver(geni-se) for putting QUP
>> core BW vote to 0.
>>
>> We call this from console probe just after uart_add_one_port call (console
>> resources are enabled as part of this call) to put core quota to 0 on behalf
>> of earlyconsole?
> This seems ok to me. Earlycon sets up a vote, and then real probe
> tears it down. As long as in the shuffle of all of these things into
> SE library helpers you still have a way of differentiating the
> earlycon vote from the real vote. In other words, don't reuse this
> early icc_path for the real UART vote. You should probably also
> destroy the path once you've voted zero on it.
> -Evan

Thanks for confirming, I will not use early icc_path for real console 
and I will destroy it once voted to 0 from real console probe.

Regards,

Akash
Bjorn Andersson March 27, 2020, 11:23 p.m. UTC | #13
On Fri 20 Mar 09:30 PDT 2020, Evan Green wrote:

> On Fri, Mar 20, 2020 at 3:22 AM Akash Asthana <akashast@codeaurora.org> wrote:
> >
> > Hi Evan, Matthias,
> >
> > On 3/20/2020 1:13 AM, Matthias Kaehlcke wrote:
> > > On Wed, Mar 18, 2020 at 02:24:35PM +0530, Akash Asthana wrote:
> > >> Hi Matthias,
> > >>
> > >> On 3/17/2020 11:59 PM, Matthias Kaehlcke wrote:
> > >>> Hi Akash,
> > >>>
> > >>> On Tue, Mar 17, 2020 at 04:27:47PM +0530, Akash Asthana wrote:
> > >>>> Hi Matthias,
> > >>>>
> > >>>> On 3/14/2020 2:14 AM, Matthias Kaehlcke wrote:
> > >>>>> Hi Akash,
> > >>>>>
> > >>>>> On Fri, Mar 13, 2020 at 06:42:09PM +0530, Akash Asthana wrote:
> > >>>>>> V1 patch@https://patchwork.kernel.org/patch/11386469/ caused SC7180 system
> > >>>>>> to reset at boot time.
> > >>>>> The v1 patch isn't relevant in the commit message, please just describe the
> > >>>>> problem. Also the crash only occurs when earlycon is used.
> > >>>> ok
> > >>>>>> As QUP core clock is shared among all the SE drivers present on particular
> > >>>>>> QUP wrapper, the reset seen is due to earlycon usage after QUP core clock
> > >>>>>> is put to 0 from other SE drivers before real console comes up.
> > >>>>>>
> > >>>>>> As earlycon can't vote for it's QUP core need, to fix this add ICC
> > >>>>>> support to common/QUP wrapper driver and put vote for QUP core from
> > >>>>>> probe on behalf of earlycon and remove vote during sys suspend.
> > >>>>> Only removing the vote on suspend isn't ideal, the system might never get
> > >>>>> suspended. That said I don't have a really good alternative suggestion.
> > >>>>>
> > >>>>> One thing you could possibly do is to launch a delayed work, check
> > >>>>> console_device() every second or so and remove the vote when it returns
> > >>>>> non-NULL. Not claiming this would be a great solution ...
> > >>>>>
> > >>>>> The cleanest solution might be a notifier when the early console is
> > >>>>> unregistered, it seems somewhat over-engineered though ... Then again
> > >>>>> other (future) uart drivers with interconnect support might run into
> > >>>>> the same problem.
> > >>>> We are hitting this problem because QUP core clocks are shared among all the
> > >>>> SE driver present in particular QUP wrapper, if other HW controllers has
> > >>>> similar architecture we will hit this issue.
> > >>>>
> > >>>> How about if we expose an API from common driver(geni-se) for putting QUP
> > >>>> core BW vote to 0.
> > >>>>
> > >>>> We call this from console probe just after uart_add_one_port call (console
> > >>>> resources are enabled as part of this call) to put core quota to 0 on behalf
> > >>>> of earlyconsole?
> > >>>   From my notes from earlier debugging I have doubts this would work:
> > >>>
> > >>>     There is a short window where the early console and the 'real' console coexist:
> > >>>
> > >>>     [    3.858122] printk: console [ttyMSM0] enabled
> > >>>     [    3.875692] printk: bootconsole [qcom_geni0] disabled
> > >>>
> > >>>     The reset probably occurs when the early console tries to write, but the ICC
> > >>>     is effectively disabled because ttyMSM0 and the other geni ports are runtime
> > >>>     suspended.
> > >> Code flow from console driver probe(qcom_geni_serial.c)
> > >>
> > >> uart_add_one_port--->uart_configure_port--->{ 1) uart_change_pm(enable
> > >> console resources)  2)register_console(boot to real console switch happens
> > >> here)}
> > >>
> > >> Console resources are not disabled from anywhere before the switch happens
> > >> completely. I meant to say until we saw below logs.
> > >>
> > >> [    3.875692] printk: bootconsole [qcom_geni0] disabled
> > >>
> > >> I think the board reset issue cannot occur during the window where early
> > >> console and 'real' console coexist.
> > > Thanks for the clarification! Indeed my notes were only a hypothesis, I
> > > don't see evidence that there is an actual downvote shortly after console
> > > registration.
> > >
> > >> I have validated proposed solution by me, it is working fine.
> > >>
> > >> Currently voting is done for every QUP and not only to which earlycon is
> > >> connect, with the above approach we can't remove vote from other QUPs.
> > >>
> > >> However we can limit voting only to earlycon QUP by removing interconnect
> > >> from DT node of other QUPs.
> > >>
> > >> I am not sure how clean is this solution.
> > > I'm more inclined towards a solution along the lines of what Evan
> > > proposed, i.e. delaying the votes (either in geni or ICC) until we
> > > are ready.
> >
> > Based on discussion I think the delayed solution is most suited if
> > implemented in ICC core because other ICC client might face the similar
> > problem.
> >
> > However for geni case I am more inclined towards below proposed solution.
> >
> > -----------------------------------------------------------------------------------------------------
> >
> > How about if we expose an API from common driver(geni-se) for putting QUP
> > core BW vote to 0.
> >
> > We call this from console probe just after uart_add_one_port call (console
> > resources are enabled as part of this call) to put core quota to 0 on behalf
> > of earlyconsole?
> 
> This seems ok to me. Earlycon sets up a vote, and then real probe
> tears it down. As long as in the shuffle of all of these things into
> SE library helpers you still have a way of differentiating the
> earlycon vote from the real vote.

Note though that the boot console will outlive the real console when the
kernel is booted with "keep_bootcon" on the command line (something I do
from time to time).

So rather than relying on "real probe" to signal when we can release the
earlycon's icc vote I think we should specify dev->con->exit in
qcom_geni_serial_earlycon_setup(), so that it will signal when the
earlycon actually goes away - and until that point the clocks should
just be on.

> In other words, don't reuse this
> early icc_path for the real UART vote. You should probably also
> destroy the path once you've voted zero on it.

Maintaining the early and real votes completely separate sounds like a
sure way to keep this sane; and there's no drawback on having multiple
votes for the same thing and rely on the framework to keep track of the
various users.

Regards,
Bjorn
Akash Asthana March 31, 2020, 10:55 a.m. UTC | #14
Hi Bjorn,

On 3/28/2020 4:53 AM, Bjorn Andersson wrote:
> Note though that the boot console will outlive the real console when the
> kernel is booted with "keep_bootcon" on the command line (something I do
> from time to time).
>
> So rather than relying on "real probe" to signal when we can release the
> earlycon's icc vote I think we should specify dev->con->exit in
> qcom_geni_serial_early

Okay,

Thanks,

Akash

Patch
diff mbox series

diff --git a/drivers/soc/qcom/qcom-geni-se.c b/drivers/soc/qcom/qcom-geni-se.c
index 7d622ea..d244dfc 100644
--- a/drivers/soc/qcom/qcom-geni-se.c
+++ b/drivers/soc/qcom/qcom-geni-se.c
@@ -90,6 +90,7 @@  struct geni_wrapper {
 	struct device *dev;
 	void __iomem *base;
 	struct clk_bulk_data ahb_clks[NUM_AHB_CLKS];
+	struct icc_path *icc_path_geni_to_core;
 };
 
 #define QUP_HW_VER_REG			0x4
@@ -747,11 +748,50 @@  static int geni_se_probe(struct platform_device *pdev)
 		}
 	}
 
+#ifdef CONFIG_SERIAL_EARLYCON
+	wrapper->icc_path_geni_to_core = devm_of_icc_get(dev, "qup-core");
+	if (IS_ERR(wrapper->icc_path_geni_to_core))
+		return PTR_ERR(wrapper->icc_path_geni_to_core);
+	/*
+	 * Put minmal BW request on core clocks on behalf of early console.
+	 * The vote will be removed in suspend call.
+	 */
+	ret = icc_set_bw(wrapper->icc_path_geni_to_core, Bps_to_icc(1000),
+			Bps_to_icc(1000));
+	if (ret) {
+		dev_err(&pdev->dev, "%s: ICC BW voting failed for core\n",
+			__func__);
+		return ret;
+	}
+#endif
+
 	dev_set_drvdata(dev, wrapper);
 	dev_dbg(dev, "GENI SE Driver probed\n");
 	return devm_of_platform_populate(dev);
 }
 
+static int __maybe_unused geni_se_sys_suspend(struct device *dev)
+{
+	struct geni_wrapper *wrapper = dev_get_drvdata(dev);
+	int ret;
+
+#ifdef CONFIG_SERIAL_EARLYCON
+	ret = icc_set_bw(wrapper->icc_path_geni_to_core, 0, 0);
+	if (ret) {
+		dev_err(dev, "%s: ICC BW remove failed for core\n",
+			__func__);
+		return ret;
+	}
+#endif
+
+	return 0;
+}
+
+static const struct dev_pm_ops geni_se_pm_ops = {
+	SET_SYSTEM_SLEEP_PM_OPS(geni_se_sys_suspend,
+				NULL)
+};
+
 static const struct of_device_id geni_se_dt_match[] = {
 	{ .compatible = "qcom,geni-se-qup", },
 	{}
@@ -762,6 +802,7 @@  static struct platform_driver geni_se_driver = {
 	.driver = {
 		.name = "geni_se_qup",
 		.of_match_table = geni_se_dt_match,
+		.pm = &geni_se_pm_ops,
 	},
 	.probe = geni_se_probe,
 };