diff mbox series

phy: qcom-qmp: Correct READY_STATUS poll break condition

Message ID 20190604232443.3417-1-bjorn.andersson@linaro.org (mailing list archive)
State Accepted
Commit 885bd765963b42c380db442db7f1c0f2a26076fa
Headers show
Series phy: qcom-qmp: Correct READY_STATUS poll break condition | expand

Commit Message

Bjorn Andersson June 4, 2019, 11:24 p.m. UTC
After issuing a PHY_START request to the QMP, the hardware documentation
states that the software should wait for the PCS_READY_STATUS to become
1.

With the introduction of c9b589791fc1 ("phy: qcom: Utilize UFS reset
controller") an additional 1ms delay was introduced between the start
request and the check of the status bit. This greatly increases the
chances for the hardware to actually becoming ready before the status
bit is read.

The result can be seen in that UFS PHY enabling is now reported as a
failure in 10% of the boots on SDM845, which is a clear regression from
the previous rare/occasional failure.

This patch fixes the "break condition" of the poll to check for the
correct state of the status bit.

Unfortunately PCIe on 8996 and 8998 does not specify the mask_pcs_ready
register, which means that the code checks a bit that's always 0. So the
patch also fixes these, in order to not regress these targets.

Cc: stable@vger.kernel.org
Cc: Evan Green <evgreen@chromium.org>
Cc: Marc Gonzalez <marc.w.gonzalez@free.fr>
Cc: Vivek Gautam <vivek.gautam@codeaurora.org>
Fixes: 73d7ec899bd8 ("phy: qcom-qmp: Add msm8998 PCIe QMP PHY support")
Fixes: e78f3d15e115 ("phy: qcom-qmp: new qmp phy driver for qcom-chipsets")
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
---

@Kishon, this is a regression spotted in v5.2-rc1, so please consider applying
this towards v5.2.

 drivers/phy/qualcomm/phy-qcom-qmp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Evan Green June 4, 2019, 11:35 p.m. UTC | #1
On Tue, Jun 4, 2019 at 4:24 PM Bjorn Andersson
<bjorn.andersson@linaro.org> wrote:
>
> After issuing a PHY_START request to the QMP, the hardware documentation
> states that the software should wait for the PCS_READY_STATUS to become
> 1.
>
> With the introduction of c9b589791fc1 ("phy: qcom: Utilize UFS reset
> controller") an additional 1ms delay was introduced between the start
> request and the check of the status bit. This greatly increases the
> chances for the hardware to actually becoming ready before the status
> bit is read.
>
> The result can be seen in that UFS PHY enabling is now reported as a
> failure in 10% of the boots on SDM845, which is a clear regression from
> the previous rare/occasional failure.
>
> This patch fixes the "break condition" of the poll to check for the
> correct state of the status bit.
>
> Unfortunately PCIe on 8996 and 8998 does not specify the mask_pcs_ready
> register, which means that the code checks a bit that's always 0. So the
> patch also fixes these, in order to not regress these targets.
>
> Cc: stable@vger.kernel.org
> Cc: Evan Green <evgreen@chromium.org>
> Cc: Marc Gonzalez <marc.w.gonzalez@free.fr>
> Cc: Vivek Gautam <vivek.gautam@codeaurora.org>
> Fixes: 73d7ec899bd8 ("phy: qcom-qmp: Add msm8998 PCIe QMP PHY support")
> Fixes: e78f3d15e115 ("phy: qcom-qmp: new qmp phy driver for qcom-chipsets")
> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Nice find.

Reviewed-by: Evan Green <evgreen@chromium.org>
Niklas Cassel June 12, 2019, 1:08 p.m. UTC | #2
On Tue, Jun 04, 2019 at 04:24:43PM -0700, Bjorn Andersson wrote:
> After issuing a PHY_START request to the QMP, the hardware documentation
> states that the software should wait for the PCS_READY_STATUS to become
> 1.
> 
> With the introduction of c9b589791fc1 ("phy: qcom: Utilize UFS reset
> controller") an additional 1ms delay was introduced between the start
> request and the check of the status bit. This greatly increases the
> chances for the hardware to actually becoming ready before the status
> bit is read.
> 
> The result can be seen in that UFS PHY enabling is now reported as a
> failure in 10% of the boots on SDM845, which is a clear regression from
> the previous rare/occasional failure.
> 
> This patch fixes the "break condition" of the poll to check for the
> correct state of the status bit.
> 
> Unfortunately PCIe on 8996 and 8998 does not specify the mask_pcs_ready
> register, which means that the code checks a bit that's always 0. So the
> patch also fixes these, in order to not regress these targets.
> 
> Cc: stable@vger.kernel.org
> Cc: Evan Green <evgreen@chromium.org>
> Cc: Marc Gonzalez <marc.w.gonzalez@free.fr>
> Cc: Vivek Gautam <vivek.gautam@codeaurora.org>
> Fixes: 73d7ec899bd8 ("phy: qcom-qmp: Add msm8998 PCIe QMP PHY support")
> Fixes: e78f3d15e115 ("phy: qcom-qmp: new qmp phy driver for qcom-chipsets")
> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> ---
> 
> @Kishon, this is a regression spotted in v5.2-rc1, so please consider applying
> this towards v5.2.
> 
>  drivers/phy/qualcomm/phy-qcom-qmp.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c
> index cd91b4179b10..43abdfd0deed 100644
> --- a/drivers/phy/qualcomm/phy-qcom-qmp.c
> +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
> @@ -1074,6 +1074,7 @@ static const struct qmp_phy_cfg msm8996_pciephy_cfg = {
>  
>  	.start_ctrl		= PCS_START | PLL_READY_GATE_EN,
>  	.pwrdn_ctrl		= SW_PWRDN | REFCLK_DRV_DSBL,
> +	.mask_pcs_ready		= PHYSTATUS,
>  	.mask_com_pcs_ready	= PCS_READY,
>  
>  	.has_phy_com_ctrl	= true,
> @@ -1253,6 +1254,7 @@ static const struct qmp_phy_cfg msm8998_pciephy_cfg = {
>  
>  	.start_ctrl             = SERDES_START | PCS_START,
>  	.pwrdn_ctrl		= SW_PWRDN | REFCLK_DRV_DSBL,
> +	.mask_pcs_ready		= PHYSTATUS,
>  	.mask_com_pcs_ready	= PCS_READY,
>  };
>  
> @@ -1547,7 +1549,7 @@ static int qcom_qmp_phy_enable(struct phy *phy)
>  	status = pcs + cfg->regs[QPHY_PCS_READY_STATUS];
>  	mask = cfg->mask_pcs_ready;
>  
> -	ret = readl_poll_timeout(status, val, !(val & mask), 1,
> +	ret = readl_poll_timeout(status, val, val & mask, 1,
>  				 PHY_INIT_COMPLETE_TIMEOUT);
>  	if (ret) {
>  		dev_err(qmp->dev, "phy initialization timed-out\n");
> -- 
> 2.18.0
> 

msm8996_pciephy_cfg and msm8998_pciephy_cfg not having a bit mask defined
for PCS ready is really a separate bug, so personally I would have created
two patches, one that adds the missing masks, and one patch that fixes the
broken break condition.

Either way:

Reviewed-by: Niklas Cassel <niklas.cassel@linaro.org>
Marc Gonzalez June 12, 2019, 4:24 p.m. UTC | #3
On 05/06/2019 01:24, Bjorn Andersson wrote:

> After issuing a PHY_START request to the QMP, the hardware documentation
> states that the software should wait for the PCS_READY_STATUS to become
> 1.
> 
> With the introduction of c9b589791fc1 ("phy: qcom: Utilize UFS reset
> controller") an additional 1ms delay was introduced between the start
> request and the check of the status bit. This greatly increases the
> chances for the hardware to actually becoming ready before the status
> bit is read.
> 
> The result can be seen in that UFS PHY enabling is now reported as a
> failure in 10% of the boots on SDM845, which is a clear regression from
> the previous rare/occasional failure.
> 
> This patch fixes the "break condition" of the poll to check for the
> correct state of the status bit.
> 
> Unfortunately PCIe on 8996 and 8998 does not specify the mask_pcs_ready
> register, which means that the code checks a bit that's always 0. So the
> patch also fixes these, in order to not regress these targets.
> 
> Cc: stable@vger.kernel.org
> Cc: Evan Green <evgreen@chromium.org>
> Cc: Marc Gonzalez <marc.w.gonzalez@free.fr>
> Cc: Vivek Gautam <vivek.gautam@codeaurora.org>
> Fixes: 73d7ec899bd8 ("phy: qcom-qmp: Add msm8998 PCIe QMP PHY support")
> Fixes: e78f3d15e115 ("phy: qcom-qmp: new qmp phy driver for qcom-chipsets")
> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> ---
> 
> @Kishon, this is a regression spotted in v5.2-rc1, so please consider applying
> this towards v5.2.
> 
>  drivers/phy/qualcomm/phy-qcom-qmp.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c
> index cd91b4179b10..43abdfd0deed 100644
> --- a/drivers/phy/qualcomm/phy-qcom-qmp.c
> +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
> @@ -1074,6 +1074,7 @@ static const struct qmp_phy_cfg msm8996_pciephy_cfg = {
>  
>  	.start_ctrl		= PCS_START | PLL_READY_GATE_EN,
>  	.pwrdn_ctrl		= SW_PWRDN | REFCLK_DRV_DSBL,
> +	.mask_pcs_ready		= PHYSTATUS,
>  	.mask_com_pcs_ready	= PCS_READY,
>  
>  	.has_phy_com_ctrl	= true,
> @@ -1253,6 +1254,7 @@ static const struct qmp_phy_cfg msm8998_pciephy_cfg = {
>  
>  	.start_ctrl             = SERDES_START | PCS_START,
>  	.pwrdn_ctrl		= SW_PWRDN | REFCLK_DRV_DSBL,
> +	.mask_pcs_ready		= PHYSTATUS,
>  	.mask_com_pcs_ready	= PCS_READY,
>  };
>  
> @@ -1547,7 +1549,7 @@ static int qcom_qmp_phy_enable(struct phy *phy)
>  	status = pcs + cfg->regs[QPHY_PCS_READY_STATUS];
>  	mask = cfg->mask_pcs_ready;
>  
> -	ret = readl_poll_timeout(status, val, !(val & mask), 1,
> +	ret = readl_poll_timeout(status, val, val & mask, 1,
>  				 PHY_INIT_COMPLETE_TIMEOUT);
>  	if (ret) {
>  		dev_err(qmp->dev, "phy initialization timed-out\n");

Your patch made me realize that:
msm8998_pciephy_cfg.has_phy_com_ctrl = false
thus
msm8998_pciephy_cfg.mask_com_pcs_ready is useless, AFAICT.

(I copied msm8996_pciephy_cfg for msm8998_pciephy_cfg)

Does msm8996_pciephy_cfg really need both mask_pcs_ready AND
mask_com_pcs_ready?

I'll test your patch tomorrow.

Regards.
Bjorn Andersson June 12, 2019, 5:25 p.m. UTC | #4
On Wed 12 Jun 09:24 PDT 2019, Marc Gonzalez wrote:

> On 05/06/2019 01:24, Bjorn Andersson wrote:
> 
> > After issuing a PHY_START request to the QMP, the hardware documentation
> > states that the software should wait for the PCS_READY_STATUS to become
> > 1.
> > 
> > With the introduction of c9b589791fc1 ("phy: qcom: Utilize UFS reset
> > controller") an additional 1ms delay was introduced between the start
> > request and the check of the status bit. This greatly increases the
> > chances for the hardware to actually becoming ready before the status
> > bit is read.
> > 
> > The result can be seen in that UFS PHY enabling is now reported as a
> > failure in 10% of the boots on SDM845, which is a clear regression from
> > the previous rare/occasional failure.
> > 
> > This patch fixes the "break condition" of the poll to check for the
> > correct state of the status bit.
> > 
> > Unfortunately PCIe on 8996 and 8998 does not specify the mask_pcs_ready
> > register, which means that the code checks a bit that's always 0. So the
> > patch also fixes these, in order to not regress these targets.
> > 
> > Cc: stable@vger.kernel.org
> > Cc: Evan Green <evgreen@chromium.org>
> > Cc: Marc Gonzalez <marc.w.gonzalez@free.fr>
> > Cc: Vivek Gautam <vivek.gautam@codeaurora.org>
> > Fixes: 73d7ec899bd8 ("phy: qcom-qmp: Add msm8998 PCIe QMP PHY support")
> > Fixes: e78f3d15e115 ("phy: qcom-qmp: new qmp phy driver for qcom-chipsets")
> > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> > ---
> > 
> > @Kishon, this is a regression spotted in v5.2-rc1, so please consider applying
> > this towards v5.2.
> > 
> >  drivers/phy/qualcomm/phy-qcom-qmp.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c
> > index cd91b4179b10..43abdfd0deed 100644
> > --- a/drivers/phy/qualcomm/phy-qcom-qmp.c
> > +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
> > @@ -1074,6 +1074,7 @@ static const struct qmp_phy_cfg msm8996_pciephy_cfg = {
> >  
> >  	.start_ctrl		= PCS_START | PLL_READY_GATE_EN,
> >  	.pwrdn_ctrl		= SW_PWRDN | REFCLK_DRV_DSBL,
> > +	.mask_pcs_ready		= PHYSTATUS,
> >  	.mask_com_pcs_ready	= PCS_READY,
> >  
> >  	.has_phy_com_ctrl	= true,
> > @@ -1253,6 +1254,7 @@ static const struct qmp_phy_cfg msm8998_pciephy_cfg = {
> >  
> >  	.start_ctrl             = SERDES_START | PCS_START,
> >  	.pwrdn_ctrl		= SW_PWRDN | REFCLK_DRV_DSBL,
> > +	.mask_pcs_ready		= PHYSTATUS,
> >  	.mask_com_pcs_ready	= PCS_READY,
> >  };
> >  
> > @@ -1547,7 +1549,7 @@ static int qcom_qmp_phy_enable(struct phy *phy)
> >  	status = pcs + cfg->regs[QPHY_PCS_READY_STATUS];
> >  	mask = cfg->mask_pcs_ready;
> >  
> > -	ret = readl_poll_timeout(status, val, !(val & mask), 1,
> > +	ret = readl_poll_timeout(status, val, val & mask, 1,
> >  				 PHY_INIT_COMPLETE_TIMEOUT);
> >  	if (ret) {
> >  		dev_err(qmp->dev, "phy initialization timed-out\n");
> 
> Your patch made me realize that:
> msm8998_pciephy_cfg.has_phy_com_ctrl = false
> thus
> msm8998_pciephy_cfg.mask_com_pcs_ready is useless, AFAICT.
> 

While 8998 has a COM block, it does (among other things) not have a
ready bit. So afaict has_phy_com_ctrl = false is correct.

The addition of mask_pcs_ready is part of resolving the regression in
5.2, so I suggest that we remove mask_com_pcs_ready separately.

> (I copied msm8996_pciephy_cfg for msm8998_pciephy_cfg)
> 
> Does msm8996_pciephy_cfg really need both mask_pcs_ready AND
> mask_com_pcs_ready?
> 

8996 has a COM block and it contains both the control bits and the
status bits, so that looks correct.

> I'll test your patch tomorrow.
> 

I appreciate that.

Thanks,
Bjorn
Bjorn Andersson June 12, 2019, 5:34 p.m. UTC | #5
On Wed 12 Jun 06:08 PDT 2019, Niklas Cassel wrote:

> On Tue, Jun 04, 2019 at 04:24:43PM -0700, Bjorn Andersson wrote:
> > After issuing a PHY_START request to the QMP, the hardware documentation
> > states that the software should wait for the PCS_READY_STATUS to become
> > 1.
> > 
> > With the introduction of c9b589791fc1 ("phy: qcom: Utilize UFS reset
> > controller") an additional 1ms delay was introduced between the start
> > request and the check of the status bit. This greatly increases the
> > chances for the hardware to actually becoming ready before the status
> > bit is read.
> > 
> > The result can be seen in that UFS PHY enabling is now reported as a
> > failure in 10% of the boots on SDM845, which is a clear regression from
> > the previous rare/occasional failure.
> > 
> > This patch fixes the "break condition" of the poll to check for the
> > correct state of the status bit.
> > 
> > Unfortunately PCIe on 8996 and 8998 does not specify the mask_pcs_ready
> > register, which means that the code checks a bit that's always 0. So the
> > patch also fixes these, in order to not regress these targets.
> > 
> > Cc: stable@vger.kernel.org
> > Cc: Evan Green <evgreen@chromium.org>
> > Cc: Marc Gonzalez <marc.w.gonzalez@free.fr>
> > Cc: Vivek Gautam <vivek.gautam@codeaurora.org>
> > Fixes: 73d7ec899bd8 ("phy: qcom-qmp: Add msm8998 PCIe QMP PHY support")
> > Fixes: e78f3d15e115 ("phy: qcom-qmp: new qmp phy driver for qcom-chipsets")
> > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> > ---
> > 
> > @Kishon, this is a regression spotted in v5.2-rc1, so please consider applying
> > this towards v5.2.
> > 
> >  drivers/phy/qualcomm/phy-qcom-qmp.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c
> > index cd91b4179b10..43abdfd0deed 100644
> > --- a/drivers/phy/qualcomm/phy-qcom-qmp.c
> > +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
> > @@ -1074,6 +1074,7 @@ static const struct qmp_phy_cfg msm8996_pciephy_cfg = {
> >  
> >  	.start_ctrl		= PCS_START | PLL_READY_GATE_EN,
> >  	.pwrdn_ctrl		= SW_PWRDN | REFCLK_DRV_DSBL,
> > +	.mask_pcs_ready		= PHYSTATUS,
> >  	.mask_com_pcs_ready	= PCS_READY,
> >  
> >  	.has_phy_com_ctrl	= true,
> > @@ -1253,6 +1254,7 @@ static const struct qmp_phy_cfg msm8998_pciephy_cfg = {
> >  
> >  	.start_ctrl             = SERDES_START | PCS_START,
> >  	.pwrdn_ctrl		= SW_PWRDN | REFCLK_DRV_DSBL,
> > +	.mask_pcs_ready		= PHYSTATUS,
> >  	.mask_com_pcs_ready	= PCS_READY,
> >  };
> >  
> > @@ -1547,7 +1549,7 @@ static int qcom_qmp_phy_enable(struct phy *phy)
> >  	status = pcs + cfg->regs[QPHY_PCS_READY_STATUS];
> >  	mask = cfg->mask_pcs_ready;
> >  
> > -	ret = readl_poll_timeout(status, val, !(val & mask), 1,
> > +	ret = readl_poll_timeout(status, val, val & mask, 1,
> >  				 PHY_INIT_COMPLETE_TIMEOUT);
> >  	if (ret) {
> >  		dev_err(qmp->dev, "phy initialization timed-out\n");
> > -- 
> > 2.18.0
> > 
> 
> msm8996_pciephy_cfg and msm8998_pciephy_cfg not having a bit mask defined
> for PCS ready is really a separate bug, so personally I would have created
> two patches, one that adds the missing masks, and one patch that fixes the
> broken break condition.
> 

We can't add mask_pcs_ready in a separate commit after the poll change,
because this would introduce a regression in the history and we can't
add the mask_pcs_ready before because when I tested this on db820c I saw
occasional initialization failures.

I was not able to verify 8998, but I presume that the same dependency
exists there.

> Either way:
> 
> Reviewed-by: Niklas Cassel <niklas.cassel@linaro.org>

Thanks,
Bjorn
Marc Gonzalez June 13, 2019, 9:10 a.m. UTC | #6
On 12/06/2019 19:25, Bjorn Andersson wrote:

> On Wed 12 Jun 09:24 PDT 2019, Marc Gonzalez wrote:
> 
>> On 05/06/2019 01:24, Bjorn Andersson wrote:
>>
>>> After issuing a PHY_START request to the QMP, the hardware documentation
>>> states that the software should wait for the PCS_READY_STATUS to become 1.
>>>
>>> With the introduction of c9b589791fc1 ("phy: qcom: Utilize UFS reset
>>> controller") an additional 1ms delay was introduced between the start
>>> request and the check of the status bit. This greatly increases the
>>> chances for the hardware to actually becoming ready before the status
>>> bit is read.
>>>
>>> The result can be seen in that UFS PHY enabling is now reported as a
>>> failure in 10% of the boots on SDM845, which is a clear regression from
>>> the previous rare/occasional failure.
>>>
>>> This patch fixes the "break condition" of the poll to check for the
>>> correct state of the status bit.
>>>
>>> Unfortunately PCIe on 8996 and 8998 does not specify the mask_pcs_ready
>>> register, which means that the code checks a bit that's always 0. So the
>>> patch also fixes these, in order to not regress these targets.
>>>
>>> Cc: stable@vger.kernel.org
>>> Cc: Evan Green <evgreen@chromium.org>
>>> Cc: Marc Gonzalez <marc.w.gonzalez@free.fr>
>>> Cc: Vivek Gautam <vivek.gautam@codeaurora.org>
>>> Fixes: 73d7ec899bd8 ("phy: qcom-qmp: Add msm8998 PCIe QMP PHY support")
>>> Fixes: e78f3d15e115 ("phy: qcom-qmp: new qmp phy driver for qcom-chipsets")
>>> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
>>> ---
>>>
>>> @Kishon, this is a regression spotted in v5.2-rc1, so please consider applying
>>> this towards v5.2.
>>>
>>>  drivers/phy/qualcomm/phy-qcom-qmp.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c
>>> index cd91b4179b10..43abdfd0deed 100644
>>> --- a/drivers/phy/qualcomm/phy-qcom-qmp.c
>>> +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
>>> @@ -1074,6 +1074,7 @@ static const struct qmp_phy_cfg msm8996_pciephy_cfg = {
>>>  
>>>  	.start_ctrl		= PCS_START | PLL_READY_GATE_EN,
>>>  	.pwrdn_ctrl		= SW_PWRDN | REFCLK_DRV_DSBL,
>>> +	.mask_pcs_ready		= PHYSTATUS,
>>>  	.mask_com_pcs_ready	= PCS_READY,
>>>  
>>>  	.has_phy_com_ctrl	= true,
>>> @@ -1253,6 +1254,7 @@ static const struct qmp_phy_cfg msm8998_pciephy_cfg = {
>>>  
>>>  	.start_ctrl             = SERDES_START | PCS_START,
>>>  	.pwrdn_ctrl		= SW_PWRDN | REFCLK_DRV_DSBL,
>>> +	.mask_pcs_ready		= PHYSTATUS,
>>>  	.mask_com_pcs_ready	= PCS_READY,
>>>  };
>>>  
>>> @@ -1547,7 +1549,7 @@ static int qcom_qmp_phy_enable(struct phy *phy)
>>>  	status = pcs + cfg->regs[QPHY_PCS_READY_STATUS];
>>>  	mask = cfg->mask_pcs_ready;
>>>  
>>> -	ret = readl_poll_timeout(status, val, !(val & mask), 1,
>>> +	ret = readl_poll_timeout(status, val, val & mask, 1,
>>>  				 PHY_INIT_COMPLETE_TIMEOUT);
>>>  	if (ret) {
>>>  		dev_err(qmp->dev, "phy initialization timed-out\n");
>>
>> Your patch made me realize that:
>> msm8998_pciephy_cfg.has_phy_com_ctrl = false
>> thus
>> msm8998_pciephy_cfg.mask_com_pcs_ready is useless, AFAICT.
> 
> While 8998 has a COM block, it does (among other things) not have a
> ready bit. So afaict has_phy_com_ctrl = false is correct.

Pfff... Working blind without the HPG sucks...

> The addition of mask_pcs_ready is part of resolving the regression in
> 5.2, so I suggest that we remove mask_com_pcs_ready separately.

I agree that it should be done separately.
I'll send a patch on top of yours.

>> (I copied msm8996_pciephy_cfg for msm8998_pciephy_cfg)
>>
>> Does msm8996_pciephy_cfg really need both mask_pcs_ready AND
>> mask_com_pcs_ready?
> 
> 8996 has a COM block and it contains both the control bits and the
> status bits, so that looks correct.

Thanks for checking.

>> I'll test your patch tomorrow.
> 
> I appreciate that.

Here are my observations for a 8998 board:

1) If I apply only the readl_poll_timeout() fix (not the mask_pcs_ready fixup)
qcom_pcie_probe() fails with a timeout in phy_init.
=> this is in line with your regression analysis.

2) Your patch also fixes a long-standing bug in UFS init whereby sending
lots of information to the console during phy init would lead to an
incorrectly diagnosed time-out.

Good stuff!

Reviewed-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr>

Regards.
Marc Gonzalez June 19, 2019, 12:43 p.m. UTC | #7
On 13/06/2019 11:10, Marc Gonzalez wrote:

> Here are my observations for a 8998 board:
> 
> 1) If I apply only the readl_poll_timeout() fix (not the mask_pcs_ready fixup)
> qcom_pcie_probe() fails with a timeout in phy_init.
> => this is in line with your regression analysis.
> 
> 2) Your patch also fixes a long-standing bug in UFS init whereby sending
> lots of information to the console during phy init would lead to an
> incorrectly diagnosed time-out.
> 
> Good stuff!
> 
> Reviewed-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
> Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr>

Hello Kishon,

Could you take this patch through your tree?
It fixes a pair of nasty bugs.

I do have a follow-up (trivial) patch on top of this one:
https://lore.kernel.org/patchwork/patch/1088044/

What are your thoughts on the usleep_range issue?
https://lore.kernel.org/patchwork/patch/1088035/

Regards.
Marc Gonzalez July 19, 2019, 3:50 p.m. UTC | #8
On 13/06/2019 11:10, Marc Gonzalez wrote:

> Here are my observations for a 8998 board:
> 
> 1) If I apply only the readl_poll_timeout() fix (not the mask_pcs_ready fixup)
> qcom_pcie_probe() fails with a timeout in phy_init.
> => this is in line with your regression analysis.
> 
> 2) Your patch also fixes a long-standing bug in UFS init whereby sending
> lots of information to the console during phy init would lead to an
> incorrectly diagnosed time-out.
> 
> Good stuff!
> 
> Reviewed-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
> Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr>

It looks like this patch fixed UFS, but broke PCIe and USB3 ^_^

qcom-qmp-phy 1c06000.phy: Registered Qcom-QMP phy
qcom-qmp-phy c010000.phy: Registered Qcom-QMP phy
qcom-qmp-phy 1da7000.phy: Registered Qcom-QMP phy

qcom-qmp-phy 1c06000.phy: BEFORE=000000a6 AFTER=000000a6
qcom-qmp-phy 1c06000.phy: phy initialization timed-out
phy phy-1c06000.phy.0: phy init failed --> -110
qcom-pcie: probe of 1c00000.pci failed with error -110

qcom-qmp-phy 1da7000.phy: BEFORE=00000040 AFTER=0000000d

qcom-qmp-phy c010000.phy: BEFORE=69696969 AFTER=b7b7b7b7
qcom-qmp-phy c010000.phy: phy initialization timed-out
phy phy-c010000.phy.1: phy init failed --> -110
dwc3 a800000.dwc3: failed to initialize core: -110
dwc3: probe of a800000.dwc3 failed with error -110


Downstream code for PCIe is:

static bool pcie_phy_is_ready(struct msm_pcie_dev_t *dev)
{
	if (dev->phy_ver >= 0x20) {
		if (readl_relaxed(dev->phy + PCIE_N_PCS_STATUS(dev->rc_idx, dev->common_phy)) &	BIT(6))
			return false;
		else
			return true;
	}

	if (!(readl_relaxed(dev->phy + PCIE_COM_PCS_READY_STATUS) & 0x1))
		return false;
	else
		return true;
}

AFAICT:
PCIe and USB3 QMP PHYs are ready when PHYSTATUS=BIT(6) goes to 0.
But UFS is ready when PCS_READY=BIT(0) goes to 1.


Can someone verify that USB3 is broken on 845 with 885bd765963b?

Regards.
Marc Gonzalez July 23, 2019, 10:31 a.m. UTC | #9
On 19/07/2019 17:50, Marc Gonzalez wrote:

> On 13/06/2019 11:10, Marc Gonzalez wrote:
> 
>> Here are my observations for a 8998 board:
>>
>> 1) If I apply only the readl_poll_timeout() fix (not the mask_pcs_ready fixup)
>> qcom_pcie_probe() fails with a timeout in phy_init.
>> => this is in line with your regression analysis.
>>
>> 2) Your patch also fixes a long-standing bug in UFS init whereby sending
>> lots of information to the console during phy init would lead to an
>> incorrectly diagnosed time-out.
>>
>> Good stuff!
>>
>> Reviewed-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
>> Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
> 
> It looks like this patch fixed UFS, but broke PCIe and USB3 ^_^
> 
> qcom-qmp-phy 1c06000.phy: Registered Qcom-QMP phy
> qcom-qmp-phy c010000.phy: Registered Qcom-QMP phy
> qcom-qmp-phy 1da7000.phy: Registered Qcom-QMP phy
> 
> qcom-qmp-phy 1c06000.phy: BEFORE=000000a6 AFTER=000000a6
> qcom-qmp-phy 1c06000.phy: phy initialization timed-out
> phy phy-1c06000.phy.0: phy init failed --> -110
> qcom-pcie: probe of 1c00000.pci failed with error -110
> 
> qcom-qmp-phy 1da7000.phy: BEFORE=00000040 AFTER=0000000d
> 
> qcom-qmp-phy c010000.phy: BEFORE=69696969 AFTER=b7b7b7b7
> qcom-qmp-phy c010000.phy: phy initialization timed-out
> phy phy-c010000.phy.1: phy init failed --> -110
> dwc3 a800000.dwc3: failed to initialize core: -110
> dwc3: probe of a800000.dwc3 failed with error -110
> 
> 
> Downstream code for PCIe is:
> 
> static bool pcie_phy_is_ready(struct msm_pcie_dev_t *dev)
> {
> 	if (dev->phy_ver >= 0x20) {
> 		if (readl_relaxed(dev->phy + PCIE_N_PCS_STATUS(dev->rc_idx, dev->common_phy)) &	BIT(6))
> 			return false;
> 		else
> 			return true;
> 	}
> 
> 	if (!(readl_relaxed(dev->phy + PCIE_COM_PCS_READY_STATUS) & 0x1))
> 		return false;
> 	else
> 		return true;
> }
> 
> AFAICT:
> PCIe and USB3 QMP PHYs are ready when PHYSTATUS=BIT(6) goes to 0.
> But UFS is ready when PCS_READY=BIT(0) goes to 1.
> 
> 
> Can someone verify that USB3 is broken on 845 with 885bd765963b?

Suggested fix:

diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c
index 34ff6434da8f..11c1b02f0206 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
@@ -1447,6 +1447,11 @@ static int qcom_qmp_phy_com_exit(struct qcom_qmp *qmp)
 	return 0;
 }
 
+static bool phy_is_ready(unsigned int val, unsigned int mask)
+{
+	return mask == PCS_READY ? val & mask : !(val & mask);
+}
+
 static int qcom_qmp_phy_enable(struct phy *phy)
 {
 	struct qmp_phy *qphy = phy_get_drvdata(phy);
@@ -1548,7 +1553,7 @@ static int qcom_qmp_phy_enable(struct phy *phy)
 	status = pcs + cfg->regs[QPHY_PCS_READY_STATUS];
 	mask = cfg->mask_pcs_ready;
 
-	ret = readl_poll_timeout(status, val, val & mask, 10,
+	ret = readl_poll_timeout(status, val, phy_is_ready(val, mask), 10,
 				 PHY_INIT_COMPLETE_TIMEOUT);
 	if (ret) {
 		dev_err(qmp->dev, "phy initialization timed-out\n");
Marc Gonzalez Aug. 2, 2019, 7:54 p.m. UTC | #10
On 23/07/2019 12:31, Marc Gonzalez wrote:

> On 19/07/2019 17:50, Marc Gonzalez wrote:
> 
>> On 13/06/2019 11:10, Marc Gonzalez wrote:
>>
>>> Here are my observations for a 8998 board:
>>>
>>> 1) If I apply only the readl_poll_timeout() fix (not the mask_pcs_ready fixup)
>>> qcom_pcie_probe() fails with a timeout in phy_init.
>>> => this is in line with your regression analysis.
>>>
>>> 2) Your patch also fixes a long-standing bug in UFS init whereby sending
>>> lots of information to the console during phy init would lead to an
>>> incorrectly diagnosed time-out.
>>>
>>> Good stuff!
>>>
>>> Reviewed-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
>>> Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
>>
>> It looks like this patch fixed UFS, but broke PCIe and USB3 ^_^
>>
>> qcom-qmp-phy 1c06000.phy: Registered Qcom-QMP phy
>> qcom-qmp-phy c010000.phy: Registered Qcom-QMP phy
>> qcom-qmp-phy 1da7000.phy: Registered Qcom-QMP phy
>>
>> qcom-qmp-phy 1c06000.phy: BEFORE=000000a6 AFTER=000000a6
>> qcom-qmp-phy 1c06000.phy: phy initialization timed-out
>> phy phy-1c06000.phy.0: phy init failed --> -110
>> qcom-pcie: probe of 1c00000.pci failed with error -110
>>
>> qcom-qmp-phy 1da7000.phy: BEFORE=00000040 AFTER=0000000d
>>
>> qcom-qmp-phy c010000.phy: BEFORE=69696969 AFTER=b7b7b7b7
>> qcom-qmp-phy c010000.phy: phy initialization timed-out
>> phy phy-c010000.phy.1: phy init failed --> -110
>> dwc3 a800000.dwc3: failed to initialize core: -110
>> dwc3: probe of a800000.dwc3 failed with error -110
>>
>>
>> Downstream code for PCIe is:
>>
>> static bool pcie_phy_is_ready(struct msm_pcie_dev_t *dev)
>> {
>> 	if (dev->phy_ver >= 0x20) {
>> 		if (readl_relaxed(dev->phy + PCIE_N_PCS_STATUS(dev->rc_idx, dev->common_phy)) &	BIT(6))
>> 			return false;
>> 		else
>> 			return true;
>> 	}
>>
>> 	if (!(readl_relaxed(dev->phy + PCIE_COM_PCS_READY_STATUS) & 0x1))
>> 		return false;
>> 	else
>> 		return true;
>> }
>>
>> AFAICT:
>> PCIe and USB3 QMP PHYs are ready when PHYSTATUS=BIT(6) goes to 0.
>> But UFS is ready when PCS_READY=BIT(0) goes to 1.
>>
>>
>> Can someone verify that USB3 is broken on 845 with 885bd765963b?
> 
> Suggested fix:
> 
> diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c
> index 34ff6434da8f..11c1b02f0206 100644
> --- a/drivers/phy/qualcomm/phy-qcom-qmp.c
> +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
> @@ -1447,6 +1447,11 @@ static int qcom_qmp_phy_com_exit(struct qcom_qmp *qmp)
>  	return 0;
>  }
>  
> +static bool phy_is_ready(unsigned int val, unsigned int mask)
> +{
> +	return mask == PCS_READY ? val & mask : !(val & mask);
> +}
> +
>  static int qcom_qmp_phy_enable(struct phy *phy)
>  {
>  	struct qmp_phy *qphy = phy_get_drvdata(phy);
> @@ -1548,7 +1553,7 @@ static int qcom_qmp_phy_enable(struct phy *phy)
>  	status = pcs + cfg->regs[QPHY_PCS_READY_STATUS];
>  	mask = cfg->mask_pcs_ready;
>  
> -	ret = readl_poll_timeout(status, val, val & mask, 10,
> +	ret = readl_poll_timeout(status, val, phy_is_ready(val, mask), 10,
>  				 PHY_INIT_COMPLETE_TIMEOUT);
>  	if (ret) {
>  		dev_err(qmp->dev, "phy initialization timed-out\n");
> 

It seems there is a problem. This patch made its way to stable,
even though it appears to break (at least) msm8998, and possibly
sdm845. Could someone with access to one or both platforms confirm
that the patch broke something, and that the proposed fix actually
fixes the issue?

Regards.
Bjorn Andersson Aug. 6, 2019, 12:43 a.m. UTC | #11
On Fri 19 Jul 08:50 PDT 2019, Marc Gonzalez wrote:

> On 13/06/2019 11:10, Marc Gonzalez wrote:
> 
> > Here are my observations for a 8998 board:
> > 
> > 1) If I apply only the readl_poll_timeout() fix (not the mask_pcs_ready fixup)
> > qcom_pcie_probe() fails with a timeout in phy_init.
> > => this is in line with your regression analysis.
> > 
> > 2) Your patch also fixes a long-standing bug in UFS init whereby sending
> > lots of information to the console during phy init would lead to an
> > incorrectly diagnosed time-out.
> > 
> > Good stuff!
> > 
> > Reviewed-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
> > Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
> 
> It looks like this patch fixed UFS, but broke PCIe and USB3 ^_^
> 
> qcom-qmp-phy 1c06000.phy: Registered Qcom-QMP phy
> qcom-qmp-phy c010000.phy: Registered Qcom-QMP phy
> qcom-qmp-phy 1da7000.phy: Registered Qcom-QMP phy
> 
> qcom-qmp-phy 1c06000.phy: BEFORE=000000a6 AFTER=000000a6
> qcom-qmp-phy 1c06000.phy: phy initialization timed-out
> phy phy-1c06000.phy.0: phy init failed --> -110
> qcom-pcie: probe of 1c00000.pci failed with error -110
> 
> qcom-qmp-phy 1da7000.phy: BEFORE=00000040 AFTER=0000000d
> 
> qcom-qmp-phy c010000.phy: BEFORE=69696969 AFTER=b7b7b7b7
> qcom-qmp-phy c010000.phy: phy initialization timed-out
> phy phy-c010000.phy.1: phy init failed --> -110
> dwc3 a800000.dwc3: failed to initialize core: -110
> dwc3: probe of a800000.dwc3 failed with error -110
> 
> 
> Downstream code for PCIe is:
> 
> static bool pcie_phy_is_ready(struct msm_pcie_dev_t *dev)
> {
> 	if (dev->phy_ver >= 0x20) {
> 		if (readl_relaxed(dev->phy + PCIE_N_PCS_STATUS(dev->rc_idx, dev->common_phy)) &	BIT(6))
> 			return false;
> 		else
> 			return true;
> 	}
> 
> 	if (!(readl_relaxed(dev->phy + PCIE_COM_PCS_READY_STATUS) & 0x1))
> 		return false;
> 	else
> 		return true;
> }
> 
> AFAICT:
> PCIe and USB3 QMP PHYs are ready when PHYSTATUS=BIT(6) goes to 0.
> But UFS is ready when PCS_READY=BIT(0) goes to 1.
> 
> 
> Can someone verify that USB3 is broken on 845 with 885bd765963b?
> 

Both USB3 and PCIe still works for me on 845, but you are correct;
PHYSTATUS is ready low, while PCS_READY is ready high.

I posted a patch that attempts to clean up the differences, please give
it a spin (and review).

Regards,
Bjorn
diff mbox series

Patch

diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c
index cd91b4179b10..43abdfd0deed 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
@@ -1074,6 +1074,7 @@  static const struct qmp_phy_cfg msm8996_pciephy_cfg = {
 
 	.start_ctrl		= PCS_START | PLL_READY_GATE_EN,
 	.pwrdn_ctrl		= SW_PWRDN | REFCLK_DRV_DSBL,
+	.mask_pcs_ready		= PHYSTATUS,
 	.mask_com_pcs_ready	= PCS_READY,
 
 	.has_phy_com_ctrl	= true,
@@ -1253,6 +1254,7 @@  static const struct qmp_phy_cfg msm8998_pciephy_cfg = {
 
 	.start_ctrl             = SERDES_START | PCS_START,
 	.pwrdn_ctrl		= SW_PWRDN | REFCLK_DRV_DSBL,
+	.mask_pcs_ready		= PHYSTATUS,
 	.mask_com_pcs_ready	= PCS_READY,
 };
 
@@ -1547,7 +1549,7 @@  static int qcom_qmp_phy_enable(struct phy *phy)
 	status = pcs + cfg->regs[QPHY_PCS_READY_STATUS];
 	mask = cfg->mask_pcs_ready;
 
-	ret = readl_poll_timeout(status, val, !(val & mask), 1,
+	ret = readl_poll_timeout(status, val, val & mask, 1,
 				 PHY_INIT_COMPLETE_TIMEOUT);
 	if (ret) {
 		dev_err(qmp->dev, "phy initialization timed-out\n");