diff mbox series

PCI: dw-rockchip: Remove PCIE_L0S_ENTRY check from rockchip_pcie_link_up()

Message ID 1744180833-68472-1-git-send-email-shawn.lin@rock-chips.com (mailing list archive)
State Superseded
Delegated to: Krzysztof Wilczyński
Headers show
Series PCI: dw-rockchip: Remove PCIE_L0S_ENTRY check from rockchip_pcie_link_up() | expand

Commit Message

Shawn Lin April 9, 2025, 6:40 a.m. UTC
Two mistakes here:
1. 0x11 is L0 not L0S, so the naming is wrong from the very beginning.
2. It's totally broken if enabling ASPM as rockchip_pcie_link_up() treat other
states, for instance, L0S or L1 as link down which is obvioult wrong.

Remove the check.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
---

 drivers/pci/controller/dwc/pcie-dw-rockchip.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

Niklas Cassel April 9, 2025, 8:30 a.m. UTC | #1
On Wed, Apr 09, 2025 at 02:40:33PM +0800, Shawn Lin wrote:
> Two mistakes here:
> 1. 0x11 is L0 not L0S, so the naming is wrong from the very beginning.
> 2. It's totally broken if enabling ASPM as rockchip_pcie_link_up() treat other
> states, for instance, L0S or L1 as link down which is obvioult wrong.
> 
> Remove the check.
> 
> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
> ---
> 
>  drivers/pci/controller/dwc/pcie-dw-rockchip.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> index c624b7e..21dc99c 100644
> --- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> +++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> @@ -44,7 +44,6 @@
>  #define PCIE_LINKUP			(PCIE_SMLH_LINKUP | PCIE_RDLH_LINKUP)
>  #define PCIE_RDLH_LINK_UP_CHGED		BIT(1)
>  #define PCIE_LINK_REQ_RST_NOT_INT	BIT(2)
> -#define PCIE_L0S_ENTRY			0x11
>  #define PCIE_CLIENT_GENERAL_CONTROL	0x0
>  #define PCIE_CLIENT_INTR_STATUS_LEGACY	0x8
>  #define PCIE_CLIENT_INTR_MASK_LEGACY	0x1c
> @@ -177,8 +176,7 @@ static int rockchip_pcie_link_up(struct dw_pcie *pci)
>  	struct rockchip_pcie *rockchip = to_rockchip_pcie(pci);
>  	u32 val = rockchip_pcie_get_ltssm(rockchip);
>  
> -	if ((val & PCIE_LINKUP) == PCIE_LINKUP &&
> -	    (val & PCIE_LTSSM_STATUS_MASK) == PCIE_L0S_ENTRY)
> +	if ((val & PCIE_LINKUP) == PCIE_LINKUP)
>  		return 1;
>  
>  	return 0;
> -- 
> 2.7.4
> 

You should probably also add:
Fixes: 0e898eb8df4e ("PCI: rockchip-dwc: Add Rockchip RK356X host controller driver")


Considering that dw_pcie_link_up() looks like this:
https://github.com/torvalds/linux/blob/v6.15-rc1/drivers/pci/controller/dwc/pcie-designware.c#L714-L725

Why not simply remove the rockchip_pcie_link_up() callback completely?

Is there any advantage of using a rockchip specific way to read link up,
rather than just reading link up via the DWC PCIE_PORT_DEBUG1 register?


Kind regards,
Niklas
Shawn Lin April 9, 2025, 9:09 a.m. UTC | #2
在 2025/04/09 星期三 16:30, Niklas Cassel 写道:
> On Wed, Apr 09, 2025 at 02:40:33PM +0800, Shawn Lin wrote:
>> Two mistakes here:
>> 1. 0x11 is L0 not L0S, so the naming is wrong from the very beginning.
>> 2. It's totally broken if enabling ASPM as rockchip_pcie_link_up() treat other
>> states, for instance, L0S or L1 as link down which is obvioult wrong.
>>
>> Remove the check.
>>
>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>> ---
>>
>>   drivers/pci/controller/dwc/pcie-dw-rockchip.c | 4 +---
>>   1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
>> index c624b7e..21dc99c 100644
>> --- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
>> +++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
>> @@ -44,7 +44,6 @@
>>   #define PCIE_LINKUP			(PCIE_SMLH_LINKUP | PCIE_RDLH_LINKUP)
>>   #define PCIE_RDLH_LINK_UP_CHGED		BIT(1)
>>   #define PCIE_LINK_REQ_RST_NOT_INT	BIT(2)
>> -#define PCIE_L0S_ENTRY			0x11
>>   #define PCIE_CLIENT_GENERAL_CONTROL	0x0
>>   #define PCIE_CLIENT_INTR_STATUS_LEGACY	0x8
>>   #define PCIE_CLIENT_INTR_MASK_LEGACY	0x1c
>> @@ -177,8 +176,7 @@ static int rockchip_pcie_link_up(struct dw_pcie *pci)
>>   	struct rockchip_pcie *rockchip = to_rockchip_pcie(pci);
>>   	u32 val = rockchip_pcie_get_ltssm(rockchip);
>>   
>> -	if ((val & PCIE_LINKUP) == PCIE_LINKUP &&
>> -	    (val & PCIE_LTSSM_STATUS_MASK) == PCIE_L0S_ENTRY)
>> +	if ((val & PCIE_LINKUP) == PCIE_LINKUP)
>>   		return 1;
>>   
>>   	return 0;
>> -- 
>> 2.7.4
>>
> 
> You should probably also add:
> Fixes: 0e898eb8df4e ("PCI: rockchip-dwc: Add Rockchip RK356X host controller driver")
> 

Will add.

> 
> Considering that dw_pcie_link_up() looks like this:
> https://github.com/torvalds/linux/blob/v6.15-rc1/drivers/pci/controller/dwc/pcie-designware.c#L714-L725
> 
> Why not simply remove the rockchip_pcie_link_up() callback completely?
> 
> Is there any advantage of using a rockchip specific way to read link up,
> rather than just reading link up via the DWC PCIE_PORT_DEBUG1 register?

This is a very good question which we tried but made real products
suffer from it for a long time, and finally we found the reason and
discarded it.

Quoted from DWC databook, section 8.2.3 AXI Bridge Initialization,
Clocking and Reset:

"In RC Mode, your AXI application must not generate any MEM or I/O
requests, until the host software has enabled the Memory Space Enable
(MSE), and IO Space Enable (ISE) bits respectively. Your RC application
should not generate CFG requests until it has confirmed that the link is
up by sampling the smlh_link_up and rdlh_link_up outputs."

Quoted from DWC databook, section 5.50 SII: Debug Signals
"[36]: smlh_link_up: LTSSM reports PHY link up or LTSSM is in
Loopback.Active for Loopback Master" which refers to
PCIE_PORT_DEBUG1_LINK_UP per code.

The timing in dwc core is drving smlh_link_up->L0->rdlh_link_up->FC
init(a fixed delay) from IC simulation when linking up.

The dw_pcie_link_up() wasn't reliably work as expected by massive test.
The problem is clear from our ASIC view, that cxpl_debug_info from DWC
core is missing rdlh_link_up. cxpl_debug_info[32:63] is indentical to
PCIE_PORT_DEBUG1, So reading PCIE_PORT_DEBUG1 and check
smlh_link_up isn't enough.

The problem was introduced by commit 1 and fixed by commit 2 but not to
the end. And finally commit 3 rename the register but not fix anything.

It was broken from the first time. Any dwc controllers should not be use
the buggy default method to check link up state from our view.
So this's the whole story for it, which may help you understand the
indeed problem and why we reinvent rockchip_pcie_link_up() here.

[1]. commit dac29e6c5460 ("PCI: designware: Add default link up check if
     sub-driver doesn't override")

[2]. commit 01c076732e82 ("PCI: designware: Check LTSSM training bit
     before deciding link is up")

[3]. commit 60ef4b072ba0 ("PCI: dwc: imx6: Share PHY debug register
     definitions")


> 
> 
> Kind regards,
> Niklas
>
Niklas Cassel April 9, 2025, 9:19 a.m. UTC | #3
Hello Shawn,

On Wed, Apr 09, 2025 at 05:09:38PM +0800, Shawn Lin wrote:
> 在 2025/04/09 星期三 16:30, Niklas Cassel 写道:
> > On Wed, Apr 09, 2025 at 02:40:33PM +0800, Shawn Lin wrote:
> > 
> > Is there any advantage of using a rockchip specific way to read link up,
> > rather than just reading link up via the DWC PCIE_PORT_DEBUG1 register?
> 
> This is a very good question which we tried but made real products
> suffer from it for a long time, and finally we found the reason and
> discarded it.
> 
> Quoted from DWC databook, section 8.2.3 AXI Bridge Initialization,
> Clocking and Reset:
> 
> "In RC Mode, your AXI application must not generate any MEM or I/O
> requests, until the host software has enabled the Memory Space Enable
> (MSE), and IO Space Enable (ISE) bits respectively. Your RC application
> should not generate CFG requests until it has confirmed that the link is
> up by sampling the smlh_link_up and rdlh_link_up outputs."
> 
> Quoted from DWC databook, section 5.50 SII: Debug Signals
> "[36]: smlh_link_up: LTSSM reports PHY link up or LTSSM is in
> Loopback.Active for Loopback Master" which refers to
> PCIE_PORT_DEBUG1_LINK_UP per code.
> 
> The timing in dwc core is drving smlh_link_up->L0->rdlh_link_up->FC
> init(a fixed delay) from IC simulation when linking up.
> 
> The dw_pcie_link_up() wasn't reliably work as expected by massive test.
> The problem is clear from our ASIC view, that cxpl_debug_info from DWC
> core is missing rdlh_link_up. cxpl_debug_info[32:63] is indentical to
> PCIE_PORT_DEBUG1, So reading PCIE_PORT_DEBUG1 and check
> smlh_link_up isn't enough.
> 
> The problem was introduced by commit 1 and fixed by commit 2 but not to
> the end. And finally commit 3 rename the register but not fix anything.
> 
> It was broken from the first time. Any dwc controllers should not be use
> the buggy default method to check link up state from our view.
> So this's the whole story for it, which may help you understand the
> indeed problem and why we reinvent rockchip_pcie_link_up() here.
> 
> [1]. commit dac29e6c5460 ("PCI: designware: Add default link up check if
>     sub-driver doesn't override")
> 
> [2]. commit 01c076732e82 ("PCI: designware: Check LTSSM training bit
>     before deciding link is up")
> 
> [3]. commit 60ef4b072ba0 ("PCI: dwc: imx6: Share PHY debug register
>     definitions")

Thank you for the detailed answer.

It seems like we should really add a warning and a comment in
dw_pcie_link_up(), so that others don't get hit by this hard to debug issue!

(Especially since dw_pcie_link_up() was added by someone with a @synopsys.com
email).


Kind regards,
Niklas
Shawn Lin April 9, 2025, 9:41 a.m. UTC | #4
在 2025/04/09 星期三 17:19, Niklas Cassel 写道:
> Hello Shawn,
> 
> On Wed, Apr 09, 2025 at 05:09:38PM +0800, Shawn Lin wrote:
>> 在 2025/04/09 星期三 16:30, Niklas Cassel 写道:
>>> On Wed, Apr 09, 2025 at 02:40:33PM +0800, Shawn Lin wrote:
>>>
>>> Is there any advantage of using a rockchip specific way to read link up,
>>> rather than just reading link up via the DWC PCIE_PORT_DEBUG1 register?
>>
>> This is a very good question which we tried but made real products
>> suffer from it for a long time, and finally we found the reason and
>> discarded it.
>>
>> Quoted from DWC databook, section 8.2.3 AXI Bridge Initialization,
>> Clocking and Reset:
>>
>> "In RC Mode, your AXI application must not generate any MEM or I/O
>> requests, until the host software has enabled the Memory Space Enable
>> (MSE), and IO Space Enable (ISE) bits respectively. Your RC application
>> should not generate CFG requests until it has confirmed that the link is
>> up by sampling the smlh_link_up and rdlh_link_up outputs."
>>
>> Quoted from DWC databook, section 5.50 SII: Debug Signals
>> "[36]: smlh_link_up: LTSSM reports PHY link up or LTSSM is in
>> Loopback.Active for Loopback Master" which refers to
>> PCIE_PORT_DEBUG1_LINK_UP per code.
>>
>> The timing in dwc core is drving smlh_link_up->L0->rdlh_link_up->FC
>> init(a fixed delay) from IC simulation when linking up.
>>
>> The dw_pcie_link_up() wasn't reliably work as expected by massive test.
>> The problem is clear from our ASIC view, that cxpl_debug_info from DWC
>> core is missing rdlh_link_up. cxpl_debug_info[32:63] is indentical to
>> PCIE_PORT_DEBUG1, So reading PCIE_PORT_DEBUG1 and check
>> smlh_link_up isn't enough.
>>
>> The problem was introduced by commit 1 and fixed by commit 2 but not to
>> the end. And finally commit 3 rename the register but not fix anything.
>>
>> It was broken from the first time. Any dwc controllers should not be use
>> the buggy default method to check link up state from our view.
>> So this's the whole story for it, which may help you understand the
>> indeed problem and why we reinvent rockchip_pcie_link_up() here.
>>
>> [1]. commit dac29e6c5460 ("PCI: designware: Add default link up check if
>>      sub-driver doesn't override")
>>
>> [2]. commit 01c076732e82 ("PCI: designware: Check LTSSM training bit
>>      before deciding link is up")
>>
>> [3]. commit 60ef4b072ba0 ("PCI: dwc: imx6: Share PHY debug register
>>      definitions")
> 
> Thank you for the detailed answer.
> 
> It seems like we should really add a warning and a comment in
> dw_pcie_link_up(), so that others don't get hit by this hard to debug issue!
> 
> (Especially since dw_pcie_link_up() was added by someone with a @synopsys.com
> email).
> 

Yep, will do it with another patch.

> 
> Kind regards,
> Niklas
>
Manivannan Sadhasivam April 13, 2025, 2:24 p.m. UTC | #5
On Wed, Apr 09, 2025 at 11:19:03AM +0200, Niklas Cassel wrote:
> Hello Shawn,
> 
> On Wed, Apr 09, 2025 at 05:09:38PM +0800, Shawn Lin wrote:
> > 在 2025/04/09 星期三 16:30, Niklas Cassel 写道:
> > > On Wed, Apr 09, 2025 at 02:40:33PM +0800, Shawn Lin wrote:
> > > 
> > > Is there any advantage of using a rockchip specific way to read link up,
> > > rather than just reading link up via the DWC PCIE_PORT_DEBUG1 register?
> > 
> > This is a very good question which we tried but made real products
> > suffer from it for a long time, and finally we found the reason and
> > discarded it.
> > 
> > Quoted from DWC databook, section 8.2.3 AXI Bridge Initialization,
> > Clocking and Reset:
> > 
> > "In RC Mode, your AXI application must not generate any MEM or I/O
> > requests, until the host software has enabled the Memory Space Enable
> > (MSE), and IO Space Enable (ISE) bits respectively. Your RC application
> > should not generate CFG requests until it has confirmed that the link is
> > up by sampling the smlh_link_up and rdlh_link_up outputs."
> > 
> > Quoted from DWC databook, section 5.50 SII: Debug Signals
> > "[36]: smlh_link_up: LTSSM reports PHY link up or LTSSM is in
> > Loopback.Active for Loopback Master" which refers to
> > PCIE_PORT_DEBUG1_LINK_UP per code.
> > 
> > The timing in dwc core is drving smlh_link_up->L0->rdlh_link_up->FC
> > init(a fixed delay) from IC simulation when linking up.
> > 
> > The dw_pcie_link_up() wasn't reliably work as expected by massive test.
> > The problem is clear from our ASIC view, that cxpl_debug_info from DWC
> > core is missing rdlh_link_up. cxpl_debug_info[32:63] is indentical to
> > PCIE_PORT_DEBUG1, So reading PCIE_PORT_DEBUG1 and check
> > smlh_link_up isn't enough.
> > 
> > The problem was introduced by commit 1 and fixed by commit 2 but not to
> > the end. And finally commit 3 rename the register but not fix anything.
> > 
> > It was broken from the first time. Any dwc controllers should not be use
> > the buggy default method to check link up state from our view.
> > So this's the whole story for it, which may help you understand the
> > indeed problem and why we reinvent rockchip_pcie_link_up() here.
> > 
> > [1]. commit dac29e6c5460 ("PCI: designware: Add default link up check if
> >     sub-driver doesn't override")
> > 
> > [2]. commit 01c076732e82 ("PCI: designware: Check LTSSM training bit
> >     before deciding link is up")
> > 
> > [3]. commit 60ef4b072ba0 ("PCI: dwc: imx6: Share PHY debug register
> >     definitions")
> 
> Thank you for the detailed answer.
> 
> It seems like we should really add a warning and a comment in
> dw_pcie_link_up(), so that others don't get hit by this hard to debug issue!
> 

Right. But I'm also wondering if we should use the 'Data Link Layer Link Active'
bit in PCI Express Capability for checking link up. Qcom driver has been using
it from the start and there are no reported issues. We could add this as the
first fallback if the link_up callback is not provided.

- Mani
Niklas Cassel April 14, 2025, 10:06 a.m. UTC | #6
Hello Mani,

On Sun, Apr 13, 2025 at 07:54:28PM +0530, Manivannan Sadhasivam wrote:
> On Wed, Apr 09, 2025 at 11:19:03AM +0200, Niklas Cassel wrote:
> > 
> > It seems like we should really add a warning and a comment in
> > dw_pcie_link_up(), so that others don't get hit by this hard to debug issue!
> > 
> 
> Right. But I'm also wondering if we should use the 'Data Link Layer Link Active'
> bit in PCI Express Capability for checking link up. Qcom driver has been using
> it from the start and there are no reported issues. We could add this as the
> first fallback if the link_up callback is not provided.

Sounds like a good idea, but from looking at:
7.5.3.6 Link Capabilities Register (Offset 0Ch)

"
Data Link Layer Link Active Reporting Capable - For a Downstream Port,
this bit must be hardwired to 1b if the component supports the optional
capability of reporting the DL_Active state of the Data Link Control and
Management State Machine. For a hot-plug capable Downstream Port (as
indicated by the Hot-Plug Capable bit of the Slot Capabilities Register)
or a Downstream Port that supports Link speeds greater than 5.0 GT/s,
this bit must be hardwired to 1b.

For Upstream Ports and components that do not support this optional
capability, this bit must be hardwired to 0b.
"

It sounds like the the 'Data Link Layer Link Active' bit is optional,
or at least optional for Gen1 and Gen2.


Kind regards,
Niklas
Manivannan Sadhasivam April 14, 2025, 11:15 a.m. UTC | #7
On Mon, Apr 14, 2025 at 12:06:09PM +0200, Niklas Cassel wrote:
> Hello Mani,
> 
> On Sun, Apr 13, 2025 at 07:54:28PM +0530, Manivannan Sadhasivam wrote:
> > On Wed, Apr 09, 2025 at 11:19:03AM +0200, Niklas Cassel wrote:
> > > 
> > > It seems like we should really add a warning and a comment in
> > > dw_pcie_link_up(), so that others don't get hit by this hard to debug issue!
> > > 
> > 
> > Right. But I'm also wondering if we should use the 'Data Link Layer Link Active'
> > bit in PCI Express Capability for checking link up. Qcom driver has been using
> > it from the start and there are no reported issues. We could add this as the
> > first fallback if the link_up callback is not provided.
> 
> Sounds like a good idea, but from looking at:
> 7.5.3.6 Link Capabilities Register (Offset 0Ch)
> 
> "
> Data Link Layer Link Active Reporting Capable - For a Downstream Port,
> this bit must be hardwired to 1b if the component supports the optional
> capability of reporting the DL_Active state of the Data Link Control and
> Management State Machine. For a hot-plug capable Downstream Port (as
> indicated by the Hot-Plug Capable bit of the Slot Capabilities Register)
> or a Downstream Port that supports Link speeds greater than 5.0 GT/s,
> this bit must be hardwired to 1b.
> 
> For Upstream Ports and components that do not support this optional
> capability, this bit must be hardwired to 0b.
> "
> 
> It sounds like the the 'Data Link Layer Link Active' bit is optional,
> or at least optional for Gen1 and Gen2.

Yeah, we should avoid relying on it. Thanks for digging through the spec :)
Really appreciated!

- Mani
diff mbox series

Patch

diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
index c624b7e..21dc99c 100644
--- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
+++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
@@ -44,7 +44,6 @@ 
 #define PCIE_LINKUP			(PCIE_SMLH_LINKUP | PCIE_RDLH_LINKUP)
 #define PCIE_RDLH_LINK_UP_CHGED		BIT(1)
 #define PCIE_LINK_REQ_RST_NOT_INT	BIT(2)
-#define PCIE_L0S_ENTRY			0x11
 #define PCIE_CLIENT_GENERAL_CONTROL	0x0
 #define PCIE_CLIENT_INTR_STATUS_LEGACY	0x8
 #define PCIE_CLIENT_INTR_MASK_LEGACY	0x1c
@@ -177,8 +176,7 @@  static int rockchip_pcie_link_up(struct dw_pcie *pci)
 	struct rockchip_pcie *rockchip = to_rockchip_pcie(pci);
 	u32 val = rockchip_pcie_get_ltssm(rockchip);
 
-	if ((val & PCIE_LINKUP) == PCIE_LINKUP &&
-	    (val & PCIE_LTSSM_STATUS_MASK) == PCIE_L0S_ENTRY)
+	if ((val & PCIE_LINKUP) == PCIE_LINKUP)
 		return 1;
 
 	return 0;