diff mbox series

[v7] PCI/ASPM: Update LTR threshold based upon reported max latencies

Message ID 1663315719-21563-1-git-send-email-quic_krichai@quicinc.com (mailing list archive)
State Not Applicable
Headers show
Series [v7] PCI/ASPM: Update LTR threshold based upon reported max latencies | expand

Commit Message

Krishna chaitanya chundru Sept. 16, 2022, 8:08 a.m. UTC
In ASPM driver, LTR threshold scale and value are updated based on
tcommon_mode and t_poweron values. In Kioxia NVMe L1.2 is failing due to
LTR threshold scale and value are greater values than max snoop/non-snoop
value.

Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when
reported snoop/no-snoop values is greater than or equal to
LTR_L1.2_THRESHOLD value.

Signed-off-by: Prasad Malisetty  <quic_pmaliset@quicinc.com>
Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---

I am taking this patch forward as prasad is no more working with our org.
changes since v6:
	- Rebasing with pci/next.
changes since v5:
	- no changes, just reposting as standalone patch instead of reply to
	  previous patch.
Changes since v4:
	- Replaced conditional statements with min and max.
changes since v3:
	- Changed the logic to include this condition "snoop/nosnoop
	  latencies are not equal to zero and lower than LTR_L1.2_THRESHOLD"
Changes since v2:
	- Replaced LTRME logic with max snoop/no-snoop latencies check.
Changes since v1:
	- Added missing variable declaration in v1 patch
---
 drivers/pci/pcie/aspm.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

Comments

Krishna chaitanya chundru Sept. 26, 2022, 4:13 a.m. UTC | #1
On 9/16/2022 1:38 PM, Krishna chaitanya chundru wrote:
> In ASPM driver, LTR threshold scale and value are updated based on
> tcommon_mode and t_poweron values. In Kioxia NVMe L1.2 is failing due to
> LTR threshold scale and value are greater values than max snoop/non-snoop
> value.
>
> Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when
> reported snoop/no-snoop values is greater than or equal to
> LTR_L1.2_THRESHOLD value.
>
> Signed-off-by: Prasad Malisetty  <quic_pmaliset@quicinc.com>
> Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
> Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

Ping for updates.

There are no outstanding comments for this patch and also it is not 
being picked

up in upstream.

> ---
>
> I am taking this patch forward as prasad is no more working with our org.
> changes since v6:
> 	- Rebasing with pci/next.
> changes since v5:
> 	- no changes, just reposting as standalone patch instead of reply to
> 	  previous patch.
> Changes since v4:
> 	- Replaced conditional statements with min and max.
> changes since v3:
> 	- Changed the logic to include this condition "snoop/nosnoop
> 	  latencies are not equal to zero and lower than LTR_L1.2_THRESHOLD"
> Changes since v2:
> 	- Replaced LTRME logic with max snoop/no-snoop latencies check.
> Changes since v1:
> 	- Added missing variable declaration in v1 patch
> ---
>   drivers/pci/pcie/aspm.c | 30 ++++++++++++++++++++++++++++++
>   1 file changed, 30 insertions(+)
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 928bf64..2bb8470 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -486,13 +486,35 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
>   {
>   	struct pci_dev *child = link->downstream, *parent = link->pdev;
>   	u32 val1, val2, scale1, scale2;
> +	u32 max_val, max_scale, max_snp_scale, max_snp_val, max_nsnp_scale, max_nsnp_val;
>   	u32 t_common_mode, t_power_on, l1_2_threshold, scale, value;
>   	u32 ctl1 = 0, ctl2 = 0;
>   	u32 pctl1, pctl2, cctl1, cctl2;
> +	u16 ltr;
> +	u16 max_snoop_lat, max_nosnoop_lat;
>   
>   	if (!(link->aspm_support & ASPM_STATE_L1_2_MASK))
>   		return;
>   
> +	ltr = pci_find_ext_capability(child, PCI_EXT_CAP_ID_LTR);
> +	if (!ltr)
> +		return;
> +
> +	pci_read_config_word(child, ltr + PCI_LTR_MAX_SNOOP_LAT, &max_snoop_lat);
> +	pci_read_config_word(child, ltr + PCI_LTR_MAX_NOSNOOP_LAT, &max_nosnoop_lat);
> +
> +	max_snp_scale = (max_snoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
> +	max_snp_val = max_snoop_lat & PCI_LTR_VALUE_MASK;
> +
> +	max_nsnp_scale = (max_nosnoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
> +	max_nsnp_val = max_nosnoop_lat & PCI_LTR_VALUE_MASK;
> +
> +	/* choose the greater max scale value between snoop and no snoop value*/
> +	max_scale = max(max_snp_scale, max_nsnp_scale);
> +
> +	/* choose the greater max value between snoop and no snoop scales */
> +	max_val = max(max_snp_val, max_nsnp_val);
> +
>   	/* Choose the greater of the two Port Common_Mode_Restore_Times */
>   	val1 = (parent_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
>   	val2 = (child_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
> @@ -525,6 +547,14 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
>   	 */
>   	l1_2_threshold = 2 + 4 + t_common_mode + t_power_on;
>   	encode_l12_threshold(l1_2_threshold, &scale, &value);
> +
> +	/*
> +	 * Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when reported
> +	 * snoop/no-snoop values are greater than or equal to LTR_L1.2_THRESHOLD value.
> +	 */
> +	scale = min(scale, max_scale);
> +	value = min(value, max_val);
> +
>   	ctl1 |= t_common_mode << 8 | scale << 29 | value << 16;
>   
>   	/* Some broken devices only support dword access to L1 SS */
Bjorn Helgaas Sept. 30, 2022, 9:29 p.m. UTC | #2
On Fri, Sep 16, 2022 at 01:38:37PM +0530, Krishna chaitanya chundru wrote:
> In ASPM driver, LTR threshold scale and value are updated based on
> tcommon_mode and t_poweron values. In Kioxia NVMe L1.2 is failing due to
> LTR threshold scale and value are greater values than max snoop/non-snoop
> value.
>
> Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when
> reported snoop/no-snoop values is greater than or equal to
> LTR_L1.2_THRESHOLD value.

I find LTR configuration pretty much impenetrable, but this doesn't
seem right to me.  If I understand correctly, LTR messages are a way
for endpoints to report their latency requirements, i.e., sort of a
dynamic version of "Endpoint L1 Acceptable Latency".

As you said, a comparison between the most recent LTR value and
LTR_L1.2_THESHOLD determines whether the link goes to L1.1 or L1.2.

So I assume LTR_L1.2_THESHOLD must be the minimum time required to
transition the link from L0 to L1.2 and back to L0, which includes
T_POWER_OFF, T_L1.2, T_POWER_ON, and T_COMMONMODE (sec 5.5.3.3.1,
5.5.5).

If the device can tolerate at least that much time, i.e., if the
LTR value >= LTR_L1.2_THESHOLD, the link should go to L1.2.

I'm not a hardware person, but I don't see how LTR_L1.2_THESHOLD can
*depend* on the LTR max latency values.  The LTR max latencies depend
on the endpoint.  I think LTR_L1.2_THESHOLD depends on the circuit
design of both ends of the link.

More comments below, but they're only pertinent if we can figure out
that this is the correct approach.

Bjorn

> Signed-off-by: Prasad Malisetty  <quic_pmaliset@quicinc.com>
> Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
> Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> ---
> 
> I am taking this patch forward as prasad is no more working with our org.
> changes since v6:
> 	- Rebasing with pci/next.

It's best if you base patches on my "main" branch (not "next"), which
is typically -rc1, unless they depend on something that's already been
merged.

In the patch below, rewrap so everything still fits in 80 columns like
the rest of the file.

Update citations to current spec version (r6.0).  It looks like the
section numbers are the same.

> changes since v5:
> 	- no changes, just reposting as standalone patch instead of reply to
> 	  previous patch.
> Changes since v4:
> 	- Replaced conditional statements with min and max.
> changes since v3:
> 	- Changed the logic to include this condition "snoop/nosnoop
> 	  latencies are not equal to zero and lower than LTR_L1.2_THRESHOLD"
> Changes since v2:
> 	- Replaced LTRME logic with max snoop/no-snoop latencies check.
> Changes since v1:
> 	- Added missing variable declaration in v1 patch
> ---
>  drivers/pci/pcie/aspm.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 928bf64..2bb8470 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -486,13 +486,35 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
>  {
>  	struct pci_dev *child = link->downstream, *parent = link->pdev;
>  	u32 val1, val2, scale1, scale2;
> +	u32 max_val, max_scale, max_snp_scale, max_snp_val, max_nsnp_scale, max_nsnp_val;
>  	u32 t_common_mode, t_power_on, l1_2_threshold, scale, value;
>  	u32 ctl1 = 0, ctl2 = 0;
>  	u32 pctl1, pctl2, cctl1, cctl2;
> +	u16 ltr;
> +	u16 max_snoop_lat, max_nosnoop_lat;
>  
>  	if (!(link->aspm_support & ASPM_STATE_L1_2_MASK))
>  		return;
>  
> +	ltr = pci_find_ext_capability(child, PCI_EXT_CAP_ID_LTR);
> +	if (!ltr)
> +		return;
> +
> +	pci_read_config_word(child, ltr + PCI_LTR_MAX_SNOOP_LAT, &max_snoop_lat);
> +	pci_read_config_word(child, ltr + PCI_LTR_MAX_NOSNOOP_LAT, &max_nosnoop_lat);
> +
> +	max_snp_scale = (max_snoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
> +	max_snp_val = max_snoop_lat & PCI_LTR_VALUE_MASK;
> +
> +	max_nsnp_scale = (max_nosnoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
> +	max_nsnp_val = max_nosnoop_lat & PCI_LTR_VALUE_MASK;
> +
> +	/* choose the greater max scale value between snoop and no snoop value*/

Add space before */

Capitalize comments to match style of file.

> +	max_scale = max(max_snp_scale, max_nsnp_scale);
> +
> +	/* choose the greater max value between snoop and no snoop scales */
> +	max_val = max(max_snp_val, max_nsnp_val);
> +
>  	/* Choose the greater of the two Port Common_Mode_Restore_Times */
>  	val1 = (parent_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
>  	val2 = (child_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
> @@ -525,6 +547,14 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
>  	 */
>  	l1_2_threshold = 2 + 4 + t_common_mode + t_power_on;
>  	encode_l12_threshold(l1_2_threshold, &scale, &value);
> +
> +	/*
> +	 * Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when reported
> +	 * snoop/no-snoop values are greater than or equal to LTR_L1.2_THRESHOLD value.
> +	 */
> +	scale = min(scale, max_scale);
> +	value = min(value, max_val);

I don't think this computes the right thing.  If we have this:

  scale = 001b (x 32ns)
  value = 1024
  max_scale = 010b (x 1024ns)
  max_value = 1

Then the latencies are both 1024ns, so I would expect a min() of
1024ns.  But computing min() separately for the scale and value will
give "scale = 001b" (x 32ns) and "value = 1", for a latency of 32ns.

I think you would need to compare the values in ns, i.e.,
"l1_2_threshold".

I assume the max() computations above have a similar issue, but I
didn't work it out.

But I'm not convinced that this is the right approach to begin with.

>  	ctl1 |= t_common_mode << 8 | scale << 29 | value << 16;
>  
>  	/* Some broken devices only support dword access to L1 SS */
> -- 
> 2.7.4
>
Matthias Kaehlcke Nov. 18, 2022, 8:27 p.m. UTC | #3
On Fri, Sep 16, 2022 at 01:38:37PM +0530, Krishna chaitanya chundru wrote:
> In ASPM driver, LTR threshold scale and value are updated based on
> tcommon_mode and t_poweron values. In Kioxia NVMe L1.2 is failing due to
> LTR threshold scale and value are greater values than max snoop/non-snoop
> value.
> 
> Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when
> reported snoop/no-snoop values is greater than or equal to
> LTR_L1.2_THRESHOLD value.
> 
> Signed-off-by: Prasad Malisetty  <quic_pmaliset@quicinc.com>
> Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
> Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> ---
> 
> I am taking this patch forward as prasad is no more working with our org.
> changes since v6:
> 	- Rebasing with pci/next.
> changes since v5:
> 	- no changes, just reposting as standalone patch instead of reply to
> 	  previous patch.
> Changes since v4:
> 	- Replaced conditional statements with min and max.
> changes since v3:
> 	- Changed the logic to include this condition "snoop/nosnoop
> 	  latencies are not equal to zero and lower than LTR_L1.2_THRESHOLD"
> Changes since v2:
> 	- Replaced LTRME logic with max snoop/no-snoop latencies check.
> Changes since v1:
> 	- Added missing variable declaration in v1 patch
> ---
>  drivers/pci/pcie/aspm.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 928bf64..2bb8470 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -486,13 +486,35 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
>  {
>  	struct pci_dev *child = link->downstream, *parent = link->pdev;
>  	u32 val1, val2, scale1, scale2;
> +	u32 max_val, max_scale, max_snp_scale, max_snp_val, max_nsnp_scale, max_nsnp_val;
>  	u32 t_common_mode, t_power_on, l1_2_threshold, scale, value;
>  	u32 ctl1 = 0, ctl2 = 0;
>  	u32 pctl1, pctl2, cctl1, cctl2;
> +	u16 ltr;
> +	u16 max_snoop_lat, max_nosnoop_lat;
>  
>  	if (!(link->aspm_support & ASPM_STATE_L1_2_MASK))
>  		return;
>  
> +	ltr = pci_find_ext_capability(child, PCI_EXT_CAP_ID_LTR);
> +	if (!ltr)
> +		return;
> +
> +	pci_read_config_word(child, ltr + PCI_LTR_MAX_SNOOP_LAT, &max_snoop_lat);
> +	pci_read_config_word(child, ltr + PCI_LTR_MAX_NOSNOOP_LAT, &max_nosnoop_lat);
> +
> +	max_snp_scale = (max_snoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
> +	max_snp_val = max_snoop_lat & PCI_LTR_VALUE_MASK;
> +
> +	max_nsnp_scale = (max_nosnoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
> +	max_nsnp_val = max_nosnoop_lat & PCI_LTR_VALUE_MASK;
> +
> +	/* choose the greater max scale value between snoop and no snoop value*/
> +	max_scale = max(max_snp_scale, max_nsnp_scale);
> +
> +	/* choose the greater max value between snoop and no snoop scales */
> +	max_val = max(max_snp_val, max_nsnp_val);

I suppose the goal is to dermine the max of snoop/no-snoop latency. Taking the
max of scale and value separately won't yield the correct result when the scale
is different for snoop vs no-snoop. Instead you need to convert scale + value
to ns and determine the max of that (as done for 't_power_on').

> +
>  	/* Choose the greater of the two Port Common_Mode_Restore_Times */
>  	val1 = (parent_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
>  	val2 = (child_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
> @@ -525,6 +547,14 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
>  	 */
>  	l1_2_threshold = 2 + 4 + t_common_mode + t_power_on;
>  	encode_l12_threshold(l1_2_threshold, &scale, &value);
> +
> +	/*
> +	 * Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when reported
> +	 * snoop/no-snoop values are greater than or equal to LTR_L1.2_THRESHOLD value.
> +	 */
> +	scale = min(scale, max_scale);
> +	value = min(value, max_val);

This is wrong for the same reason as above, as Bjorn also pointed out in an earlier
comment.

Even with the values calculated correctly I'm not sure it this is the right
thing to do, but I know little about LTR. In any case this patch reduces
power consumption of the Kioxia NVMe significantly, essentially by
programming an LTR_L1.2_THRESHOLD of 0ns instead of 86ns.

I just came across Bjorn's recent 'PCI/ASPM: Fix L1SS issues' series [1] that
corrects the LTR_L1.2_THRESHOLD calculation and landed upstream, unfortunately
it doesn't magically fix the issues with the Kioxia NVMe :(

[1] https://lore.kernel.org/lkml/ca836507-50ca-13bc-ef88-7f69b1333c99@linux.intel.com/T/#m3f223b7e582052159de7538bcaf2bea6d2071472
Manivannan Sadhasivam Dec. 5, 2022, 11:25 a.m. UTC | #4
On Fri, Sep 16, 2022 at 01:38:37PM +0530, Krishna chaitanya chundru wrote:
> In ASPM driver, LTR threshold scale and value are updated based on
> tcommon_mode and t_poweron values. In Kioxia NVMe L1.2 is failing due to
> LTR threshold scale and value are greater values than max snoop/non-snoop
> value.
> 
> Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when
> reported snoop/no-snoop values is greater than or equal to
> LTR_L1.2_THRESHOLD value.
> 
> Signed-off-by: Prasad Malisetty  <quic_pmaliset@quicinc.com>
> Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
> Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

I take my Ack back... Sorry that I did not look into this patch closer.

> ---
> 
> I am taking this patch forward as prasad is no more working with our org.
> changes since v6:
> 	- Rebasing with pci/next.
> changes since v5:
> 	- no changes, just reposting as standalone patch instead of reply to
> 	  previous patch.
> Changes since v4:
> 	- Replaced conditional statements with min and max.
> changes since v3:
> 	- Changed the logic to include this condition "snoop/nosnoop
> 	  latencies are not equal to zero and lower than LTR_L1.2_THRESHOLD"
> Changes since v2:
> 	- Replaced LTRME logic with max snoop/no-snoop latencies check.
> Changes since v1:
> 	- Added missing variable declaration in v1 patch
> ---
>  drivers/pci/pcie/aspm.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 928bf64..2bb8470 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -486,13 +486,35 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
>  {
>  	struct pci_dev *child = link->downstream, *parent = link->pdev;
>  	u32 val1, val2, scale1, scale2;
> +	u32 max_val, max_scale, max_snp_scale, max_snp_val, max_nsnp_scale, max_nsnp_val;
>  	u32 t_common_mode, t_power_on, l1_2_threshold, scale, value;
>  	u32 ctl1 = 0, ctl2 = 0;
>  	u32 pctl1, pctl2, cctl1, cctl2;
> +	u16 ltr;
> +	u16 max_snoop_lat, max_nosnoop_lat;
>  
>  	if (!(link->aspm_support & ASPM_STATE_L1_2_MASK))
>  		return;
>  
> +	ltr = pci_find_ext_capability(child, PCI_EXT_CAP_ID_LTR);
> +	if (!ltr)
> +		return;
> +
> +	pci_read_config_word(child, ltr + PCI_LTR_MAX_SNOOP_LAT, &max_snoop_lat);
> +	pci_read_config_word(child, ltr + PCI_LTR_MAX_NOSNOOP_LAT, &max_nosnoop_lat);
> +
> +	max_snp_scale = (max_snoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
> +	max_snp_val = max_snoop_lat & PCI_LTR_VALUE_MASK;
> +
> +	max_nsnp_scale = (max_nosnoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
> +	max_nsnp_val = max_nosnoop_lat & PCI_LTR_VALUE_MASK;
> +
> +	/* choose the greater max scale value between snoop and no snoop value*/
> +	max_scale = max(max_snp_scale, max_nsnp_scale);
> +
> +	/* choose the greater max value between snoop and no snoop scales */
> +	max_val = max(max_snp_val, max_nsnp_val);
> +
>  	/* Choose the greater of the two Port Common_Mode_Restore_Times */
>  	val1 = (parent_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
>  	val2 = (child_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
> @@ -525,6 +547,14 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
>  	 */
>  	l1_2_threshold = 2 + 4 + t_common_mode + t_power_on;
>  	encode_l12_threshold(l1_2_threshold, &scale, &value);
> +
> +	/*
> +	 * Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when reported
> +	 * snoop/no-snoop values are greater than or equal to LTR_L1.2_THRESHOLD value.

Apart from the bug in calculating the LTR_Threshold as reported by Matthias
and Bjorn, I'm wondering if we are covering up for the device firmware issue.

As per section 6.18, if the device reports snoop/no-snoop scale/value as 0, then
it implies that the device won't tolerate any additional delays from the host.

In that case, how can we allow the link to go into L1.2 since that would incur
high delay compared to L1.1?

Thanks,
Mani

> +	 */
> +	scale = min(scale, max_scale);
> +	value = min(value, max_val);
> +
>  	ctl1 |= t_common_mode << 8 | scale << 29 | value << 16;
>  
>  	/* Some broken devices only support dword access to L1 SS */
> -- 
> 2.7.4
>
Matthias Kaehlcke Dec. 5, 2022, 6:18 p.m. UTC | #5
On Mon, Dec 05, 2022 at 04:55:00PM +0530, Manivannan Sadhasivam wrote:
> On Fri, Sep 16, 2022 at 01:38:37PM +0530, Krishna chaitanya chundru wrote:
> > In ASPM driver, LTR threshold scale and value are updated based on
> > tcommon_mode and t_poweron values. In Kioxia NVMe L1.2 is failing due to
> > LTR threshold scale and value are greater values than max snoop/non-snoop
> > value.
> > 
> > Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when
> > reported snoop/no-snoop values is greater than or equal to
> > LTR_L1.2_THRESHOLD value.
> > 
> > Signed-off-by: Prasad Malisetty  <quic_pmaliset@quicinc.com>
> > Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
> > Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> 
> I take my Ack back... Sorry that I did not look into this patch closer.
> 
> > ---
> > 
> > I am taking this patch forward as prasad is no more working with our org.
> > changes since v6:
> > 	- Rebasing with pci/next.
> > changes since v5:
> > 	- no changes, just reposting as standalone patch instead of reply to
> > 	  previous patch.
> > Changes since v4:
> > 	- Replaced conditional statements with min and max.
> > changes since v3:
> > 	- Changed the logic to include this condition "snoop/nosnoop
> > 	  latencies are not equal to zero and lower than LTR_L1.2_THRESHOLD"
> > Changes since v2:
> > 	- Replaced LTRME logic with max snoop/no-snoop latencies check.
> > Changes since v1:
> > 	- Added missing variable declaration in v1 patch
> > ---
> >  drivers/pci/pcie/aspm.c | 30 ++++++++++++++++++++++++++++++
> >  1 file changed, 30 insertions(+)
> > 
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index 928bf64..2bb8470 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -486,13 +486,35 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
> >  {
> >  	struct pci_dev *child = link->downstream, *parent = link->pdev;
> >  	u32 val1, val2, scale1, scale2;
> > +	u32 max_val, max_scale, max_snp_scale, max_snp_val, max_nsnp_scale, max_nsnp_val;
> >  	u32 t_common_mode, t_power_on, l1_2_threshold, scale, value;
> >  	u32 ctl1 = 0, ctl2 = 0;
> >  	u32 pctl1, pctl2, cctl1, cctl2;
> > +	u16 ltr;
> > +	u16 max_snoop_lat, max_nosnoop_lat;
> >  
> >  	if (!(link->aspm_support & ASPM_STATE_L1_2_MASK))
> >  		return;
> >  
> > +	ltr = pci_find_ext_capability(child, PCI_EXT_CAP_ID_LTR);
> > +	if (!ltr)
> > +		return;
> > +
> > +	pci_read_config_word(child, ltr + PCI_LTR_MAX_SNOOP_LAT, &max_snoop_lat);
> > +	pci_read_config_word(child, ltr + PCI_LTR_MAX_NOSNOOP_LAT, &max_nosnoop_lat);
> > +
> > +	max_snp_scale = (max_snoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
> > +	max_snp_val = max_snoop_lat & PCI_LTR_VALUE_MASK;
> > +
> > +	max_nsnp_scale = (max_nosnoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
> > +	max_nsnp_val = max_nosnoop_lat & PCI_LTR_VALUE_MASK;
> > +
> > +	/* choose the greater max scale value between snoop and no snoop value*/
> > +	max_scale = max(max_snp_scale, max_nsnp_scale);
> > +
> > +	/* choose the greater max value between snoop and no snoop scales */
> > +	max_val = max(max_snp_val, max_nsnp_val);
> > +
> >  	/* Choose the greater of the two Port Common_Mode_Restore_Times */
> >  	val1 = (parent_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
> >  	val2 = (child_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
> > @@ -525,6 +547,14 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
> >  	 */
> >  	l1_2_threshold = 2 + 4 + t_common_mode + t_power_on;
> >  	encode_l12_threshold(l1_2_threshold, &scale, &value);
> > +
> > +	/*
> > +	 * Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when reported
> > +	 * snoop/no-snoop values are greater than or equal to LTR_L1.2_THRESHOLD value.
> 
> Apart from the bug in calculating the LTR_Threshold as reported by Matthias
> and Bjorn, I'm wondering if we are covering up for the device firmware issue.

Yes, I think the patch is doing exactly that.

> As per section 6.18, if the device reports snoop/no-snoop scale/value as 0, then
> it implies that the device won't tolerate any additional delays from the host.
>
> In that case, how can we allow the link to go into L1.2 since that would incur
> high delay compared to L1.1?

I had the same doubt, a value of 0 doesn't make sense, if it literally means
'max delay of 0ns'. I did some debugging around this issue. One thing I found
is that there are NVMe models that don't have issues with entering L1.2 with
max (no-)snoop latencies of 0. From that I infer that a value of 0 does not
literally mean a max delay of 0ns.

The PCIe spec doesn't say specifically what a value of 0 in those registers
means, but chapter "6.18 Latency Tolerance Reporting (LTR) Mechanism" of the
PCIe 4.0 base spec says something about the latency requirements in LTR
messages:

  Setting the value and scale fields to all 0’s indicates that the device will
  be impacted by any delay and that the best possible service is requested.

With that and the fact that several NVMe's don't have issues with all 0 values
I deduce that all 0's means 'best possible service' and not 'max latency of
0ns'. It seems the Kioxia firmware has a bug which interprets all 0 values as
a max latency of 0ns.

Another finding is that the Kioxia NVMe can enter L1.2 if the max latencies
are set to values >= the LTR threshold. Unfortunately that isn't a viable
fix for existing devices in the field, devices under development could possibly
adjust the latencies in the BIOS (coreboot code [1] suggests that this is done
at least in some cases).

m.

[1] https://github.com/coreboot/coreboot/blob/master/src/device/pciexp_device.c#L313




> > +	 */
> > +	scale = min(scale, max_scale);
> > +	value = min(value, max_val);
> > +
> >  	ctl1 |= t_common_mode << 8 | scale << 29 | value << 16;
> >  
> >  	/* Some broken devices only support dword access to L1 SS */
> > -- 
> > 2.7.4
> > 
> 
> -- 
> மணிவண்ணன் சதாசிவம்
Manivannan Sadhasivam Dec. 21, 2022, 5:49 a.m. UTC | #6
On Mon, Dec 05, 2022 at 06:18:36PM +0000, Matthias Kaehlcke wrote:
> On Mon, Dec 05, 2022 at 04:55:00PM +0530, Manivannan Sadhasivam wrote:
> > On Fri, Sep 16, 2022 at 01:38:37PM +0530, Krishna chaitanya chundru wrote:
> > > In ASPM driver, LTR threshold scale and value are updated based on
> > > tcommon_mode and t_poweron values. In Kioxia NVMe L1.2 is failing due to
> > > LTR threshold scale and value are greater values than max snoop/non-snoop
> > > value.
> > > 
> > > Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when
> > > reported snoop/no-snoop values is greater than or equal to
> > > LTR_L1.2_THRESHOLD value.
> > > 
> > > Signed-off-by: Prasad Malisetty  <quic_pmaliset@quicinc.com>
> > > Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
> > > Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > 
> > I take my Ack back... Sorry that I did not look into this patch closer.
> > 
> > > ---
> > > 
> > > I am taking this patch forward as prasad is no more working with our org.
> > > changes since v6:
> > > 	- Rebasing with pci/next.
> > > changes since v5:
> > > 	- no changes, just reposting as standalone patch instead of reply to
> > > 	  previous patch.
> > > Changes since v4:
> > > 	- Replaced conditional statements with min and max.
> > > changes since v3:
> > > 	- Changed the logic to include this condition "snoop/nosnoop
> > > 	  latencies are not equal to zero and lower than LTR_L1.2_THRESHOLD"
> > > Changes since v2:
> > > 	- Replaced LTRME logic with max snoop/no-snoop latencies check.
> > > Changes since v1:
> > > 	- Added missing variable declaration in v1 patch
> > > ---
> > >  drivers/pci/pcie/aspm.c | 30 ++++++++++++++++++++++++++++++
> > >  1 file changed, 30 insertions(+)
> > > 
> > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > index 928bf64..2bb8470 100644
> > > --- a/drivers/pci/pcie/aspm.c
> > > +++ b/drivers/pci/pcie/aspm.c
> > > @@ -486,13 +486,35 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
> > >  {
> > >  	struct pci_dev *child = link->downstream, *parent = link->pdev;
> > >  	u32 val1, val2, scale1, scale2;
> > > +	u32 max_val, max_scale, max_snp_scale, max_snp_val, max_nsnp_scale, max_nsnp_val;
> > >  	u32 t_common_mode, t_power_on, l1_2_threshold, scale, value;
> > >  	u32 ctl1 = 0, ctl2 = 0;
> > >  	u32 pctl1, pctl2, cctl1, cctl2;
> > > +	u16 ltr;
> > > +	u16 max_snoop_lat, max_nosnoop_lat;
> > >  
> > >  	if (!(link->aspm_support & ASPM_STATE_L1_2_MASK))
> > >  		return;
> > >  
> > > +	ltr = pci_find_ext_capability(child, PCI_EXT_CAP_ID_LTR);
> > > +	if (!ltr)
> > > +		return;
> > > +
> > > +	pci_read_config_word(child, ltr + PCI_LTR_MAX_SNOOP_LAT, &max_snoop_lat);
> > > +	pci_read_config_word(child, ltr + PCI_LTR_MAX_NOSNOOP_LAT, &max_nosnoop_lat);
> > > +
> > > +	max_snp_scale = (max_snoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
> > > +	max_snp_val = max_snoop_lat & PCI_LTR_VALUE_MASK;
> > > +
> > > +	max_nsnp_scale = (max_nosnoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
> > > +	max_nsnp_val = max_nosnoop_lat & PCI_LTR_VALUE_MASK;
> > > +
> > > +	/* choose the greater max scale value between snoop and no snoop value*/
> > > +	max_scale = max(max_snp_scale, max_nsnp_scale);
> > > +
> > > +	/* choose the greater max value between snoop and no snoop scales */
> > > +	max_val = max(max_snp_val, max_nsnp_val);
> > > +
> > >  	/* Choose the greater of the two Port Common_Mode_Restore_Times */
> > >  	val1 = (parent_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
> > >  	val2 = (child_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
> > > @@ -525,6 +547,14 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
> > >  	 */
> > >  	l1_2_threshold = 2 + 4 + t_common_mode + t_power_on;
> > >  	encode_l12_threshold(l1_2_threshold, &scale, &value);
> > > +
> > > +	/*
> > > +	 * Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when reported
> > > +	 * snoop/no-snoop values are greater than or equal to LTR_L1.2_THRESHOLD value.
> > 
> > Apart from the bug in calculating the LTR_Threshold as reported by Matthias
> > and Bjorn, I'm wondering if we are covering up for the device firmware issue.
> 
> Yes, I think the patch is doing exactly that.
> 
> > As per section 6.18, if the device reports snoop/no-snoop scale/value as 0, then
> > it implies that the device won't tolerate any additional delays from the host.
> >
> > In that case, how can we allow the link to go into L1.2 since that would incur
> > high delay compared to L1.1?
> 
> I had the same doubt, a value of 0 doesn't make sense, if it literally means
> 'max delay of 0ns'. I did some debugging around this issue. One thing I found
> is that there are NVMe models that don't have issues with entering L1.2 with
> max (no-)snoop latencies of 0. From that I infer that a value of 0 does not
> literally mean a max delay of 0ns.
> 

This is interesting.

> The PCIe spec doesn't say specifically what a value of 0 in those registers
> means, but chapter "6.18 Latency Tolerance Reporting (LTR) Mechanism" of the
> PCIe 4.0 base spec says something about the latency requirements in LTR
> messages:
> 
>   Setting the value and scale fields to all 0’s indicates that the device will
>   be impacted by any delay and that the best possible service is requested.
> 
> With that and the fact that several NVMe's don't have issues with all 0 values
> I deduce that all 0's means 'best possible service' and not 'max latency of
> 0ns'. It seems the Kioxia firmware has a bug which interprets all 0 values as
> a max latency of 0ns.
> 
> Another finding is that the Kioxia NVMe can enter L1.2 if the max latencies
> are set to values >= the LTR threshold. Unfortunately that isn't a viable
> fix for existing devices in the field, devices under development could possibly
> adjust the latencies in the BIOS (coreboot code [1] suggests that this is done
> at least in some cases).
> 

I fully agree that it is a firmware issue. And yes, we should refrain to fixes
in the bootloader if possible. 

Another option would be to add a quirk for specific devices in the ASPM code.
But in that case, I'm not sure what would be the optimal snoop/no-snoop value
that could be used. There is another issue where if we have some other device
on the same bus that explicitly requires 0ns latency.

Thanks,
Mani

> m.
> 
> [1] https://github.com/coreboot/coreboot/blob/master/src/device/pciexp_device.c#L313
> 
> 
> 
> 
> > > +	 */
> > > +	scale = min(scale, max_scale);
> > > +	value = min(value, max_val);
> > > +
> > >  	ctl1 |= t_common_mode << 8 | scale << 29 | value << 16;
> > >  
> > >  	/* Some broken devices only support dword access to L1 SS */
> > > -- 
> > > 2.7.4
> > > 
> > 
> > -- 
> > மணிவண்ணன் சதாசிவம்
Matthias Kaehlcke Dec. 21, 2022, 2:59 p.m. UTC | #7
On Wed, Dec 21, 2022 at 11:19:53AM +0530, Manivannan Sadhasivam wrote:
> On Mon, Dec 05, 2022 at 06:18:36PM +0000, Matthias Kaehlcke wrote:
> > On Mon, Dec 05, 2022 at 04:55:00PM +0530, Manivannan Sadhasivam wrote:
> > > On Fri, Sep 16, 2022 at 01:38:37PM +0530, Krishna chaitanya chundru wrote:
> > > > In ASPM driver, LTR threshold scale and value are updated based on
> > > > tcommon_mode and t_poweron values. In Kioxia NVMe L1.2 is failing due to
> > > > LTR threshold scale and value are greater values than max snoop/non-snoop
> > > > value.
> > > > 
> > > > Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when
> > > > reported snoop/no-snoop values is greater than or equal to
> > > > LTR_L1.2_THRESHOLD value.
> > > > 
> > > > Signed-off-by: Prasad Malisetty  <quic_pmaliset@quicinc.com>
> > > > Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
> > > > Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > 
> > > I take my Ack back... Sorry that I did not look into this patch closer.
> > > 
> > > > ---
> > > > 
> > > > I am taking this patch forward as prasad is no more working with our org.
> > > > changes since v6:
> > > > 	- Rebasing with pci/next.
> > > > changes since v5:
> > > > 	- no changes, just reposting as standalone patch instead of reply to
> > > > 	  previous patch.
> > > > Changes since v4:
> > > > 	- Replaced conditional statements with min and max.
> > > > changes since v3:
> > > > 	- Changed the logic to include this condition "snoop/nosnoop
> > > > 	  latencies are not equal to zero and lower than LTR_L1.2_THRESHOLD"
> > > > Changes since v2:
> > > > 	- Replaced LTRME logic with max snoop/no-snoop latencies check.
> > > > Changes since v1:
> > > > 	- Added missing variable declaration in v1 patch
> > > > ---
> > > >  drivers/pci/pcie/aspm.c | 30 ++++++++++++++++++++++++++++++
> > > >  1 file changed, 30 insertions(+)
> > > > 
> > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > > index 928bf64..2bb8470 100644
> > > > --- a/drivers/pci/pcie/aspm.c
> > > > +++ b/drivers/pci/pcie/aspm.c
> > > > @@ -486,13 +486,35 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
> > > >  {
> > > >  	struct pci_dev *child = link->downstream, *parent = link->pdev;
> > > >  	u32 val1, val2, scale1, scale2;
> > > > +	u32 max_val, max_scale, max_snp_scale, max_snp_val, max_nsnp_scale, max_nsnp_val;
> > > >  	u32 t_common_mode, t_power_on, l1_2_threshold, scale, value;
> > > >  	u32 ctl1 = 0, ctl2 = 0;
> > > >  	u32 pctl1, pctl2, cctl1, cctl2;
> > > > +	u16 ltr;
> > > > +	u16 max_snoop_lat, max_nosnoop_lat;
> > > >  
> > > >  	if (!(link->aspm_support & ASPM_STATE_L1_2_MASK))
> > > >  		return;
> > > >  
> > > > +	ltr = pci_find_ext_capability(child, PCI_EXT_CAP_ID_LTR);
> > > > +	if (!ltr)
> > > > +		return;
> > > > +
> > > > +	pci_read_config_word(child, ltr + PCI_LTR_MAX_SNOOP_LAT, &max_snoop_lat);
> > > > +	pci_read_config_word(child, ltr + PCI_LTR_MAX_NOSNOOP_LAT, &max_nosnoop_lat);
> > > > +
> > > > +	max_snp_scale = (max_snoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
> > > > +	max_snp_val = max_snoop_lat & PCI_LTR_VALUE_MASK;
> > > > +
> > > > +	max_nsnp_scale = (max_nosnoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
> > > > +	max_nsnp_val = max_nosnoop_lat & PCI_LTR_VALUE_MASK;
> > > > +
> > > > +	/* choose the greater max scale value between snoop and no snoop value*/
> > > > +	max_scale = max(max_snp_scale, max_nsnp_scale);
> > > > +
> > > > +	/* choose the greater max value between snoop and no snoop scales */
> > > > +	max_val = max(max_snp_val, max_nsnp_val);
> > > > +
> > > >  	/* Choose the greater of the two Port Common_Mode_Restore_Times */
> > > >  	val1 = (parent_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
> > > >  	val2 = (child_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
> > > > @@ -525,6 +547,14 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
> > > >  	 */
> > > >  	l1_2_threshold = 2 + 4 + t_common_mode + t_power_on;
> > > >  	encode_l12_threshold(l1_2_threshold, &scale, &value);
> > > > +
> > > > +	/*
> > > > +	 * Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when reported
> > > > +	 * snoop/no-snoop values are greater than or equal to LTR_L1.2_THRESHOLD value.
> > > 
> > > Apart from the bug in calculating the LTR_Threshold as reported by Matthias
> > > and Bjorn, I'm wondering if we are covering up for the device firmware issue.
> > 
> > Yes, I think the patch is doing exactly that.
> > 
> > > As per section 6.18, if the device reports snoop/no-snoop scale/value as 0, then
> > > it implies that the device won't tolerate any additional delays from the host.
> > >
> > > In that case, how can we allow the link to go into L1.2 since that would incur
> > > high delay compared to L1.1?
> > 
> > I had the same doubt, a value of 0 doesn't make sense, if it literally means
> > 'max delay of 0ns'. I did some debugging around this issue. One thing I found
> > is that there are NVMe models that don't have issues with entering L1.2 with
> > max (no-)snoop latencies of 0. From that I infer that a value of 0 does not
> > literally mean a max delay of 0ns.
> > 
> 
> This is interesting.
> 
> > The PCIe spec doesn't say specifically what a value of 0 in those registers
> > means, but chapter "6.18 Latency Tolerance Reporting (LTR) Mechanism" of the
> > PCIe 4.0 base spec says something about the latency requirements in LTR
> > messages:
> > 
> >   Setting the value and scale fields to all 0’s indicates that the device will
> >   be impacted by any delay and that the best possible service is requested.
> > 
> > With that and the fact that several NVMe's don't have issues with all 0 values
> > I deduce that all 0's means 'best possible service' and not 'max latency of
> > 0ns'. It seems the Kioxia firmware has a bug which interprets all 0 values as
> > a max latency of 0ns.
> > 
> > Another finding is that the Kioxia NVMe can enter L1.2 if the max latencies
> > are set to values >= the LTR threshold. Unfortunately that isn't a viable
> > fix for existing devices in the field, devices under development could possibly
> > adjust the latencies in the BIOS (coreboot code [1] suggests that this is done
> > at least in some cases).
> > 
> 
> I fully agree that it is a firmware issue. And yes, we should refrain to fixes
> in the bootloader if possible.
> 
> Another option would be to add a quirk for specific devices in the ASPM code.
> But in that case, I'm not sure what would be the optimal snoop/no-snoop value
> that could be used.

I had/have the same doubt.

> There is another issue where if we have some other device on the same bus
> that explicitly requires 0ns latency.

Would that be reasonable requirement, i.e. can 0ns latency ever be achieved?
Manivannan Sadhasivam Dec. 22, 2022, 1:39 p.m. UTC | #8
On Wed, Dec 21, 2022 at 02:59:42PM +0000, Matthias Kaehlcke wrote:
> On Wed, Dec 21, 2022 at 11:19:53AM +0530, Manivannan Sadhasivam wrote:
> > On Mon, Dec 05, 2022 at 06:18:36PM +0000, Matthias Kaehlcke wrote:
> > > On Mon, Dec 05, 2022 at 04:55:00PM +0530, Manivannan Sadhasivam wrote:
> > > > On Fri, Sep 16, 2022 at 01:38:37PM +0530, Krishna chaitanya chundru wrote:
> > > > > In ASPM driver, LTR threshold scale and value are updated based on
> > > > > tcommon_mode and t_poweron values. In Kioxia NVMe L1.2 is failing due to
> > > > > LTR threshold scale and value are greater values than max snoop/non-snoop
> > > > > value.
> > > > > 
> > > > > Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when
> > > > > reported snoop/no-snoop values is greater than or equal to
> > > > > LTR_L1.2_THRESHOLD value.
> > > > > 
> > > > > Signed-off-by: Prasad Malisetty  <quic_pmaliset@quicinc.com>
> > > > > Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
> > > > > Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > 
> > > > I take my Ack back... Sorry that I did not look into this patch closer.
> > > > 
> > > > > ---
> > > > > 
> > > > > I am taking this patch forward as prasad is no more working with our org.
> > > > > changes since v6:
> > > > > 	- Rebasing with pci/next.
> > > > > changes since v5:
> > > > > 	- no changes, just reposting as standalone patch instead of reply to
> > > > > 	  previous patch.
> > > > > Changes since v4:
> > > > > 	- Replaced conditional statements with min and max.
> > > > > changes since v3:
> > > > > 	- Changed the logic to include this condition "snoop/nosnoop
> > > > > 	  latencies are not equal to zero and lower than LTR_L1.2_THRESHOLD"
> > > > > Changes since v2:
> > > > > 	- Replaced LTRME logic with max snoop/no-snoop latencies check.
> > > > > Changes since v1:
> > > > > 	- Added missing variable declaration in v1 patch
> > > > > ---
> > > > >  drivers/pci/pcie/aspm.c | 30 ++++++++++++++++++++++++++++++
> > > > >  1 file changed, 30 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > > > index 928bf64..2bb8470 100644
> > > > > --- a/drivers/pci/pcie/aspm.c
> > > > > +++ b/drivers/pci/pcie/aspm.c
> > > > > @@ -486,13 +486,35 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
> > > > >  {
> > > > >  	struct pci_dev *child = link->downstream, *parent = link->pdev;
> > > > >  	u32 val1, val2, scale1, scale2;
> > > > > +	u32 max_val, max_scale, max_snp_scale, max_snp_val, max_nsnp_scale, max_nsnp_val;
> > > > >  	u32 t_common_mode, t_power_on, l1_2_threshold, scale, value;
> > > > >  	u32 ctl1 = 0, ctl2 = 0;
> > > > >  	u32 pctl1, pctl2, cctl1, cctl2;
> > > > > +	u16 ltr;
> > > > > +	u16 max_snoop_lat, max_nosnoop_lat;
> > > > >  
> > > > >  	if (!(link->aspm_support & ASPM_STATE_L1_2_MASK))
> > > > >  		return;
> > > > >  
> > > > > +	ltr = pci_find_ext_capability(child, PCI_EXT_CAP_ID_LTR);
> > > > > +	if (!ltr)
> > > > > +		return;
> > > > > +
> > > > > +	pci_read_config_word(child, ltr + PCI_LTR_MAX_SNOOP_LAT, &max_snoop_lat);
> > > > > +	pci_read_config_word(child, ltr + PCI_LTR_MAX_NOSNOOP_LAT, &max_nosnoop_lat);
> > > > > +
> > > > > +	max_snp_scale = (max_snoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
> > > > > +	max_snp_val = max_snoop_lat & PCI_LTR_VALUE_MASK;
> > > > > +
> > > > > +	max_nsnp_scale = (max_nosnoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
> > > > > +	max_nsnp_val = max_nosnoop_lat & PCI_LTR_VALUE_MASK;
> > > > > +
> > > > > +	/* choose the greater max scale value between snoop and no snoop value*/
> > > > > +	max_scale = max(max_snp_scale, max_nsnp_scale);
> > > > > +
> > > > > +	/* choose the greater max value between snoop and no snoop scales */
> > > > > +	max_val = max(max_snp_val, max_nsnp_val);
> > > > > +
> > > > >  	/* Choose the greater of the two Port Common_Mode_Restore_Times */
> > > > >  	val1 = (parent_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
> > > > >  	val2 = (child_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
> > > > > @@ -525,6 +547,14 @@ static void aspm_calc_l1ss_info(struct pcie_link_state *link,
> > > > >  	 */
> > > > >  	l1_2_threshold = 2 + 4 + t_common_mode + t_power_on;
> > > > >  	encode_l12_threshold(l1_2_threshold, &scale, &value);
> > > > > +
> > > > > +	/*
> > > > > +	 * Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when reported
> > > > > +	 * snoop/no-snoop values are greater than or equal to LTR_L1.2_THRESHOLD value.
> > > > 
> > > > Apart from the bug in calculating the LTR_Threshold as reported by Matthias
> > > > and Bjorn, I'm wondering if we are covering up for the device firmware issue.
> > > 
> > > Yes, I think the patch is doing exactly that.
> > > 
> > > > As per section 6.18, if the device reports snoop/no-snoop scale/value as 0, then
> > > > it implies that the device won't tolerate any additional delays from the host.
> > > >
> > > > In that case, how can we allow the link to go into L1.2 since that would incur
> > > > high delay compared to L1.1?
> > > 
> > > I had the same doubt, a value of 0 doesn't make sense, if it literally means
> > > 'max delay of 0ns'. I did some debugging around this issue. One thing I found
> > > is that there are NVMe models that don't have issues with entering L1.2 with
> > > max (no-)snoop latencies of 0. From that I infer that a value of 0 does not
> > > literally mean a max delay of 0ns.
> > > 
> > 
> > This is interesting.
> > 
> > > The PCIe spec doesn't say specifically what a value of 0 in those registers
> > > means, but chapter "6.18 Latency Tolerance Reporting (LTR) Mechanism" of the
> > > PCIe 4.0 base spec says something about the latency requirements in LTR
> > > messages:
> > > 
> > >   Setting the value and scale fields to all 0’s indicates that the device will
> > >   be impacted by any delay and that the best possible service is requested.
> > > 
> > > With that and the fact that several NVMe's don't have issues with all 0 values
> > > I deduce that all 0's means 'best possible service' and not 'max latency of
> > > 0ns'. It seems the Kioxia firmware has a bug which interprets all 0 values as
> > > a max latency of 0ns.
> > > 
> > > Another finding is that the Kioxia NVMe can enter L1.2 if the max latencies
> > > are set to values >= the LTR threshold. Unfortunately that isn't a viable
> > > fix for existing devices in the field, devices under development could possibly
> > > adjust the latencies in the BIOS (coreboot code [1] suggests that this is done
> > > at least in some cases).
> > > 
> > 
> > I fully agree that it is a firmware issue. And yes, we should refrain to fixes
> > in the bootloader if possible.
> > 
> > Another option would be to add a quirk for specific devices in the ASPM code.
> > But in that case, I'm not sure what would be the optimal snoop/no-snoop value
> > that could be used.
> 
> I had/have the same doubt.
> 
> > There is another issue where if we have some other device on the same bus
> > that explicitly requires 0ns latency.
> 
> Would that be reasonable requirement, i.e. can 0ns latency ever be achieved?

Well, not in an ideal world. But the device vendors do not fail to surprise us ;)

Thanks,
Mani
diff mbox series

Patch

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 928bf64..2bb8470 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -486,13 +486,35 @@  static void aspm_calc_l1ss_info(struct pcie_link_state *link,
 {
 	struct pci_dev *child = link->downstream, *parent = link->pdev;
 	u32 val1, val2, scale1, scale2;
+	u32 max_val, max_scale, max_snp_scale, max_snp_val, max_nsnp_scale, max_nsnp_val;
 	u32 t_common_mode, t_power_on, l1_2_threshold, scale, value;
 	u32 ctl1 = 0, ctl2 = 0;
 	u32 pctl1, pctl2, cctl1, cctl2;
+	u16 ltr;
+	u16 max_snoop_lat, max_nosnoop_lat;
 
 	if (!(link->aspm_support & ASPM_STATE_L1_2_MASK))
 		return;
 
+	ltr = pci_find_ext_capability(child, PCI_EXT_CAP_ID_LTR);
+	if (!ltr)
+		return;
+
+	pci_read_config_word(child, ltr + PCI_LTR_MAX_SNOOP_LAT, &max_snoop_lat);
+	pci_read_config_word(child, ltr + PCI_LTR_MAX_NOSNOOP_LAT, &max_nosnoop_lat);
+
+	max_snp_scale = (max_snoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
+	max_snp_val = max_snoop_lat & PCI_LTR_VALUE_MASK;
+
+	max_nsnp_scale = (max_nosnoop_lat & PCI_LTR_SCALE_MASK) >> PCI_LTR_SCALE_SHIFT;
+	max_nsnp_val = max_nosnoop_lat & PCI_LTR_VALUE_MASK;
+
+	/* choose the greater max scale value between snoop and no snoop value*/
+	max_scale = max(max_snp_scale, max_nsnp_scale);
+
+	/* choose the greater max value between snoop and no snoop scales */
+	max_val = max(max_snp_val, max_nsnp_val);
+
 	/* Choose the greater of the two Port Common_Mode_Restore_Times */
 	val1 = (parent_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
 	val2 = (child_l1ss_cap & PCI_L1SS_CAP_CM_RESTORE_TIME) >> 8;
@@ -525,6 +547,14 @@  static void aspm_calc_l1ss_info(struct pcie_link_state *link,
 	 */
 	l1_2_threshold = 2 + 4 + t_common_mode + t_power_on;
 	encode_l12_threshold(l1_2_threshold, &scale, &value);
+
+	/*
+	 * Based on PCIe r4.1, sec 5.5.1, L1.2 substate must be entered when reported
+	 * snoop/no-snoop values are greater than or equal to LTR_L1.2_THRESHOLD value.
+	 */
+	scale = min(scale, max_scale);
+	value = min(value, max_val);
+
 	ctl1 |= t_common_mode << 8 | scale << 29 | value << 16;
 
 	/* Some broken devices only support dword access to L1 SS */