Message ID:   20241214191623.7256-1-gerhard@engleder-embedded.com
State:        Superseded
Delegated to: Netdev Maintainers
Series:       [iwl-next,v3] e1000e: Fix real-time violations on link up
On 12/14/2024 9:16 PM, Gerhard Engleder wrote:
> From: Gerhard Engleder <eg@keba.com>
>
> Link down and up triggers update of MTA table. This update executes many
> PCIe writes and a final flush. Thus, PCIe will be blocked until all
> writes are flushed. As a result, DMA transfers of other targets suffer
> from delay in the range of 50us. This results in timing violations on
> real-time systems during link down and up of e1000e in combination with
> an Intel i3-2310E Sandy Bridge CPU.
>
> The i3-2310E is quite old. Launched 2011 by Intel but still in use as a
> robot controller. The exact root cause of the problem is unclear and
> this situation won't change as Intel support for this CPU ended years
> ago. Our experience is that the number of posted PCIe writes needs to
> be limited, at least for real-time systems. With posted PCIe writes a
> much higher throughput can be generated than with PCIe reads, which
> cannot be posted. Thus, the load on the interconnect is much higher.
> Additionally, a PCIe read waits until all posted PCIe writes are done.
> Therefore, the PCIe read can block the CPU for much more than 10us if a
> lot of PCIe writes were posted before. Both issues are the reason why we
> are limiting the number of posted PCIe writes in a row in general for
> our real-time systems, not only for this driver.
>
> A flush after a low enough number of posted PCIe writes eliminates the
> delay but also increases the time needed for the MTA table update. The
> following measurements were done on the i3-2310E with e1000e for 128 MTA
> table entries:
>
> Single flush after all writes:  106us
> Flush after every write:        429us
> Flush after every 2nd write:    266us
> Flush after every 4th write:    180us
> Flush after every 8th write:    141us
> Flush after every 16th write:   121us
>
> A flush after every 8th write delays the link up by 35us and the
> negative impact on DMA transfers of other targets is still tolerable.
>
> Execute a flush after every 8th write. This prevents overloading the
> interconnect with posted writes.
>
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> CC: Vitaly Lifshits <vitaly.lifshits@intel.com>
> Link: https://lore.kernel.org/netdev/f8fe665a-5e6c-4f95-b47a-2f3281aa0e6c@lunn.ch/T/
> Signed-off-by: Gerhard Engleder <eg@keba.com>
> ---
> v3:
> - mention problematic platform explicitly (Bjorn Helgaas)
> - improve comment (Paul Menzel)
>
> v2:
> - remove PREEMPT_RT dependency (Andrew Lunn, Przemek Kitszel)
> ---
>  drivers/net/ethernet/intel/e1000e/mac.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
> index d7df2a0ed629..0174c16bbb43 100644
> --- a/drivers/net/ethernet/intel/e1000e/mac.c
> +++ b/drivers/net/ethernet/intel/e1000e/mac.c
> @@ -331,8 +331,15 @@ void e1000e_update_mc_addr_list_generic(struct e1000_hw *hw,
>  	}
>
>  	/* replace the entire MTA table */
> -	for (i = hw->mac.mta_reg_count - 1; i >= 0; i--)
> +	for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) {
>  		E1000_WRITE_REG_ARRAY(hw, E1000_MTA, i, hw->mac.mta_shadow[i]);
> +
> +		/* do not queue up too many posted writes to prevent increased
> +		 * latency for other devices on the interconnect
> +		 */
> +		if ((i % 8) == 0 && i != 0)
> +			e1e_flush();

I would prefer to avoid adding this code to all devices, particularly
those that don't operate on real-time systems. Implementing this code
will introduce three additional MMIO transactions, which will increase
the driver start time in various flows (up, probe, etc.).

Is there a specific reason not to use if (IS_ENABLED(CONFIG_PREEMPT_RT))
as Andrew initially suggested?

> +	}
>  	e1e_flush();
>  }
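For illustration only, a minimal sketch of what the suggested
IS_ENABLED(CONFIG_PREEMPT_RT) guard could look like inside the loop above.
This is not a posted revision; it simply combines the hunk from the patch
with the reviewer's suggestion and reuses the driver's existing
E1000_WRITE_REG_ARRAY and e1e_flush() macros and the function's local
variables:

    /* replace the entire MTA table */
    for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) {
            E1000_WRITE_REG_ARRAY(hw, E1000_MTA, i, hw->mac.mta_shadow[i]);

            /*
             * On PREEMPT_RT kernels, flush every 8th posted write so a long
             * burst of writes does not delay DMA transfers of other devices
             * on the interconnect.
             */
            if (IS_ENABLED(CONFIG_PREEMPT_RT) && i % 8 == 0 && i != 0)
                    e1e_flush();
    }
    e1e_flush();

Since IS_ENABLED() evaluates to a compile-time constant, non-RT builds would
compile the branch out and keep only the single final flush, avoiding the
extra MMIO transactions the reviewer is concerned about.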
>> @@ -331,8 +331,15 @@ void e1000e_update_mc_addr_list_generic(struct
>> e1000_hw *hw,
>>  	}
>>  	/* replace the entire MTA table */
>> -	for (i = hw->mac.mta_reg_count - 1; i >= 0; i--)
>> +	for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) {
>>  		E1000_WRITE_REG_ARRAY(hw, E1000_MTA, i, hw->mac.mta_shadow[i]);
>> +
>> +		/* do not queue up too many posted writes to prevent increased
>> +		 * latency for other devices on the interconnect
>> +		 */
>> +		if ((i % 8) == 0 && i != 0)
>> +			e1e_flush();
>
> I would prefer to avoid adding this code to all devices, particularly
> those that don't operate on real-time systems. Implementing this code
> will introduce three additional MMIO transactions which will increase
> the driver start time in various flows (up, probe, etc.).
>
> Is there a specific reason not to use if (IS_ENABLED(CONFIG_PREEMPT_RT))
> as Andrew initially suggested?

Andrew made two suggestions: IS_ENABLED(CONFIG_PREEMPT_RT), which I used
in the first version after the RFC, and checking for a compromise between
RT and non-RT performance, as some distros might enable PREEMPT_RT in the
future.
After the first version with IS_ENABLED(CONFIG_PREEMPT_RT), Przemek
suggested removing the PREEMPT_RT check because "this change sounds
reasonable also for the standard kernel".

I used the PREEMPT_RT dependency to limit the effects to real-time
systems, to not make non-real-time systems slower. But I can also follow
the reasoning of Andrew and Przemek. With that said, I have no problem
with adding IS_ENABLED(CONFIG_PREEMPT_RT) again.

Gerhard
On 12/16/24 20:23, Gerhard Engleder wrote:

[... quoted patch hunk and review comment trimmed ...]

> I used the PREEMPT_RT dependency to limit the effects to real-time
> systems, to not make non-real-time systems slower. But I can also follow
> the reasoning of Andrew and Przemek. With that said, I have no problem
> with adding IS_ENABLED(CONFIG_PREEMPT_RT) again.
>
> Gerhard

I'm also fine with limiting the change to RT kernels.
On 14/12/2024 21:16, Gerhard Engleder wrote:
> From: Gerhard Engleder <eg@keba.com>
>
> Link down and up triggers update of MTA table. This update executes many
> PCIe writes and a final flush. Thus, PCIe will be blocked until all
> writes are flushed. As a result, DMA transfers of other targets suffer
> from delay in the range of 50us. This results in timing violations on
> real-time systems during link down and up of e1000e in combination with
> an Intel i3-2310E Sandy Bridge CPU.
>
> [... rest of the commit message and changelog trimmed ...]
>
>  drivers/net/ethernet/intel/e1000e/mac.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)

Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
From: Gerhard Engleder <gerhard@engleder-embedded.com>
Date: Sat, 14 Dec 2024 20:16:23 +0100

> From: Gerhard Engleder <eg@keba.com>
>
> Link down and up triggers update of MTA table. This update executes many
> PCIe writes and a final flush. Thus, PCIe will be blocked until all
> writes are flushed. As a result, DMA transfers of other targets suffer
> from delay in the range of 50us. This results in timing violations on
> real-time systems during link down and up of e1000e in combination with
> an Intel i3-2310E Sandy Bridge CPU.
>
> [... rest of the commit message and changelog trimmed ...]
>
> diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
> index d7df2a0ed629..0174c16bbb43 100644
> --- a/drivers/net/ethernet/intel/e1000e/mac.c
> +++ b/drivers/net/ethernet/intel/e1000e/mac.c
> @@ -331,8 +331,15 @@ void e1000e_update_mc_addr_list_generic(struct e1000_hw *hw,
>  	}
>
>  	/* replace the entire MTA table */
> -	for (i = hw->mac.mta_reg_count - 1; i >= 0; i--)
> +	for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) {
>  		E1000_WRITE_REG_ARRAY(hw, E1000_MTA, i, hw->mac.mta_shadow[i]);
> +
> +		/* do not queue up too many posted writes to prevent increased
> +		 * latency for other devices on the interconnect
> +		 */

I think a multi-line comment should start with a capital letter and have
a '.' at the end of the sentence.

+ netdev code doesn't have the special rule for multi-line comments, they
should look the same way as in the rest of the kernel:

	/*
	 * Do not queue up ...
	 * latency ...
	 */

> +		if ((i % 8) == 0 && i != 0)
> +			e1e_flush();

IIRC explicit `== 0` / `!= 0` are considered redundant.

	if (!(i % 8) && i)

I'd also mention in the comment above that this means "flush each 8th
write" and why exactly 8.

> +	}
>  	e1e_flush();
>  }

Thanks,
Olek
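For reference, applying both of Olek's points to the hunk above would look
roughly like the following. This is only a sketch; the exact comment
wording, including the rationale for the number 8, is a guess at what a
next revision might say:

            /*
             * Flush after every 8th posted write so that writes do not
             * queue up and increase latency for other devices on the
             * interconnect. A flush after every 8th write was measured as
             * a good compromise between MTA update time and DMA delay.
             */
            if (!(i % 8) && i)
                    e1e_flush();

The condition !(i % 8) && i is true exactly when i is a non-zero multiple
of 8, so it behaves the same as the original (i % 8) == 0 && i != 0 check.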
On 18.12.24 09:36, Przemek Kitszel wrote:
> On 12/16/24 20:23, Gerhard Engleder wrote:
>
> [... quoted patch hunk and review comment trimmed ...]
>
>> I used the PREEMPT_RT dependency to limit the effects to real-time
>> systems, to not make non-real-time systems slower. But I can also follow
>> the reasoning of Andrew and Przemek. With that said, I have no problem
>> with adding IS_ENABLED(CONFIG_PREEMPT_RT) again.
>>
>> Gerhard
>
> I'm also fine with limiting the change to RT kernels.

I will add IS_ENABLED(CONFIG_PREEMPT_RT). Thanks!

Gerhard
On 18.12.24 16:08, Avigail Dahan wrote:
> On 14/12/2024 21:16, Gerhard Engleder wrote:
>> From: Gerhard Engleder <eg@keba.com>
>>
>> [... quoted commit message, changelog and diffstat trimmed ...]
>>
> Tested-by: Avigail Dahan <avigailx.dahan@intel.com>

Thank you for the test!

Gerhard
On 18.12.24 16:23, Alexander Lobakin wrote:
> From: Gerhard Engleder <gerhard@engleder-embedded.com>
> Date: Sat, 14 Dec 2024 20:16:23 +0100
>
>> From: Gerhard Engleder <eg@keba.com>
>>
>> [... quoted commit message, changelog and diffstat trimmed ...]
>>
>> @@ -331,8 +331,15 @@ void e1000e_update_mc_addr_list_generic(struct e1000_hw *hw,
>>  	}
>>
>>  	/* replace the entire MTA table */
>> -	for (i = hw->mac.mta_reg_count - 1; i >= 0; i--)
>> +	for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) {
>>  		E1000_WRITE_REG_ARRAY(hw, E1000_MTA, i, hw->mac.mta_shadow[i]);
>> +
>> +		/* do not queue up too many posted writes to prevent increased
>> +		 * latency for other devices on the interconnect
>> +		 */
>
> I think a multi-line comment should start with a capital letter and have
> a '.' at the end of the sentence.
>
> + netdev code doesn't have the special rule for multi-line comments, they
> should look the same way as in the rest of the kernel:
>
> 	/*
> 	 * Do not queue up ...
> 	 * latency ...
> 	 */

Oh, the preferred style changed, I missed that. Will be done.

>> +		if ((i % 8) == 0 && i != 0)
>> +			e1e_flush();
>
> IIRC explicit `== 0` / `!= 0` are considered redundant.
>
> 	if (!(i % 8) && i)

You are right, will be changed.

> I'd also mention in the comment above that this means "flush each 8th
> write" and why exactly 8.

I will add that information to the comment.

Thank you for the review!

Gerhard
diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
index d7df2a0ed629..0174c16bbb43 100644
--- a/drivers/net/ethernet/intel/e1000e/mac.c
+++ b/drivers/net/ethernet/intel/e1000e/mac.c
@@ -331,8 +331,15 @@ void e1000e_update_mc_addr_list_generic(struct e1000_hw *hw,
 	}
 
 	/* replace the entire MTA table */
-	for (i = hw->mac.mta_reg_count - 1; i >= 0; i--)
+	for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) {
 		E1000_WRITE_REG_ARRAY(hw, E1000_MTA, i, hw->mac.mta_shadow[i]);
+
+		/* do not queue up too many posted writes to prevent increased
+		 * latency for other devices on the interconnect
+		 */
+		if ((i % 8) == 0 && i != 0)
+			e1e_flush();
+	}
 	e1e_flush();
 }
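Pulling the review feedback in this thread together (CONFIG_PREEMPT_RT
guard, regular kernel comment style, simplified condition), the loop in a
next revision would presumably end up roughly as follows. This is only an
illustration of the agreed changes, not the actually posted follow-up
patch:

    /* replace the entire MTA table */
    for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) {
            E1000_WRITE_REG_ARRAY(hw, E1000_MTA, i, hw->mac.mta_shadow[i]);

            /*
             * Do not queue up too many posted writes to prevent increased
             * latency for other devices on the interconnect. Flushing
             * after every 8th write keeps the added link-up delay small
             * while avoiding the measured DMA delays on real-time systems.
             */
            if (IS_ENABLED(CONFIG_PREEMPT_RT) && !(i % 8) && i)
                    e1e_flush();
    }
    e1e_flush();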