[RFC,net-next] e1000e: Fix real-time violations on link up

Message ID 20241011195412.51804-1-gerhard@engleder-embedded.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Series [RFC,net-next] e1000e: Fix real-time violations on link up

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5 this patch: 5
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 16 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 29 this patch: 29
netdev/source_inline success Was 0 now: 0

Commit Message

Gerhard Engleder Oct. 11, 2024, 7:54 p.m. UTC
From: Gerhard Engleder <eg@keba.com>

Link down and up triggers an update of the MTA table. This update executes
many PCIe writes and a final flush. Thus, PCIe will be blocked until all
writes are flushed. As a result, DMA transfers of other targets suffer from
delays in the range of 50us. The result is timing violations on real-time
systems during link down and up of e1000e.

Execute a flush after every single write. This prevents overloading the
interconnect with posted writes. As this also increases the time spent on
the MTA table update considerably, this change is limited to PREEMPT_RT.

Signed-off-by: Gerhard Engleder <eg@keba.com>
---
 drivers/net/ethernet/intel/e1000e/mac.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Andrew Lunn Oct. 12, 2024, 6:42 p.m. UTC | #1
On Fri, Oct 11, 2024 at 09:54:12PM +0200, Gerhard Engleder wrote:
> From: Gerhard Engleder <eg@keba.com>
> 
> Link down and up triggers an update of the MTA table. This update executes
> many PCIe writes and a final flush. Thus, PCIe will be blocked until all
> writes are flushed. As a result, DMA transfers of other targets suffer from
> delays in the range of 50us. The result is timing violations on real-time
> systems during link down and up of e1000e.
> 
> Execute a flush after every single write. This prevents overloading the
> interconnect with posted writes. As this also increases the time spent on
> the MTA table update considerably, this change is limited to PREEMPT_RT.
> 
> Signed-off-by: Gerhard Engleder <eg@keba.com>
> ---
>  drivers/net/ethernet/intel/e1000e/mac.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
> index d7df2a0ed629..f4693d355886 100644
> --- a/drivers/net/ethernet/intel/e1000e/mac.c
> +++ b/drivers/net/ethernet/intel/e1000e/mac.c
> @@ -331,9 +331,15 @@ void e1000e_update_mc_addr_list_generic(struct e1000_hw *hw,
>  	}
>  
>  	/* replace the entire MTA table */
> -	for (i = hw->mac.mta_reg_count - 1; i >= 0; i--)
> +	for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) {
>  		E1000_WRITE_REG_ARRAY(hw, E1000_MTA, i, hw->mac.mta_shadow[i]);
> +#ifdef CONFIG_PREEMPT_RT
> +		e1e_flush();
> +#endif
> +	}
> +#ifndef CONFIG_PREEMPT_RT
>  	e1e_flush();
> +#endif

#ifdef FOO is generally not liked because it reduces the effectiveness
of build testing.

Two suggestions:

	if (IS_ENABLED(CONFIG_PREEMPT_RT))
		e1e_flush();

This will then end up as an if (0) or if (1), with the statement
following it always being compiled, and then optimised out if not
needed.
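To illustrate the difference from #ifdef, here is a minimal userspace
sketch. It is not kernel code: SK_IS_ENABLED() is a simplified,
hypothetical stand-in modeled on the kernel's IS_ENABLED(), and
CONFIG_SKETCH_RT, CONFIG_SKETCH_OFF and sk_flush() are made-up names.
Both functions are always compiled and type-checked; only the constant
in the condition differs.

```c
/*
 * Userspace sketch only. SK_IS_ENABLED() is a simplified, hypothetical
 * stand-in for the kernel's IS_ENABLED(): it evaluates to 1 if the option
 * macro is defined as 1 and to 0 if it is not defined at all.
 */
#define SK_PLACEHOLDER_1 0,
#define SK_SECOND(ignored, val, ...) val
#define SK_DEFINED3(arg1_or_junk) SK_SECOND(arg1_or_junk 1, 0, 0)
#define SK_DEFINED2(val) SK_DEFINED3(SK_PLACEHOLDER_##val)
#define SK_IS_ENABLED(option) SK_DEFINED2(option)

#define CONFIG_SKETCH_RT 1	/* pretend Kconfig enabled this option */
/* CONFIG_SKETCH_OFF is intentionally never defined */

static int sk_flush_count;

/* Stand-in for a flush; it only counts how often it was called. */
static void sk_flush(void)
{
	sk_flush_count++;
}

/* Option enabled: the condition is a constant 1, so every write flushes. */
int sk_flushes_option_on(int nr_writes)
{
	sk_flush_count = 0;
	for (int i = nr_writes - 1; i >= 0; i--) {
		/* a register write would go here */
		if (SK_IS_ENABLED(CONFIG_SKETCH_RT))
			sk_flush();
	}
	if (!SK_IS_ENABLED(CONFIG_SKETCH_RT))
		sk_flush();
	return sk_flush_count;
}

/* Option not defined: the condition is a constant 0, one final flush. */
int sk_flushes_option_off(int nr_writes)
{
	sk_flush_count = 0;
	for (int i = nr_writes - 1; i >= 0; i--) {
		/* a register write would go here */
		if (SK_IS_ENABLED(CONFIG_SKETCH_OFF))
			sk_flush();
	}
	if (!SK_IS_ENABLED(CONFIG_SKETCH_OFF))
		sk_flush();
	return sk_flush_count;
}
```

Calling sk_flushes_option_on(128) counts one flush per write, while
sk_flushes_option_off(128) counts only the single final flush, yet the
compiler sees the sk_flush() call in both configurations.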

Alternatively, consider something like:

	if (!(i % 8))
		e1e_flush();

if there is a reasonable compromise between RT and non-RT
performance. Given that RT is now fully merged, we might see some
distros enable it, so a compromise would probably be better.
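A rough userspace sketch of such a compromise, to show how the flush
interval bounds the number of posted writes in flight. Everything here
is an assumption for illustration: struct sketch_hw, the helpers, the
table size of 128 and the interval are hypothetical and do not come
from the driver. The condition !(i % interval) is chosen so that the
last write (i == 0) is always followed by a flush.

```c
#include <stdint.h>

#define SKETCH_MTA_COUNT 128	/* assumed table size, not taken from e1000e */

/* Hypothetical model of the device: registers plus simple bookkeeping. */
struct sketch_hw {
	uint32_t mta_regs[SKETCH_MTA_COUNT];	/* simulated device registers */
	uint32_t mta_shadow[SKETCH_MTA_COUNT];	/* values to be written */
	unsigned int pending;		/* posted writes not yet flushed */
	unsigned int max_pending;	/* longest run of unflushed writes */
	unsigned int flushes;		/* number of flushes issued */
};

/* Simulated posted write: stores the value and counts it as pending. */
static void sketch_write_reg(struct sketch_hw *hw, int i, uint32_t val)
{
	hw->mta_regs[i] = val;
	hw->pending++;
	if (hw->pending > hw->max_pending)
		hw->max_pending = hw->pending;
}

/* Simulated flush: all pending writes are considered completed. */
static void sketch_flush(struct sketch_hw *hw)
{
	hw->flushes++;
	hw->pending = 0;
}

/* Write the whole table, flushing after every 'interval' writes. */
void sketch_update_mta(struct sketch_hw *hw, int interval)
{
	for (int i = SKETCH_MTA_COUNT - 1; i >= 0; i--) {
		sketch_write_reg(hw, i, hw->mta_shadow[i]);
		if (!(i % interval))
			sketch_flush(hw);
	}
}
```

With an interval of 8 this model issues 16 flushes and never has more
than 8 writes pending; an interval of 1 corresponds to a flush after
every write, and an interval equal to the table size to a single final
flush. Which interval is acceptable on real hardware would have to be
measured.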

	Andrew

Gerhard Engleder Oct. 14, 2024, 5:59 p.m. UTC | #2
On 12.10.24 20:42, Andrew Lunn wrote:
> On Fri, Oct 11, 2024 at 09:54:12PM +0200, Gerhard Engleder wrote:
>> From: Gerhard Engleder <eg@keba.com>
>>
>> Link down and up triggers an update of the MTA table. This update executes
>> many PCIe writes and a final flush. Thus, PCIe will be blocked until all
>> writes are flushed. As a result, DMA transfers of other targets suffer from
>> delays in the range of 50us. The result is timing violations on real-time
>> systems during link down and up of e1000e.
>>
>> Execute a flush after every single write. This prevents overloading the
>> interconnect with posted writes. As this also increases the time spent on
>> the MTA table update considerably, this change is limited to PREEMPT_RT.
>>
>> Signed-off-by: Gerhard Engleder <eg@keba.com>
>> ---
>>   drivers/net/ethernet/intel/e1000e/mac.c | 8 +++++++-
>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
>> index d7df2a0ed629..f4693d355886 100644
>> --- a/drivers/net/ethernet/intel/e1000e/mac.c
>> +++ b/drivers/net/ethernet/intel/e1000e/mac.c
>> @@ -331,9 +331,15 @@ void e1000e_update_mc_addr_list_generic(struct e1000_hw *hw,
>>   	}
>>   
>>   	/* replace the entire MTA table */
>> -	for (i = hw->mac.mta_reg_count - 1; i >= 0; i--)
>> +	for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) {
>>   		E1000_WRITE_REG_ARRAY(hw, E1000_MTA, i, hw->mac.mta_shadow[i]);
>> +#ifdef CONFIG_PREEMPT_RT
>> +		e1e_flush();
>> +#endif
>> +	}
>> +#ifndef CONFIG_PREEMPT_RT
>>   	e1e_flush();
>> +#endif
> 
> #ifdef FOO is generally not liked because it reduces the effectiveness
> of build testing.
> 
> Two suggestions:
> 
> 	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> 		e1e_flush();

I will do that.

> This will then end up as an if (0) or if (1), with the statement
> following it always being compiled, and then optimised out if not
> needed.
> 
> Alternatively, consider something like:
> 
> 	if (!(i % 8))
> 		e1e_flush();
> 
> if there is a reasonable compromise between RT and non-RT
> performance. Given that RT is now fully merged, we might see some
> distros enable it, so a compromise would probably be better.

Yes, a read/flush after every posted write is likely too much. I will
test how often a flush is required.

Thank you for your feedback Andrew!

Any comments from Intel driver maintainers?

Gerhard

Patch

diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
index d7df2a0ed629..f4693d355886 100644
--- a/drivers/net/ethernet/intel/e1000e/mac.c
+++ b/drivers/net/ethernet/intel/e1000e/mac.c
@@ -331,9 +331,15 @@  void e1000e_update_mc_addr_list_generic(struct e1000_hw *hw,
 	}
 
 	/* replace the entire MTA table */
-	for (i = hw->mac.mta_reg_count - 1; i >= 0; i--)
+	for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) {
 		E1000_WRITE_REG_ARRAY(hw, E1000_MTA, i, hw->mac.mta_shadow[i]);
+#ifdef CONFIG_PREEMPT_RT
+		e1e_flush();
+#endif
+	}
+#ifndef CONFIG_PREEMPT_RT
 	e1e_flush();
+#endif
 }
 
 /**