Message ID | 20241214191623.7256-1-gerhard@engleder-embedded.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Krzysztof Wilczyński |
Headers | show |
Series | [iwl-next,v3] e1000e: Fix real-time violations on link up | expand |
On 12/14/2024 9:16 PM, Gerhard Engleder wrote: > From: Gerhard Engleder <eg@keba.com> > > Link down and up triggers update of MTA table. This update executes many > PCIe writes and a final flush. Thus, PCIe will be blocked until all > writes are flushed. As a result, DMA transfers of other targets suffer > from delay in the range of 50us. This results in timing violations on > real-time systems during link down and up of e1000e in combination with > an Intel i3-2310E Sandy Bridge CPU. > > The i3-2310E is quite old. Launched 2011 by Intel but still in use as > robot controller. The exact root cause of the problem is unclear and > this situation won't change as Intel support for this CPU has ended > years ago. Our experience is that the number of posted PCIe writes needs > to be limited at least for real-time systems. With posted PCIe writes a > much higher throughput can be generated than with PCIe reads which > cannot be posted. Thus, the load on the interconnect is much higher. > Additionally, a PCIe read waits until all posted PCIe writes are done. > Therefore, the PCIe read can block the CPU for much more than 10us if a > lot of PCIe writes were posted before. Both issues are the reason why we > are limiting the number of posted PCIe writes in row in general for our > real-time systems, not only for this driver. > > A flush after a low enough number of posted PCIe writes eliminates the > delay but also increases the time needed for MTA table update. The > following measurements were done on i3-2310E with e1000e for 128 MTA > table entries: > > Single flush after all writes: 106us > Flush after every write: 429us > Flush after every 2nd write: 266us > Flush after every 4th write: 180us > Flush after every 8th write: 141us > Flush after every 16th write: 121us > > A flush after every 8th write delays the link up by 35us and the > negative impact to DMA transfers of other targets is still tolerable. > > Execute a flush after every 8th write. This prevents overloading the > interconnect with posted writes. > > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> > CC: Vitaly Lifshits <vitaly.lifshits@intel.com> > Link: https://lore.kernel.org/netdev/f8fe665a-5e6c-4f95-b47a-2f3281aa0e6c@lunn.ch/T/ > Signed-off-by: Gerhard Engleder <eg@keba.com> > --- > v3: > - mention problematic platform explicitly (Bjorn Helgaas) > - improve comment (Paul Menzel) > > v2: > - remove PREEMPT_RT dependency (Andrew Lunn, Przemek Kitszel) > --- > drivers/net/ethernet/intel/e1000e/mac.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c > index d7df2a0ed629..0174c16bbb43 100644 > --- a/drivers/net/ethernet/intel/e1000e/mac.c > +++ b/drivers/net/ethernet/intel/e1000e/mac.c > @@ -331,8 +331,15 @@ void e1000e_update_mc_addr_list_generic(struct e1000_hw *hw, > } > > /* replace the entire MTA table */ > - for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) > + for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) { > E1000_WRITE_REG_ARRAY(hw, E1000_MTA, i, hw->mac.mta_shadow[i]); > + > + /* do not queue up too many posted writes to prevent increased > + * latency for other devices on the interconnect > + */ > + if ((i % 8) == 0 && i != 0) > + e1e_flush(); I would prefer to avoid adding this code to all devices, particularly those that don't operate on real-time systems. Implementing this code will introduce three additional MMIO transactions which will increase the driver start time in various flows (up, probe, etc.). Is there a specific reason not to use if (IS_ENABLED(CONFIG_PREEMPT_RT)) as Andrew initially suggested? > + } > e1e_flush(); > } >
>> @@ -331,8 +331,15 @@ void e1000e_update_mc_addr_list_generic(struct >> e1000_hw *hw, >> } >> /* replace the entire MTA table */ >> - for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) >> + for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) { >> E1000_WRITE_REG_ARRAY(hw, E1000_MTA, i, hw->mac.mta_shadow[i]); >> + >> + /* do not queue up too many posted writes to prevent increased >> + * latency for other devices on the interconnect >> + */ >> + if ((i % 8) == 0 && i != 0) >> + e1e_flush(); > > > I would prefer to avoid adding this code to all devices, particularly > those that don't operate on real-time systems. Implementing this code > will introduce three additional MMIO transactions which will increase > the driver start time in various flows (up, probe, etc.). > > Is there a specific reason not to use if (IS_ENABLED(CONFIG_PREEMPT_RT)) > as Andrew initially suggested? Andrew made two suggestions: IS_ENABLED(CONFIG_PREEMPT_RT) which I used in the first version after the RFC. And he suggested to check for a compromise between RT and none RT performance, as some distros might enable PREEMPT_RT in the future. Przemek suggested to remove the PREEMPT_RT check as "this change sounds reasonable also for the standard kernel" after the first version with IS_ENABLED(CONFIG_PREEMPT_RT). I used the PREEMPT_RT dependency to limit effects to real-time systems, to not make none real-time systems slower. But I could also follow the reasoning of Andrew and Przemek. With that said, I have no problem to add IS_ENABLED(CONFIG_PREEMPT_RT) again. Gerhard
On 12/16/24 20:23, Gerhard Engleder wrote: >>> @@ -331,8 +331,15 @@ void e1000e_update_mc_addr_list_generic(struct >>> e1000_hw *hw, >>> } >>> /* replace the entire MTA table */ >>> - for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) >>> + for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) { >>> E1000_WRITE_REG_ARRAY(hw, E1000_MTA, i, hw- >>> >mac.mta_shadow[i]); >>> + >>> + /* do not queue up too many posted writes to prevent increased >>> + * latency for other devices on the interconnect >>> + */ >>> + if ((i % 8) == 0 && i != 0) >>> + e1e_flush(); >> >> >> I would prefer to avoid adding this code to all devices, particularly >> those that don't operate on real-time systems. Implementing this code >> will introduce three additional MMIO transactions which will increase >> the driver start time in various flows (up, probe, etc.). >> >> Is there a specific reason not to use if >> (IS_ENABLED(CONFIG_PREEMPT_RT)) as Andrew initially suggested? > > Andrew made two suggestions: IS_ENABLED(CONFIG_PREEMPT_RT) which I used > in the first version after the RFC. And he suggested to check for a > compromise between RT and none RT performance, as some distros might > enable PREEMPT_RT in the future. > Przemek suggested to remove the PREEMPT_RT check as "this change sounds > reasonable also for the standard kernel" after the first version with > IS_ENABLED(CONFIG_PREEMPT_RT). > > I used the PREEMPT_RT dependency to limit effects to real-time systems, > to not make none real-time systems slower. But I could also follow the > reasoning of Andrew and Przemek. With that said, I have no problem to > add IS_ENABLED(CONFIG_PREEMPT_RT) again. > > Gerhard I'm also fine with limiting the change to RT kernels.
On 14/12/2024 21:16, Gerhard Engleder wrote: > From: Gerhard Engleder <eg@keba.com> > > Link down and up triggers update of MTA table. This update executes many > PCIe writes and a final flush. Thus, PCIe will be blocked until all > writes are flushed. As a result, DMA transfers of other targets suffer > from delay in the range of 50us. This results in timing violations on > real-time systems during link down and up of e1000e in combination with > an Intel i3-2310E Sandy Bridge CPU. > > The i3-2310E is quite old. Launched 2011 by Intel but still in use as > robot controller. The exact root cause of the problem is unclear and > this situation won't change as Intel support for this CPU has ended > years ago. Our experience is that the number of posted PCIe writes needs > to be limited at least for real-time systems. With posted PCIe writes a > much higher throughput can be generated than with PCIe reads which > cannot be posted. Thus, the load on the interconnect is much higher. > Additionally, a PCIe read waits until all posted PCIe writes are done. > Therefore, the PCIe read can block the CPU for much more than 10us if a > lot of PCIe writes were posted before. Both issues are the reason why we > are limiting the number of posted PCIe writes in row in general for our > real-time systems, not only for this driver. > > A flush after a low enough number of posted PCIe writes eliminates the > delay but also increases the time needed for MTA table update. The > following measurements were done on i3-2310E with e1000e for 128 MTA > table entries: > > Single flush after all writes: 106us > Flush after every write: 429us > Flush after every 2nd write: 266us > Flush after every 4th write: 180us > Flush after every 8th write: 141us > Flush after every 16th write: 121us > > A flush after every 8th write delays the link up by 35us and the > negative impact to DMA transfers of other targets is still tolerable. > > Execute a flush after every 8th write. This prevents overloading the > interconnect with posted writes. > > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> > CC: Vitaly Lifshits <vitaly.lifshits@intel.com> > Link: https://lore.kernel.org/netdev/f8fe665a-5e6c-4f95-b47a-2f3281aa0e6c@lunn.ch/T/ > Signed-off-by: Gerhard Engleder <eg@keba.com> > --- > v3: > - mention problematic platform explicitly (Bjorn Helgaas) > - improve comment (Paul Menzel) > > v2: > - remove PREEMPT_RT dependency (Andrew Lunn, Przemek Kitszel) > --- > drivers/net/ethernet/intel/e1000e/mac.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
From: Gerhard Engleder <gerhard@engleder-embedded.com> Date: Sat, 14 Dec 2024 20:16:23 +0100 > From: Gerhard Engleder <eg@keba.com> > > Link down and up triggers update of MTA table. This update executes many > PCIe writes and a final flush. Thus, PCIe will be blocked until all > writes are flushed. As a result, DMA transfers of other targets suffer > from delay in the range of 50us. This results in timing violations on > real-time systems during link down and up of e1000e in combination with > an Intel i3-2310E Sandy Bridge CPU. > > The i3-2310E is quite old. Launched 2011 by Intel but still in use as > robot controller. The exact root cause of the problem is unclear and > this situation won't change as Intel support for this CPU has ended > years ago. Our experience is that the number of posted PCIe writes needs > to be limited at least for real-time systems. With posted PCIe writes a > much higher throughput can be generated than with PCIe reads which > cannot be posted. Thus, the load on the interconnect is much higher. > Additionally, a PCIe read waits until all posted PCIe writes are done. > Therefore, the PCIe read can block the CPU for much more than 10us if a > lot of PCIe writes were posted before. Both issues are the reason why we > are limiting the number of posted PCIe writes in row in general for our > real-time systems, not only for this driver. > > A flush after a low enough number of posted PCIe writes eliminates the > delay but also increases the time needed for MTA table update. The > following measurements were done on i3-2310E with e1000e for 128 MTA > table entries: > > Single flush after all writes: 106us > Flush after every write: 429us > Flush after every 2nd write: 266us > Flush after every 4th write: 180us > Flush after every 8th write: 141us > Flush after every 16th write: 121us > > A flush after every 8th write delays the link up by 35us and the > negative impact to DMA transfers of other targets is still tolerable. > > Execute a flush after every 8th write. This prevents overloading the > interconnect with posted writes. > > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> > CC: Vitaly Lifshits <vitaly.lifshits@intel.com> > Link: https://lore.kernel.org/netdev/f8fe665a-5e6c-4f95-b47a-2f3281aa0e6c@lunn.ch/T/ > Signed-off-by: Gerhard Engleder <eg@keba.com> > --- > v3: > - mention problematic platform explicitly (Bjorn Helgaas) > - improve comment (Paul Menzel) > > v2: > - remove PREEMPT_RT dependency (Andrew Lunn, Przemek Kitszel) > --- > drivers/net/ethernet/intel/e1000e/mac.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c > index d7df2a0ed629..0174c16bbb43 100644 > --- a/drivers/net/ethernet/intel/e1000e/mac.c > +++ b/drivers/net/ethernet/intel/e1000e/mac.c > @@ -331,8 +331,15 @@ void e1000e_update_mc_addr_list_generic(struct e1000_hw *hw, > } > > /* replace the entire MTA table */ > - for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) > + for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) { > E1000_WRITE_REG_ARRAY(hw, E1000_MTA, i, hw->mac.mta_shadow[i]); > + > + /* do not queue up too many posted writes to prevent increased > + * latency for other devices on the interconnect > + */ I think a multi-line comment should start with a capital letter and have a '.' at the end of the sentence. + netdev code doesn't have the special rule for multi-line comments, they should look the same way as in the rest of the kernel: /* * Do not queue up ... * latency ... */ > + if ((i % 8) == 0 && i != 0) > + e1e_flush(); IIRC explicit `== 0` / `!= 0` are considered redundant. if (!(i % 8) && i) I'd also mention in the comment above that this means "flush each 8th write" and why exactly 8. > + } > e1e_flush(); > } Thanks, Olek
On 18.12.24 09:36, Przemek Kitszel wrote: > On 12/16/24 20:23, Gerhard Engleder wrote: >>>> @@ -331,8 +331,15 @@ void e1000e_update_mc_addr_list_generic(struct >>>> e1000_hw *hw, >>>> } >>>> /* replace the entire MTA table */ >>>> - for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) >>>> + for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) { >>>> E1000_WRITE_REG_ARRAY(hw, E1000_MTA, i, hw- >>>> >mac.mta_shadow[i]); >>>> + >>>> + /* do not queue up too many posted writes to prevent increased >>>> + * latency for other devices on the interconnect >>>> + */ >>>> + if ((i % 8) == 0 && i != 0) >>>> + e1e_flush(); >>> >>> >>> I would prefer to avoid adding this code to all devices, particularly >>> those that don't operate on real-time systems. Implementing this code >>> will introduce three additional MMIO transactions which will increase >>> the driver start time in various flows (up, probe, etc.). >>> >>> Is there a specific reason not to use if >>> (IS_ENABLED(CONFIG_PREEMPT_RT)) as Andrew initially suggested? >> >> Andrew made two suggestions: IS_ENABLED(CONFIG_PREEMPT_RT) which I used >> in the first version after the RFC. And he suggested to check for a >> compromise between RT and none RT performance, as some distros might >> enable PREEMPT_RT in the future. >> Przemek suggested to remove the PREEMPT_RT check as "this change sounds >> reasonable also for the standard kernel" after the first version with >> IS_ENABLED(CONFIG_PREEMPT_RT). >> >> I used the PREEMPT_RT dependency to limit effects to real-time systems, >> to not make none real-time systems slower. But I could also follow the >> reasoning of Andrew and Przemek. With that said, I have no problem to >> add IS_ENABLED(CONFIG_PREEMPT_RT) again. >> >> Gerhard > > I'm also fine with limiting the change to RT kernels. I will add IS_ENABLED(CONFIG_PREEMPT_RT). Thanks! Gerhard
On 18.12.24 16:08, Avigail Dahan wrote: > > > On 14/12/2024 21:16, Gerhard Engleder wrote: >> From: Gerhard Engleder <eg@keba.com> >> >> Link down and up triggers update of MTA table. This update executes many >> PCIe writes and a final flush. Thus, PCIe will be blocked until all >> writes are flushed. As a result, DMA transfers of other targets suffer >> from delay in the range of 50us. This results in timing violations on >> real-time systems during link down and up of e1000e in combination with >> an Intel i3-2310E Sandy Bridge CPU. >> >> The i3-2310E is quite old. Launched 2011 by Intel but still in use as >> robot controller. The exact root cause of the problem is unclear and >> this situation won't change as Intel support for this CPU has ended >> years ago. Our experience is that the number of posted PCIe writes needs >> to be limited at least for real-time systems. With posted PCIe writes a >> much higher throughput can be generated than with PCIe reads which >> cannot be posted. Thus, the load on the interconnect is much higher. >> Additionally, a PCIe read waits until all posted PCIe writes are done. >> Therefore, the PCIe read can block the CPU for much more than 10us if a >> lot of PCIe writes were posted before. Both issues are the reason why we >> are limiting the number of posted PCIe writes in row in general for our >> real-time systems, not only for this driver. >> >> A flush after a low enough number of posted PCIe writes eliminates the >> delay but also increases the time needed for MTA table update. The >> following measurements were done on i3-2310E with e1000e for 128 MTA >> table entries: >> >> Single flush after all writes: 106us >> Flush after every write: 429us >> Flush after every 2nd write: 266us >> Flush after every 4th write: 180us >> Flush after every 8th write: 141us >> Flush after every 16th write: 121us >> >> A flush after every 8th write delays the link up by 35us and the >> negative impact to DMA transfers of other targets is still tolerable. >> >> Execute a flush after every 8th write. This prevents overloading the >> interconnect with posted writes. >> >> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> >> CC: Vitaly Lifshits <vitaly.lifshits@intel.com> >> Link: >> https://lore.kernel.org/netdev/f8fe665a-5e6c-4f95-b47a-2f3281aa0e6c@lunn.ch/T/ >> Signed-off-by: Gerhard Engleder <eg@keba.com> >> --- >> v3: >> - mention problematic platform explicitly (Bjorn Helgaas) >> - improve comment (Paul Menzel) >> >> v2: >> - remove PREEMPT_RT dependency (Andrew Lunn, Przemek Kitszel) >> --- >> drivers/net/ethernet/intel/e1000e/mac.c | 9 ++++++++- >> 1 file changed, 8 insertions(+), 1 deletion(-) >> > Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Thank you for the test! Gerhard
On 18.12.24 16:23, Alexander Lobakin wrote: > From: Gerhard Engleder <gerhard@engleder-embedded.com> > Date: Sat, 14 Dec 2024 20:16:23 +0100 > >> From: Gerhard Engleder <eg@keba.com> >> >> Link down and up triggers update of MTA table. This update executes many >> PCIe writes and a final flush. Thus, PCIe will be blocked until all >> writes are flushed. As a result, DMA transfers of other targets suffer >> from delay in the range of 50us. This results in timing violations on >> real-time systems during link down and up of e1000e in combination with >> an Intel i3-2310E Sandy Bridge CPU. >> >> The i3-2310E is quite old. Launched 2011 by Intel but still in use as >> robot controller. The exact root cause of the problem is unclear and >> this situation won't change as Intel support for this CPU has ended >> years ago. Our experience is that the number of posted PCIe writes needs >> to be limited at least for real-time systems. With posted PCIe writes a >> much higher throughput can be generated than with PCIe reads which >> cannot be posted. Thus, the load on the interconnect is much higher. >> Additionally, a PCIe read waits until all posted PCIe writes are done. >> Therefore, the PCIe read can block the CPU for much more than 10us if a >> lot of PCIe writes were posted before. Both issues are the reason why we >> are limiting the number of posted PCIe writes in row in general for our >> real-time systems, not only for this driver. >> >> A flush after a low enough number of posted PCIe writes eliminates the >> delay but also increases the time needed for MTA table update. The >> following measurements were done on i3-2310E with e1000e for 128 MTA >> table entries: >> >> Single flush after all writes: 106us >> Flush after every write: 429us >> Flush after every 2nd write: 266us >> Flush after every 4th write: 180us >> Flush after every 8th write: 141us >> Flush after every 16th write: 121us >> >> A flush after every 8th write delays the link up by 35us and the >> negative impact to DMA transfers of other targets is still tolerable. >> >> Execute a flush after every 8th write. This prevents overloading the >> interconnect with posted writes. >> >> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> >> CC: Vitaly Lifshits <vitaly.lifshits@intel.com> >> Link: https://lore.kernel.org/netdev/f8fe665a-5e6c-4f95-b47a-2f3281aa0e6c@lunn.ch/T/ >> Signed-off-by: Gerhard Engleder <eg@keba.com> >> --- >> v3: >> - mention problematic platform explicitly (Bjorn Helgaas) >> - improve comment (Paul Menzel) >> >> v2: >> - remove PREEMPT_RT dependency (Andrew Lunn, Przemek Kitszel) >> --- >> drivers/net/ethernet/intel/e1000e/mac.c | 9 ++++++++- >> 1 file changed, 8 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c >> index d7df2a0ed629..0174c16bbb43 100644 >> --- a/drivers/net/ethernet/intel/e1000e/mac.c >> +++ b/drivers/net/ethernet/intel/e1000e/mac.c >> @@ -331,8 +331,15 @@ void e1000e_update_mc_addr_list_generic(struct e1000_hw *hw, >> } >> >> /* replace the entire MTA table */ >> - for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) >> + for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) { >> E1000_WRITE_REG_ARRAY(hw, E1000_MTA, i, hw->mac.mta_shadow[i]); >> + >> + /* do not queue up too many posted writes to prevent increased >> + * latency for other devices on the interconnect >> + */ > > I think a multi-line comment should start with a capital letter and have > a '.' at the end of the sentence. > > + netdev code doesn't have the special rule for multi-line comments, > they should look the same way as in the rest of the kernel: > > /* > * Do not queue up ... > * latency ... > */ Oh the preferred style changed, I missed that. Will be done. >> + if ((i % 8) == 0 && i != 0) >> + e1e_flush(); > > IIRC explicit `== 0` / `!= 0` are considered redundant. > > if (!(i % 8) && i) You are right, will be changed. > > I'd also mention in the comment above that this means "flush each 8th > write" and why exactly 8. I will add that information to the comment. Thank you for the review! Gerhard
diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c index d7df2a0ed629..0174c16bbb43 100644 --- a/drivers/net/ethernet/intel/e1000e/mac.c +++ b/drivers/net/ethernet/intel/e1000e/mac.c @@ -331,8 +331,15 @@ void e1000e_update_mc_addr_list_generic(struct e1000_hw *hw, } /* replace the entire MTA table */ - for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) + for (i = hw->mac.mta_reg_count - 1; i >= 0; i--) { E1000_WRITE_REG_ARRAY(hw, E1000_MTA, i, hw->mac.mta_shadow[i]); + + /* do not queue up too many posted writes to prevent increased + * latency for other devices on the interconnect + */ + if ((i % 8) == 0 && i != 0) + e1e_flush(); + } e1e_flush(); }