[0/2] Fix TX/RX interrupt handling

Message ID 20230310172050.1394-1-mario.limonciello@amd.com (mailing list archive)

Message

Mario Limonciello March 10, 2023, 5:20 p.m. UTC
A patch series was previously sent to change how the DROM is read,
preferring a direct read from NVM over bit banging.

That series was produced because of failures where the TBT3 DROM CRC
wouldn't match.  A USB4 analyzer showed that the DROM wasn't corrupted
before it arrived at the router.  In analyzing the failure mode, every
single failure occurred during a retried TX because the RX interrupt
"never came".

This was actually a smoking gun: when the hardware responded too quickly,
both TX and RX interrupt status bits were set before the ISR ran.
Because the ISR used auto clear on read while processing the TX, the
RX interrupt bit was lost and the RX interrupt was never handled.

To fix this issue, disable auto clear in the ISR and instead only clear
the interrupt that is actually triggering the ISR.

This fixes communication for long series of transactions such as bit
banging, and probably also fixes other situations where control transfers
were retried a number of times due to a missing RX.
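
For illustration only (this is not code from the series), the race can be
modelled in user space with a minimal sketch.  It assumes a single status
register shared by a TX and an RX ring, with one ISR invocation per ring.
The names `fake_nhi`, `TX_PENDING`, `RX_PENDING` and the bit layout are
hypothetical and do not reflect the real NHI register map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for illustration; not the real NHI registers. */
#define TX_PENDING (1u << 0)
#define RX_PENDING (1u << 1)

/* Simulated interrupt status register shared by the TX and RX rings. */
struct fake_nhi {
	uint32_t status;	/* pending interrupt bits */
	bool auto_clear;	/* true: any read clears every pending bit */
};

/* Read the status register; with auto clear the read wipes all bits. */
static uint32_t status_read(struct fake_nhi *nhi)
{
	uint32_t val = nhi->status;

	if (nhi->auto_clear)
		nhi->status = 0;
	return val;
}

/* Explicitly acknowledge selected bits; only valid without auto clear. */
static void status_ack(struct fake_nhi *nhi, uint32_t bits)
{
	assert(!nhi->auto_clear);
	nhi->status &= ~bits;
}

/*
 * ISR invoked on behalf of one ring ("cause").  Returns true if that
 * ring's bit was still pending, i.e. the interrupt could be handled.
 */
static bool isr(struct fake_nhi *nhi, uint32_t cause)
{
	uint32_t pending = status_read(nhi);

	if (!nhi->auto_clear)
		status_ack(nhi, cause);	/* clear only the triggering bit */
	return (pending & cause) != 0;
}

/*
 * The hardware responds so quickly that TX and RX are both pending
 * before the first ISR runs.  Returns how many of the two interrupts
 * end up being handled.
 */
int handled_after_fast_response(bool auto_clear)
{
	struct fake_nhi nhi = {
		.status = TX_PENDING | RX_PENDING,
		.auto_clear = auto_clear,
	};
	int handled = 0;

	handled += isr(&nhi, TX_PENDING);
	handled += isr(&nhi, RX_PENDING);
	return handled;
}
```

In this model, `handled_after_fast_response(true)` handles only the TX
interrupt because the first read also wipes the RX bit, while
`handled_after_fast_response(false)` handles both.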

Mario Limonciello (2):
  thunderbolt: Use const qualifier for `ring_interrupt_index`
  thunderbolt: Disable interrupt auto clear for rings

 drivers/thunderbolt/nhi.c      | 42 +++++++++++++++++++++-------------
 drivers/thunderbolt/nhi_regs.h |  6 +++--
 2 files changed, 30 insertions(+), 18 deletions(-)

Comments

Mika Westerberg March 14, 2023, 2:28 p.m. UTC | #1
Hi Mario,

On Fri, Mar 10, 2023 at 11:20:48AM -0600, Mario Limonciello wrote:
> Previously a patch series was sent up to change the way that DROM was read
> to prefer directly from NVM instead of bit banging.
> 
> This series was produced due to issues found where TBT3 DROM CRC wouldn't
> match.  In looking at it from USB4 analyzer the DROM wasn't corrupted
> before it arrived at the router.  In analyzing the failure mode, every
> single failure occurred during a retried TX because RX interrupt
> "never came".
> 
> This was actually a smoking gun; when the hardware responded too quickly
> both TX and RX interrupt status bits were set before the ISR would run.
> By the ISR using auto clear on read to process the TX this would make the
> RX interrupt bit get lost and the RX interrupt was never handled.
> 
> To fix this issue, disable auto clear in the ISR and instead only clear
> the interrupt that is actually triggering the ISR.
> 
> This fixes the communication for a long series of transactions such as
> bit banging and probably also fixes other situations that control transfers
> were retried a number of times due to a missing RX.
> 
> Mario Limonciello (2):
>   thunderbolt: Use const qualifier for `ring_interrupt_index`
>   thunderbolt: Disable interrupt auto clear for rings

Applied both to thunderbolt.git/fixes for v6.3-rc and marked them for
stable as well. Thanks! I dropped the other patch that adjusted the NVM
reading as now it is not necessary anymore (please correct me if I'm
mistaken).