
[v2] net/tg3: fix race condition in tg3_reset_task()

Message ID 20231102161219.220-1-thinhtr@linux.vnet.ibm.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [v2] net/tg3: fix race condition in tg3_reset_task()

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1357 this patch: 1357
netdev/cc_maintainers warning 4 maintainers not CCed: edumazet@google.com kuba@kernel.org pabeni@redhat.com davem@davemloft.net
netdev/build_clang success Errors and warnings before: 1378 this patch: 1378
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1389 this patch: 1389
netdev/checkpatch warning WARNING: line length of 84 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Thinh Tran Nov. 2, 2023, 4:12 p.m. UTC
When an EEH error is encountered by a PCI adapter, the EEH driver
modifies the PCI channel's state as shown below:

   enum {
      /* I/O channel is in normal state */
      pci_channel_io_normal = (__force pci_channel_state_t) 1,

      /* I/O to channel is blocked */
      pci_channel_io_frozen = (__force pci_channel_state_t) 2,

      /* PCI card is dead */
      pci_channel_io_perm_failure = (__force pci_channel_state_t) 3,
   };
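
For reference, drivers read this state from pdev->error_state, and
<linux/pci.h> also provides the pci_channel_offline() helper, which covers
both error states in a single test.  A minimal sketch (illustrative only;
the helper name below is hypothetical and not part of this patch):

   #include <linux/pci.h>

   /* Illustrative only: true when the EEH/AER core has marked the channel
    * frozen or permanently failed, i.e.
    * pdev->error_state != pci_channel_io_normal.
    */
   static bool my_pci_channel_failed(struct pci_dev *pdev)
   {
           return pci_channel_offline(pdev);
   }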

If the same EEH error then causes the tg3 driver's transmit timeout
logic to execute, the tg3_tx_timeout() function schedules a reset
task via tg3_reset_task_schedule(), which may cause a race condition
between the tg3 and EEH driver as both attempt to recover the HW via
a reset action.

EEH driver gets error event
--> eeh_set_channel_state()
    and sets the device to one
    of the error states above	scheduler: tg3_reset_task() gets an
				error returned from tg3_init_hw()
			     --> dev_close() shuts down the interface

tg3_io_slot_reset() and 
tg3_io_resume() fail to
reset/resume the device


To resolve this issue, we avoid the race condition by checking the PCI
channel state in the tg3_tx_timeout() function and skipping the tg3
driver-initiated reset when the PCI channel is not in the normal state.
(The driver has no access to tg3 device registers at this point and
cannot complete the reset task successfully without external
assistance.)  We leave the reset procedure to be managed by the EEH
driver, which calls the tg3_io_error_detected(), tg3_io_slot_reset()
and tg3_io_resume() functions as appropriate.
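
For context, the EEH core reaches these callbacks through the
struct pci_error_handlers that the driver registers in its pci_driver's
.err_handler field; in tg3 the wiring looks roughly like the sketch below
(paraphrased, not a verbatim quote of tg3.c):

   static const struct pci_error_handlers tg3_err_handler = {
           .error_detected = tg3_io_error_detected, /* channel frozen/failed reported here */
           .slot_reset     = tg3_io_slot_reset,     /* re-initialize after the slot reset */
           .resume         = tg3_io_resume,         /* restart traffic once recovery completes */
   };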



Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Tested-by: Venkata Sai Duggi <venkata.sai.duggi@ibm.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>

---
 drivers/net/ethernet/broadcom/tg3.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

Comments

Michael Chan Nov. 2, 2023, 5:27 p.m. UTC | #1
On Thu, Nov 2, 2023 at 9:16 AM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote:
>
> When an EEH error is encountered by a PCI adapter, the EEH driver
> modifies the PCI channel's state as shown below:
>
>    enum {
>       /* I/O channel is in normal state */
>       pci_channel_io_normal = (__force pci_channel_state_t) 1,
>
>       /* I/O to channel is blocked */
>       pci_channel_io_frozen = (__force pci_channel_state_t) 2,
>
>       /* PCI card is dead */
>       pci_channel_io_perm_failure = (__force pci_channel_state_t) 3,
>    };
>
> If the same EEH error then causes the tg3 driver's transmit timeout
> logic to execute, the tg3_tx_timeout() function schedules a reset
> task via tg3_reset_task_schedule(), which may cause a race condition
> between the tg3 and EEH driver as both attempt to recover the HW via
> a reset action.
>
> EEH driver gets error event
> --> eeh_set_channel_state()
>     and set device to one of
>     error state above           scheduler: tg3_reset_task() get
>                                 returned error from tg3_init_hw()
>                              --> dev_close() shuts down the interface
>
> tg3_io_slot_reset() and
> tg3_io_resume() fail to
> reset/resume the device
>
>
> To resolve this issue, we avoid the race condition by checking the PCI
> channel state in the tg3_tx_timeout() function and skip the tg3 driver
> initiated reset when the PCI channel is not in the normal state.  (The
> driver has no access to tg3 device registers at this point and cannot
> even complete the reset task successfully without external assistance.)
> We'll leave the reset procedure to be managed by the EEH driver which
> calls the tg3_io_error_detected(), tg3_io_slot_reset() and
> tg3_io_resume() functions as appropriate.

This scenario can affect other drivers too, right?  Shouldn't this be
handled in a higher layer before calling ->ndo_tx_timeout() so we
don't have to add this logic to all the other drivers?  Thanks.
Thinh Tran Nov. 2, 2023, 8:37 p.m. UTC | #2
Thanks for the review.

On 11/2/2023 12:27 PM, Michael Chan wrote:
> 
> This scenario can affect other drivers too, right?  Shouldn't this be
> handled in a higher layer before calling ->ndo_tx_timeout() so we
> don't have to add this logic to all the other drivers?  Thanks.

Yes, it can. We could add this to the dev_watchdog() function, but
further investigation is required, because each driver may have a
different approach to handling its own ->ndo_tx_timeout() function.

Thinh Tran
Thinh Tran Nov. 14, 2023, 5:39 p.m. UTC | #3
Hi Michael,

On 11/2/2023 3:37 PM, Thinh Tran wrote:
> Thanks for the review.
> 
> On 11/2/2023 12:27 PM, Michael Chan wrote:
>>
>> This scenario can affect other drivers too, right?  Shouldn't this be
>> handled in a higher layer before calling ->ndo_tx_timeout() so we
>> don't have to add this logic to all the other drivers?  Thanks.
> 
> Yes, it does. We can add this into the dev_watchdog() function, but 
> further investigations are required. This is because each driver may 
> have a different approach to handling its own ->ndo_tx_timeout() function.
> 
> Thinh Tran

I attempted to relocate the fix into dev_watchdog(), but we experienced
crashes in various drivers, destabilizing them.
I propose accepting the current patch rather than introducing a new fix
that may create issues for other drivers.

Thanks,
Thinh Tran
Michael Chan Nov. 14, 2023, 9:03 p.m. UTC | #4
On Tue, Nov 14, 2023 at 9:39 AM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote:
> I attempted to relocate the fix into the dev_watchdog(), but we
> experienced crashes in various drivers, leading to the destabilization
> of other drivers.

Could you provide more information about the crashes?  The
dev_watchdog() code already checks for netif_device_present() and
netif_running() and netif_carrier_ok() before proceeding to check for
TX timeout.  Why would adding some additional checks for PCI errors
cause problems?  Of course the additional checks should be done on PCI
devices only.  Thanks.
Thinh Tran Nov. 15, 2023, 6:23 p.m. UTC | #5
On 11/14/2023 3:03 PM, Michael Chan wrote:
> 
> Could you provide more information about the crashes?  The
> dev_watchdog() code already checks for netif_device_present() and
> netif_running() and netif_carrier_ok() before proceeding to check for
> TX timeout.  Why would adding some additional checks for PCI errors
> cause problems?  Of course the additional checks should only be done
> on PCI devices only.  Thanks.

Checking for PCI errors is not the problem; skipping the call to the
driver's ->ndo_tx_timeout() function is what causes the issue.
Here is the fix in dev_watchdog():

--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -24,6 +24,7 @@
 #include <linux/if_vlan.h>
 #include <linux/skb_array.h>
 #include <linux/if_macvlan.h>
+#include <linux/pci.h>
 #include <net/sch_generic.h>
 #include <net/pkt_sched.h>
 #include <net/dst.h>
@@ -521,12 +522,32 @@ static void dev_watchdog(struct timer_list *t)
 		}

 		if (unlikely(some_queue_timedout)) {
+			struct pci_dev *pci_dev;
+
 			trace_net_dev_xmit_timeout(dev, i);
 			WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit queue %u timed out\n",
 				  dev->name, netdev_drivername(dev), i);
-			netif_freeze_queues(dev);
-			dev->netdev_ops->ndo_tx_timeout(dev, i);
-			netif_unfreeze_queues(dev);
+			pci_dev = to_pci_dev(dev->dev.parent);
+			if (pci_dev && (pci_dev->error_state != pci_channel_io_normal)) {
+				/* Checking the PCI channel state for hard errors:
+				 * for the pci_channel_io_frozen case
+				 *  - I/O to the channel is blocked;
+				 *    the EEH layer and I/O error detection
+				 *    will handle the reset procedure.
+				 * for the pci_channel_io_perm_failure case
+				 *  - the PCI card is dead; a reset will not help.
+				 * Report the error in either case, but do not call
+				 * the driver's ndo_tx_timeout() function.
+				 */
+				if (pci_dev->error_state == pci_channel_io_frozen)
+					netdev_err(dev, " %s, I/O to channel is blocked\n", dev->name);
+				else
+					netdev_err(dev, " %s, adapter has failed permanently!\n", dev->name);
+			} else {
+				netif_freeze_queues(dev);
+				dev->netdev_ops->ndo_tx_timeout(dev, i);
+				netif_unfreeze_queues(dev);
+			}
 		}
 		if (!mod_timer(&dev->watchdog_timer,
 			       round_jiffies(jiffies +
--
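
One note on the hunk above: to_pci_dev() is only meaningful when
dev->dev.parent really is a PCI device, so a generic version of this check
would want a dev_is_pci() guard.  A minimal sketch of such a helper (the
function name is hypothetical; it assumes the dev_is_pci() and
pci_channel_offline() helpers from <linux/pci.h>):

   #include <linux/netdevice.h>
   #include <linux/pci.h>

   /* Hypothetical helper, not in the posted patch: true when the netdev's
    * parent is a PCI device whose channel is in an error state, in which
    * case dev_watchdog() could leave recovery to the platform's PCI error
    * handling (EEH/AER) instead of calling ->ndo_tx_timeout().
    */
   static bool netdev_pci_channel_in_error(struct net_device *dev)
   {
           struct device *parent = dev->dev.parent;

           if (!parent || !dev_is_pci(parent))
                   return false;   /* not a PCI device; nothing to check */

           return pci_channel_offline(to_pci_dev(parent));
   }

As the bnx2x crash below shows, though, simply skipping ->ndo_tx_timeout()
when such a check is true is not sufficient on its own.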

From the crash dump on a system with a 4-port BCM57800 adapter:
crash> net
    NET_DEVICE     NAME       IP ADDRESS(ES)
c000000003581000  lo         127.0.0.1, ::1
c000000008f8b000  net0       9.3.233.69
c00000000315c000  enP23p1s0f0 101.1.233.69
c000000003164000  enP23p1s0f1 102.1.233.69
c00000000316c000  enP23p1s0f2 103.1.233.69
c000000003174000  enP23p1s0f3 104.1.233.69
crash> dmesg
[  752.115994] ------------[ cut here ]------------
[  752.115996] NETDEV WATCHDOG: enP23p1s0f0 (bnx2x): transmit queue 2 
timed out
[  752.116018] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:528 
dev_watchdog+0x3c8/0x420
[  752.116037] Modules linked in: rpadlpar_io rpaphp xsk_diag 
nft_counter nft_compat nf_tables nfnetlink rfkill bonding tls sunrpc 
binfmt_misc pseries_rng drm drm_panel_orientation_quirks xfs sd_mod 
t10_pi sg ibmvscsi scsi_transport_srp ibmveth bnx2x vmx_crypto mdio 
pseries_wdt libcrc32c dm_mirror dm_region_hash dm_log dm_mod fuse

------ snip the watchdog's dump ---------
------ in dev_watchdog() checking the PCI error and skipping 
->ndo_tx_timeout()  --------

[  752.116430] ---[ end trace 868b17f3f105be7b ]---
[  752.116437] bnx2x 0017:01:00.0 enP23p1s0f0:  enP23p1s0f0, I/O to 
channel is blocked
[  752.195975] bnx2x: [bnx2x_timer:5811(enP23p1s0f3)]MFW seems hanged: 
drv_pulse (0x2ca) != mcp_pulse (0x7fff)
[  752.195986] bnx2x: 
[bnx2x_acquire_hw_lock:2023(enP23p1s0f3)]lock_status 0xffffffff 
resource_bit 0x1
[  752.196507] bnx2x 0017:01:00.3 enP23p1s0f3: MDC/MDIO access timeout
[  752.196792] EEH: Recovering PHB#17-PE#10000
[  752.196795] EEH: PE location: N/A, PHB location: N/A
[  752.196797] EEH: Frozen PHB#17-PE#10000 detected
[  752.196798] EEH: Call Trace:
[  752.196799] EEH: [c00000000005102c] __eeh_send_failure_event+0x7c/0x160
[  752.196808] EEH: [c000000000049cf4] 
eeh_dev_check_failure.part.0+0x254/0x650
[  752.196812] EEH: [c00800000107c7a0] bnx2x_timer+0x1e8/0x250 [bnx2x]
[  752.196863] EEH: [c00000000023eb00] call_timer_fn+0x50/0x1c0

--------- snip here -------------

[  752.196913] EEH: This PCI device has failed 1 times in the last hour 
and will be permanently disabled after 5 failures.
[  752.196915] EEH: Notify device drivers to shutdown
[  752.196918] EEH: Beginning: 'error_detected(IO frozen)'
[  752.196920] PCI 0017:01:00.0#10000: EEH: Invoking 
bnx2x->error_detected(IO frozen)
[  752.196924] bnx2x: [bnx2x_io_error_detected:14190(enP23p1s0f0)]IO 
error detected
[  752.197024] bnx2x 0017:01:00.3 enP23p1s0f3: MDC/MDIO access timeout
[  752.197039] PCI 0017:01:00.0#10000: EEH: bnx2x driver reports: 'need 
reset'
[  752.197041] PCI 0017:01:00.1#10000: EEH: Invoking 
bnx2x->error_detected(IO frozen)
[  752.197042] bnx2x: [bnx2x_io_error_detected:14190(enP23p1s0f1)]IO 
error detected
[  752.197093] PCI 0017:01:00.1#10000: EEH: bnx2x driver reports: 'need 
reset'
[  752.197095] PCI 0017:01:00.2#10000: EEH: Invoking 
bnx2x->error_detected(IO frozen)
[  752.197096] bnx2x: [bnx2x_io_error_detected:14190(enP23p1s0f2)]IO 
error detected
[  752.197151] PCI 0017:01:00.2#10000: EEH: bnx2x driver reports: 'need 
reset'
[  752.197153] PCI 0017:01:00.3#10000: EEH: Invoking 
bnx2x->error_detected(IO frozen)
[  752.197154] bnx2x: [bnx2x_io_error_detected:14190(enP23p1s0f3)]IO 
error detected
[  752.197208] PCI 0017:01:00.3#10000: EEH: bnx2x driver reports: 'need 
reset'
[  752.197210] EEH: Finished:'error_detected(IO frozen)' with aggregate 
recovery state:'need reset'

-------------- snip here --------

[  754.407972] EEH: Beginning: 'slot_reset'
[  754.407978] PCI 0017:01:00.0#10000: EEH: Invoking bnx2x->slot_reset()
[  754.407981] bnx2x: [bnx2x_io_slot_reset:14225(enP23p1s0f0)]IO slot 
reset initializing...
[  754.408047] bnx2x 0017:01:00.0: enabling device (0140 -> 0142)
[  754.412432] bnx2x: [bnx2x_io_slot_reset:14241(enP23p1s0f0)]IO slot 
reset --> driver unload

--------- snip here ------------

[  764.526802] PCI 0017:01:00.0#10000: EEH: bnx2x driver reports: 
'recovered'
[  764.526806] PCI 0017:01:00.1#10000: EEH: Invoking bnx2x->slot_reset()
[  764.526808] bnx2x: [bnx2x_io_slot_reset:14225(enP23p1s0f1)]IO slot 
reset initializing...
[  764.526898] bnx2x 0017:01:00.1: enabling device (0140 -> 0142)
[  764.531117] bnx2x: [bnx2x_io_slot_reset:14241(enP23p1s0f1)]IO slot 
reset --> driver unload

---------- snip here --------

[  772.770957] bnx2x: [bnx2x_io_slot_reset:14241(enP23p1s0f3)]IO slot 
reset --> driver unload
[  772.886717] PCI 0017:01:00.3#10000: EEH: bnx2x driver reports: 
'recovered'
[  772.886720] EEH: Finished:'slot_reset' with aggregate recovery 
state:'recovered'
[  772.886721] EEH: Notify device driver to resume
[  772.886722] EEH: Beginning: 'resume'
[  772.886723] PCI 0017:01:00.0#10000: EEH: Invoking bnx2x->resume()
[  773.476919] bnx2x 0017:01:00.0 enP23p1s0f0: using MSI-X  IRQs: sp 55 
fp[0] 57 ... fp[7] 64
[  773.706115] bnx2x 0017:01:00.0 enP23p1s0f0: NIC Link is Up, 10000 
Mbps full duplex, Flow control: ON - receive & transmit
[  773.708230] PCI 0017:01:00.0#10000: EEH: bnx2x driver reports: 'none'
[  773.708234] PCI 0017:01:00.1#10000: EEH: Invoking bnx2x->resume()
[  774.307404] bnx2x 0017:01:00.1 enP23p1s0f1: using MSI-X  IRQs: sp 65 
fp[0] 67 ... fp[7] 74
[  774.546123] bnx2x 0017:01:00.1 enP23p1s0f1: NIC Link is Up, 10000 
Mbps full duplex, Flow control: ON - receive & transmit
[  774.548304] PCI 0017:01:00.1#10000: EEH: bnx2x driver reports: 'none'
[  774.548311] PCI 0017:01:00.2#10000: EEH: Invoking bnx2x->resume()
[  774.747483] bnx2x 0017:01:00.2 enP23p1s0f2: using MSI-X  IRQs: sp 75 
fp[0] 77 ... fp[7] 84
[  774.756111] bnx2x: [bnx2x_hw_stats_update:871(enP23p1s0f0)]NIG timer 
max (0)
[  775.038454] PCI 0017:01:00.2#10000: EEH: bnx2x driver reports: 'none'
[  775.038466] PCI 0017:01:00.3#10000: EEH: Invoking bnx2x->resume()
[  775.228049] bnx2x 0017:01:00.3 enP23p1s0f3: using MSI-X  IRQs: sp 85 
fp[0] 87 ... fp[7] 94
[  775.548237] PCI 0017:01:00.3#10000: EEH: bnx2x driver reports: 'none'
[  775.548245] EEH: Finished:'resume'
[  775.548247] EEH: Recovery successful.
[  775.556120] bnx2x: [bnx2x_hw_stats_update:871(enP23p1s0f1)]NIG timer 
max (0)
[ 1203.919654] bnx2x 0017:01:00.0 enP23p1s0f0: using MSI-X  IRQs: sp 55 
fp[0] 57 ... fp[7] 64
[ 1204.156946] bnx2x 0017:01:00.0 enP23p1s0f0: NIC Link is Up, 10000 
Mbps full duplex, Flow control: ON - receive & transmit
[ 1204.159011] IPv6: ADDRCONF(NETDEV_CHANGE): enP23p1s0f0: link becomes 
ready
[ 1204.386939] bnx2x 0017:01:00.0 enP23p1s0f0: NIC Link is Down
[ 1209.789617] bnx2x 0017:01:00.1 enP23p1s0f1: using MSI-X  IRQs: sp 65 
fp[0] 67 ... fp[7] 74
[ 1210.026894] bnx2x 0017:01:00.1 enP23p1s0f1: NIC Link is Up, 10000 
Mbps full duplex, Flow control: ON - receive & transmit
[ 1210.028955] IPv6: ADDRCONF(NETDEV_CHANGE): enP23p1s0f1: link becomes 
ready
[ 1210.357268] bnx2x 0017:01:00.0 enP23p1s0f0: NIC Link is Up, 10000 
Mbps full duplex, Flow control: ON - receive & transmit
[ 1214.526868] bnx2x 0017:01:00.1 enP23p1s0f1: NIC Link is Down
[ 1215.647561] bnx2x 0017:01:00.2 enP23p1s0f2: using MSI-X  IRQs: sp 75 
fp[0] 77 ... fp[7] 84
[ 1220.357087] bnx2x 0017:01:00.1 enP23p1s0f1: NIC Link is Up, 10000 
Mbps full duplex, Flow control: ON - receive & transmit
[ 1221.517564] bnx2x 0017:01:00.3 enP23p1s0f3: using MSI-X  IRQs: sp 85 
fp[0] 87 ... fp[7] 94
[ 1222.012323] systemd-rc-local-generator[16948]: /etc/rc.d/rc.local is 
not marked executable, skipping.
[ 1232.476909] bnx2x 0017:01:00.2 enP23p1s0f2: NIC Link is Up, 1000 Mbps 
full duplex, Flow control: ON - receive & transmit
[ 1232.476941] IPv6: ADDRCONF(NETDEV_CHANGE): enP23p1s0f2: link becomes 
ready
[ 1237.996937] bnx2x 0017:01:00.3 enP23p1s0f3: NIC Link is Up, 1000 Mbps 
full duplex, Flow control: ON - receive & transmit
[ 1237.996961] IPv6: ADDRCONF(NETDEV_CHANGE): enP23p1s0f3: link becomes 
ready


---------- snip here ---------
[ 1592.978832] Kernel attempted to write user page (e) - exploit 
attempt? (uid: 0)
[ 1592.978836] BUG: Kernel NULL pointer dereference on write at 0x0000000e
[ 1592.978838] Faulting instruction address: 0xc0080000010bb1e8
[ 1592.978841] Oops: Kernel access of bad area, sig: 11 [#1]
----------
crash> bt
PID: 41       TASK: c000000003d29b00  CPU: 5    COMMAND: "ksoftirqd/5"
  R0:  c0080000010bb1a0    R1:  c000000003b7b910    R2:  c008000001178000
  R3:  08000001173928be    R4:  c00c00000045ce40    R5:  00000000000028be
  R6:  0000000000000001    R7:  ffffffffffffffff    R8:  0000000000000000
  R9:  0000000000000010    R10: 0000000000000000    R11: c0080000010fee78
  R12: c000000000231cf0    R13: c000000fffff9080    R14: 0000000000000000
  R15: 0000000000000000    R16: 0000000000000001    R17: 0000000000000000
  R18: 08000001173928be    R19: 0000000000000000    R20: 0000000000000000
  R21: c0000001173928be    R22: c000000003164a00    R23: c000000012570200
  R24: 0000000000000000    R25: 0000000000000001    R26: c00000001c51e050
  R27: 0000000000000005    R28: c000000003164000    R29: 0000000000000000
  R30: c0000000d38a8ae0    R31: 0000000000000000
  NIP: c0080000010bb1e8    MSR: 800000000280b033    OR3: c000000000230128
  CTR: c000000000231cf0    LR:  c0080000010bb1a0    XER: 0000000020040000
  CCR: 0000000048008482    MQ:  0000000000000000    DAR: 000000000000000e
  DSISR: 0000000042000000     Syscall Result: 0000000000000000
  [NIP  : bnx2x_start_xmit+496]
  [LR   : bnx2x_start_xmit+424]
  #0 [c000000003b7b4e0] crash_kexec at c000000000279f8c
  #1 [c000000003b7b510] oops_end at c0000000000291a8
  #2 [c000000003b7b590] __bad_page_fault at c00000000008d1cc
  #3 [c000000003b7b600] data_access_common_virt at c0000000000088dc
  Data Access [300] exception frame:
  R0:  c0080000010bb1a0    R1:  c000000003b7b910    R2:  c008000001178000
  R3:  08000001173928be    R4:  c00c00000045ce40    R5:  00000000000028be
  R6:  0000000000000001    R7:  ffffffffffffffff    R8:  0000000000000000
  R9:  0000000000000010    R10: 0000000000000000    R11: c0080000010fee78
  R12: c000000000231cf0    R13: c000000fffff9080    R14: 0000000000000000
  R15: 0000000000000000    R16: 0000000000000001    R17: 0000000000000000
  R18: 08000001173928be    R19: 0000000000000000    R20: 0000000000000000
  R21: c0000001173928be    R22: c000000003164a00    R23: c000000012570200
  R24: 0000000000000000    R25: 0000000000000001    R26: c00000001c51e050
  R27: 0000000000000005    R28: c000000003164000    R29: 0000000000000000
  R30: c0000000d38a8ae0    R31: 0000000000000000
  NIP: c0080000010bb1e8    MSR: 800000000280b033    OR3: c000000000230128
  CTR: c000000000231cf0    LR:  c0080000010bb1a0    XER: 0000000020040000
  CCR: 0000000048008482    MQ:  0000000000000000    DAR: 000000000000000e

crash> dis -s bnx2x_start_xmit+496
FILE: drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
LINE: 3858

   3853          /* get a tx_buf and first BD
   3854           * tx_start_bd may be changed during SPLIT,
   3855           * but first_bd will always stay first
   3856           */
   3857          tx_buf = &txdata->tx_buf_ring[TX_BD(pkt_prod)];
* 3858          tx_start_bd = &txdata->tx_desc_ring[bd_prod].start_bd;
   3859          first_bd = tx_start_bd;
   3860

I have not identified the root cause of this crash yet.

Regards,
Thinh
Michael Chan Nov. 15, 2023, 6:56 p.m. UTC | #6
On Wed, Nov 15, 2023 at 10:23 AM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote:
>
>
> On 11/14/2023 3:03 PM, Michael Chan wrote:
> >
> > Could you provide more information about the crashes?  The
> > dev_watchdog() code already checks for netif_device_present() and
> > netif_running() and netif_carrier_ok() before proceeding to check for
> > TX timeout.  Why would adding some additional checks for PCI errors
> > cause problems?  Of course the additional checks should only be done
> > on PCI devices only.  Thanks.
>
> The checking for PCI errors is not the problem, avoiding calling drivers
> ->ndo_tx_timeout() function, causing some issue.

I see.  By skipping TX timeout during PCI errors, bnx2x crashes in
.ndo_start_xmit() after EEH error recovery.

I think it should be fine to fix the original EEH issue in tg3 then.
Please re-post the tg3 patch.  Thanks.
Thinh Tran Nov. 16, 2023, 2:41 p.m. UTC | #7
I'll re-post the V2 patch shortly.
Thanks for the review.

Thinh Tran
On 11/15/2023 12:56 PM, Michael Chan wrote:
> On Wed, Nov 15, 2023 at 10:23 AM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote:
>>
>>
>> On 11/14/2023 3:03 PM, Michael Chan wrote:
>>>
>>> Could you provide more information about the crashes?  The
>>> dev_watchdog() code already checks for netif_device_present() and
>>> netif_running() and netif_carrier_ok() before proceeding to check for
>>> TX timeout.  Why would adding some additional checks for PCI errors
>>> cause problems?  Of course the additional checks should only be done
>>> on PCI devices only.  Thanks.
>>
>> The checking for PCI errors is not the problem, avoiding calling drivers
>> ->ndo_tx_timeout() function, causing some issue.
> 
> I see.  By skipping TX timeout during PCI errors, bnx2x crashes in
> .ndo_start_xmit() after EEH error recovery.
> 
> I think it should be fine to fix the original EEH issue in tg3 then.
> Please re-post the tg3 patch.  Thanks.

Patch

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 14b311196b8f..1c72ef05ab1b 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -7630,6 +7630,26 @@  static void tg3_tx_timeout(struct net_device *dev, unsigned int txqueue)
 {
 	struct tg3 *tp = netdev_priv(dev);
 
+	/* Check the PCI channel state for hard errors:
+	 * for the pci_channel_io_frozen case
+	 *   - I/O to the channel is blocked;
+	 *     the EEH layer and I/O error detection
+	 *     will handle the reset procedure.
+	 * for the pci_channel_io_perm_failure case
+	 *   - the PCI card is dead;
+	 *     a reset will not help.
+	 * Report the error in both cases and return.
+	 */
+	if (tp->pdev->error_state == pci_channel_io_frozen) {
+		netdev_err(dev, " %s, I/O to channel is blocked\n", __func__);
+		return;
+	}
+
+	if (tp->pdev->error_state == pci_channel_io_perm_failure) {
+		netdev_err(dev, " %s, adapter has failed permanently!\n", __func__);
+		return;
+	}
+
 	if (netif_msg_tx_err(tp)) {
 		netdev_err(dev, "transmit timed out, resetting\n");
 		tg3_dump_state(tp);