Message ID | 20231116151822.281-1-thinhtr@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v3] net/tg3: fix race condition in tg3_reset_task() | expand |
On Thu, Nov 16, 2023 at 7:19 AM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote: > diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c > index 14b311196b8f..1c72ef05ab1b 100644 > --- a/drivers/net/ethernet/broadcom/tg3.c > +++ b/drivers/net/ethernet/broadcom/tg3.c > @@ -7630,6 +7630,26 @@ static void tg3_tx_timeout(struct net_device *dev, unsigned int txqueue) > { > struct tg3 *tp = netdev_priv(dev); > > + /* checking the PCI channel state for hard errors > + * for pci_channel_io_frozen case > + * - I/O to channel is blocked. > + * The EEH layer and I/O error detections will > + * handle the reset procedure > + * for pci_channel_io_perm_failure case > + * - the PCI card is dead. > + * The reset will not help > + * report the error for both cases and return. > + */ > + if (tp->pdev->error_state == pci_channel_io_frozen) { > + netdev_err(dev, " %s, I/O to channel is blocked\n", __func__); > + return; > + } > + > + if (tp->pdev->error_state == pci_channel_io_perm_failure) { > + netdev_err(dev, " %s, adapter has failed permanently!\n", __func__); > + return; > + } > + I think it will be better to add these 2 checks in tg3_reset_task(). We are already doing a similar check there for tp->pcierr_recovery so it is better to centralize all the related checks in the same place. I don't think tg3_dump_state() below will cause any problems. We'll probably read 0xffffffff for all registers and it will actually confirm that the registers are not readable. > if (netif_msg_tx_err(tp)) { > netdev_err(dev, "transmit timed out, resetting\n"); > tg3_dump_state(tp); > -- > 2.25.1 >
On 11/16/2023 3:34 PM, Michael Chan wrote: > > I think it will be better to add these 2 checks in tg3_reset_task(). > We are already doing a similar check there for tp->pcierr_recovery so > it is better to centralize all the related checks in the same place. > I don't think tg3_dump_state() below will cause any problems. We'll > probably read 0xffffffff for all registers and it will actually > confirm that the registers are not readable. > I'm concerned that race conditions could still occur during the handling of Partitionable Endpoint (PE) reset by the EEH driver. The issue lies in the dependency on the lower-level FW reset procedure. When the driver executes tg3_dump_state(), and then follows it with tg3_reset_task(), the EEH driver possibility changes in the PCI devices' state, but the MMIO and DMA reset procedures may not have completed yet. Leading to a crash in tg3_reset_task(). This patch tries to prevent that scenario. While tg3_dump_state() is helpful, it also poses issues as it takes a considerable amount of time, approximately 1 or 2 seconds per device. Given our 4-port adapter, this could extend to more than 10 seconds to write to the dmesg buffer. With the default size, the dmesg buffer may be over-written and not retain all useful information. Thanks, Thinh Tran
On Fri, Nov 17, 2023 at 8:19 AM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote: > > > On 11/16/2023 3:34 PM, Michael Chan wrote: > > > > I think it will be better to add these 2 checks in tg3_reset_task(). > > We are already doing a similar check there for tp->pcierr_recovery so > > it is better to centralize all the related checks in the same place. > > I don't think tg3_dump_state() below will cause any problems. We'll > > probably read 0xffffffff for all registers and it will actually > > confirm that the registers are not readable. > > > > I'm concerned that race conditions could still occur during the handling > of Partitionable Endpoint (PE) reset by the EEH driver. The issue lies > in the dependency on the lower-level FW reset procedure. When the driver > executes tg3_dump_state(), and then follows it with tg3_reset_task(), > the EEH driver possibility changes in the PCI devices' state, but the > MMIO and DMA reset procedures may not have completed yet. Leading to a > crash in tg3_reset_task(). This patch tries to prevent that scenario. It seems fragile if you are relying on such timing. TG3_TX_TIMEOUT is 5 seconds but the actual time tg3_tx_timeout() is called varies depending on when the TX queue is stopped. So tg3_tx_timeout() will be called 5 seconds or more after EEH if there are uncompleted TX packets but we don't know precisely when. > > While tg3_dump_state() is helpful, it also poses issues as it takes a > considerable amount of time, approximately 1 or 2 seconds per device. > Given our 4-port adapter, this could extend to more than 10 seconds to > write to the dmesg buffer. With the default size, the dmesg buffer may > be over-written and not retain all useful information. > If tg3_dump_state() is not useful and fills the dmesg log with useless data, we can add the same check in tg3_dump_state() and skip it. tg3_dump_state() is also called by tg3_process_error() so we can avoid dumping useless data if we hit tg3_process_error() during EEH or AER. Thanks.
On 11/17/2023 12:31 PM, Michael Chan wrote: > On Fri, Nov 17, 2023 at 8:19 AM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote: >> >> >> On 11/16/2023 3:34 PM, Michael Chan wrote: >>> >>> I think it will be better to add these 2 checks in tg3_reset_task(). >>> We are already doing a similar check there for tp->pcierr_recovery so >>> it is better to centralize all the related checks in the same place. >>> I don't think tg3_dump_state() below will cause any problems. We'll >>> probably read 0xffffffff for all registers and it will actually >>> confirm that the registers are not readable. >>> >> >> I'm concerned that race conditions could still occur during the handling >> of Partitionable Endpoint (PE) reset by the EEH driver. The issue lies >> in the dependency on the lower-level FW reset procedure. When the driver >> executes tg3_dump_state(), and then follows it with tg3_reset_task(), >> the EEH driver possibility changes in the PCI devices' state, but the >> MMIO and DMA reset procedures may not have completed yet. Leading to a >> crash in tg3_reset_task(). This patch tries to prevent that scenario. > > It seems fragile if you are relying on such timing. TG3_TX_TIMEOUT is > 5 seconds but the actual time tg3_tx_timeout() is called varies > depending on when the TX queue is stopped. So tg3_tx_timeout() will > be called 5 seconds or more after EEH if there are uncompleted TX > packets but we don't know precisely when. > >> >> While tg3_dump_state() is helpful, it also poses issues as it takes a >> considerable amount of time, approximately 1 or 2 seconds per device. >> Given our 4-port adapter, this could extend to more than 10 seconds to >> write to the dmesg buffer. With the default size, the dmesg buffer may >> be over-written and not retain all useful information. >> > > If tg3_dump_state() is not useful and fills the dmesg log with useless > data, we can add the same check in tg3_dump_state() and skip it. > tg3_dump_state() is also called by tg3_process_error() so we can avoid > dumping useless data if we hit tg3_process_error() during EEH or AER. > > Thanks. I implemented the fix as you suggested and passed the tests. I will send the next version of patch shortly. Thanks.
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c index 14b311196b8f..1c72ef05ab1b 100644 --- a/drivers/net/ethernet/broadcom/tg3.c +++ b/drivers/net/ethernet/broadcom/tg3.c @@ -7630,6 +7630,26 @@ static void tg3_tx_timeout(struct net_device *dev, unsigned int txqueue) { struct tg3 *tp = netdev_priv(dev); + /* checking the PCI channel state for hard errors + * for pci_channel_io_frozen case + * - I/O to channel is blocked. + * The EEH layer and I/O error detections will + * handle the reset procedure + * for pci_channel_io_perm_failure case + * - the PCI card is dead. + * The reset will not help + * report the error for both cases and return. + */ + if (tp->pdev->error_state == pci_channel_io_frozen) { + netdev_err(dev, " %s, I/O to channel is blocked\n", __func__); + return; + } + + if (tp->pdev->error_state == pci_channel_io_perm_failure) { + netdev_err(dev, " %s, adapter has failed permanently!\n", __func__); + return; + } + if (netif_msg_tx_err(tp)) { netdev_err(dev, "transmit timed out, resetting\n"); tg3_dump_state(tp);