Message ID | 20220908202853.21725-1-davthompson@nvidia.com (mailing list archive)
---|---
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net,v1] mlxbf_gige: fix receive packet race condition
On Thu, 8 Sep 2022 16:28:53 -0400 David Thompson wrote:
> Under heavy traffic, the BF2 Gigabit interface can
> become unresponsive for periods of time (several minutes)
> before eventually recovering. This is due to a possible
> race condition in the mlxbf_gige_rx_packet function, where
> the function exits with producer and consumer indices equal
> but there are remaining packet(s) to be processed. In order
> to prevent this situation, disable receive DMA during the
> processing of received packets.

Pausing Rx DMA seems a little drastic; is the capacity of the NIC buffer
large enough to sink the traffic while the stack drains the ring?

Could you provide a little more detail on what the HW issue is?
Is there no less intrusive way we can fix it?
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Monday, September 19, 2022 5:18 PM
> To: David Thompson <davthompson@nvidia.com>
> Cc: davem@davemloft.net; edumazet@google.com; pabeni@redhat.com;
> netdev@vger.kernel.org; cai.huoqing@linux.dev; brgl@bgdev.pl; Liming Sun
> <limings@nvidia.com>; Asmaa Mnebhi <asmaa@nvidia.com>
> Subject: Re: [PATCH net v1] mlxbf_gige: fix receive packet race condition
>
> On Thu, 8 Sep 2022 16:28:53 -0400 David Thompson wrote:
> > Under heavy traffic, the BF2 Gigabit interface can become unresponsive
> > for periods of time (several minutes) before eventually recovering.
> > This is due to a possible race condition in the mlxbf_gige_rx_packet
> > function, where the function exits with producer and consumer indices
> > equal but there are remaining packet(s) to be processed. In order to
> > prevent this situation, disable receive DMA during the processing of
> > received packets.
>
> Pausing Rx DMA seems a little drastic; is the capacity of the NIC buffer
> large enough to sink the traffic while the stack drains the ring?
>
> Could you provide a little more detail on what the HW issue is?
> Is there no less intrusive way we can fix it?

Thank you for your insight, Jakub. I will review this patch and see if it
can be solved without pausing the DMA process.

FYI, a little background on the DMA operation in hardware: pausing RX DMA
prevents writing new packets to memory. New packets will be written to a
20KB buffer (but they won't get forwarded to memory and there is no
consumer index update). Once this buffer is full, packets will get dropped.

Thanks,
Dave
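[Editorial note: for context on the buffer-capacity question, a rough
back-of-envelope sketch. The 20KB figure comes from Dave's note above; the
1 Gb/s line rate and minimum-frame worst case are assumptions, not from the
thread or the hardware documentation.]

/* Back-of-envelope only: how long a 20KB receive buffer can absorb
 * back-to-back minimum-size Ethernet frames at 1 Gb/s while RX DMA is
 * paused.  The 20KB figure is from the mail above; the line rate and
 * frame sizes are generic Ethernet assumptions.
 */
#include <stdio.h>

int main(void)
{
	const double buf_bytes   = 20 * 1024;      /* on-chip buffer, per the mail     */
	const double line_rate   = 1e9;            /* 1 Gb/s in bits per second        */
	const double wire_bytes  = 64 + 8 + 12;    /* min frame + preamble + IFG       */
	const double frame_time  = wire_bytes * 8.0 / line_rate; /* ~672 ns per frame */
	const double frames_held = buf_bytes / 64; /* only the 64-byte frame is stored */

	printf("buffer holds ~%.0f minimum-size frames\n", frames_held);
	printf("fills in ~%.0f us at line rate\n", frames_held * frame_time * 1e6);
	return 0;
}

Under that worst case the headroom is on the order of a couple hundred
microseconds, which is the window the stack would have to drain the ring
before drops begin.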
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
index afa3b92a6905..1490fbc74169 100644
--- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
@@ -299,6 +299,10 @@ int mlxbf_gige_poll(struct napi_struct *napi, int budget)
 
 	mlxbf_gige_handle_tx_complete(priv);
 
+	data = readq(priv->base + MLXBF_GIGE_RX_DMA);
+	data &= ~MLXBF_GIGE_RX_DMA_EN;
+	writeq(data, priv->base + MLXBF_GIGE_RX_DMA);
+
 	do {
 		remaining_pkts = mlxbf_gige_rx_packet(priv, &work_done);
 	} while (remaining_pkts && work_done < budget);
@@ -314,6 +318,10 @@ int mlxbf_gige_poll(struct napi_struct *napi, int budget)
 		data = readq(priv->base + MLXBF_GIGE_INT_MASK);
 		data &= ~MLXBF_GIGE_INT_MASK_RX_RECEIVE_PACKET;
 		writeq(data, priv->base + MLXBF_GIGE_INT_MASK);
+
+		data = readq(priv->base + MLXBF_GIGE_RX_DMA);
+		data |= MLXBF_GIGE_RX_DMA_EN;
+		writeq(data, priv->base + MLXBF_GIGE_RX_DMA);
 	}
 
 	return work_done;
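[Editorial note: the two hunks above repeat the same read-modify-write of
the RX_DMA control register. A minimal sketch of how that could be factored
into one helper; the function name is hypothetical and not part of the
posted patch, while the register, bit, and field names are taken from the
diff above.]

/* Hypothetical helper -- not in the posted patch.  Wraps the
 * read-modify-write of the RX_DMA control register so the enable bit
 * is set or cleared in one place.
 */
static void mlxbf_gige_rx_dma_ctrl(struct mlxbf_gige *priv, bool enable)
{
	u64 data;

	data = readq(priv->base + MLXBF_GIGE_RX_DMA);
	if (enable)
		data |= MLXBF_GIGE_RX_DMA_EN;
	else
		data &= ~MLXBF_GIGE_RX_DMA_EN;
	writeq(data, priv->base + MLXBF_GIGE_RX_DMA);
}

With such a helper, mlxbf_gige_poll() would call
mlxbf_gige_rx_dma_ctrl(priv, false) before draining the ring and
mlxbf_gige_rx_dma_ctrl(priv, true) after re-enabling the receive-packet
interrupt.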