[v2] ethernet:arc: Fix racing of TX ring buffer

Message ID 20160523113609.GA21019@debian-dorm (mailing list archive)
State New, archived

Commit Message

Shuyu Wei May 23, 2016, 11:36 a.m. UTC
On Sun, May 22, 2016 at 01:30:27PM +0200, Lino Sanfilippo wrote:
> 
> Thanks for testing. However that extra check for skb not being NULL should not be
> necessary if the code were correct. The changes I suggested were all about having
> skb and info consistent with txbd_curr.
> But I just realized that there is still a big flaw in the last changes. While
> tx() looks correct now (we first set up the descriptor and assign the skb and _then_
> advance txbd_curr) tx_clean still is not:
> 
> We _first_ have to read txbd_curr and _then_ read the corresponding descriptor and its skb.
> (The last patch implemented just the reverse, and thus wrong, order: first get skb and
> descriptor and then read txbd_curr).
> 
> So the patch below hopefully handles tx_clean correctly as well. Could you please run one more
> test with this one?

Hi Lino, 
This patch worked after a whole night of stress testing.

> > 
> > After further testing, my patch to add a barrier around timestamp() didn't work.
> > Just like the original code in the tree, the emac still got stuck under
> > high load, even when I changed the smp_wmb() to dma_wmb(). So the original
> > code does have a race somewhere.
> 
> So to make this clear: with the current code in net-next you still see a problem (lockup), right?
Yes, I mean the mainline kernel, which should be the same as net-next.


> > ... and why Francois' fix worked. Please be patient with me :-).
> 
> So which fix(es) exactly work for you and solve your lockup issue?
I mean the patch below, starting this thread.
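
For readers following the ordering argument quoted above (the producer fills the slot and only then advances txbd_curr; the cleaner reads txbd_curr first and only then looks at the slots), here is a minimal userspace sketch of the same idea. It is not the driver code: it uses C11 atomics instead of kernel barriers, and the names `struct ring`, `ring_init`, `ring_put`, `ring_clean` and `RING_SIZE` are hypothetical. It also keeps one slot unused to detect a full ring, which is an assumption of this sketch and not something the patch does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 8u /* hypothetical; plays the role of TX_BD_NUM */

struct ring {
	void *slot[RING_SIZE]; /* plays the role of tx_buff[i].skb */
	atomic_uint curr;      /* producer index, like txbd_curr */
	atomic_uint dirty;     /* consumer index, like txbd_dirty */
};

static void ring_init(struct ring *r)
{
	for (unsigned int i = 0; i < RING_SIZE; i++)
		r->slot[i] = NULL;
	atomic_init(&r->curr, 0);
	atomic_init(&r->dirty, 0);
}

/* Producer: returns 0 on success, -1 if the ring is full. */
static int ring_put(struct ring *r, void *item)
{
	unsigned int curr = atomic_load_explicit(&r->curr, memory_order_relaxed);
	unsigned int next = (curr + 1) % RING_SIZE;

	/* One slot stays empty so that "full" differs from "empty". */
	if (next == atomic_load_explicit(&r->dirty, memory_order_acquire))
		return -1;

	/* Step 1: fill the slot. */
	r->slot[curr] = item;

	/* Step 2: publish. The release store keeps the slot write from
	 * becoming visible after the index update. */
	atomic_store_explicit(&r->curr, next, memory_order_release);
	return 0;
}

/* Consumer: returns the number of slots cleaned. */
static unsigned int ring_clean(struct ring *r)
{
	unsigned int dirty = atomic_load_explicit(&r->dirty, memory_order_relaxed);
	/* Read the producer index FIRST; the acquire load pairs with the
	 * release store in ring_put(). */
	unsigned int curr = atomic_load_explicit(&r->curr, memory_order_acquire);
	unsigned int cleaned = 0;

	/* Only slots in [dirty, curr) are known to be fully set up. */
	while (dirty != curr) {
		r->slot[dirty] = NULL; /* stands in for freeing the skb */
		dirty = (dirty + 1) % RING_SIZE;
		cleaned++;
	}

	atomic_store_explicit(&r->dirty, dirty, memory_order_release);
	return cleaned;
}
```

In this sketch, bounding the cleanup loop by `dirty != curr` means the consumer never inspects a slot the producer has not yet published, which is why no extra NULL check on the slot is needed.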

Comments

Lino Sanfilippo May 24, 2016, 1:14 a.m. UTC | #1
On 23.05.2016 13:36, Shuyu Wei wrote:
> On Sun, May 22, 2016 at 01:30:27PM +0200, Lino Sanfilippo wrote:
>> 

> 
> Hi Lino, 
> This patch worked after a whole night of stress testing.
> 

That's great! I will nevertheless make the changes discussed with Francois, and hopefully we will
have a final solution soon.

Regards,
Lino
Patch

diff --git a/drivers/net/ethernet/arc/emac_main.c b/drivers/net/ethernet/arc/emac_main.c
index a3a9392..df3dfef 100644
--- a/drivers/net/ethernet/arc/emac_main.c
+++ b/drivers/net/ethernet/arc/emac_main.c
@@ -153,9 +153,8 @@  static void arc_emac_tx_clean(struct net_device *ndev)
 {
 	struct arc_emac_priv *priv = netdev_priv(ndev);
 	struct net_device_stats *stats = &ndev->stats;
-	unsigned int i;
 
-	for (i = 0; i < TX_BD_NUM; i++) {
+	while (priv->txbd_dirty != priv->txbd_curr) {
 		unsigned int *txbd_dirty = &priv->txbd_dirty;
 		struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty];
 		struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty];
@@ -685,13 +684,15 @@  static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev)
 	wmb();
 
 	skb_tx_timestamp(skb);
+	priv->tx_buff[*txbd_curr].skb = skb;
+
+	dma_wmb();
 
 	*info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
 
 	/* Make sure info word is set */
 	wmb();
 
-	priv->tx_buff[*txbd_curr].skb = skb;
 
 	/* Increment index to point to the next BD */
 	*txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;