Message ID | 57419853.9050701@gmx.de (mailing list archive) |
---|---|
State | New, archived |
Lino Sanfilippo <LinoSanfilippo@gmx.de> :
[...]
> --- a/drivers/net/ethernet/arc/emac_main.c
> +++ b/drivers/net/ethernet/arc/emac_main.c
> @@ -159,12 +159,22 @@ static void arc_emac_tx_clean(struct net_device *ndev)
>  		unsigned int *txbd_dirty = &priv->txbd_dirty;
>  		struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty];
>  		struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty];
> -		struct sk_buff *skb = tx_buff->skb;
>  		unsigned int info = le32_to_cpu(txbd->info);
> +		struct sk_buff *skb;
>

Insert a smp_rmb() here to close one window for an outdated txbd_dirty value
(the "arc_emac_tx_clean wrote txbd_curr and issued smp_wmb" one).

Actually, insert smp_rmb() at the start of arc_emac_tx_clean() as it does
not need to be performed within the loop and would penalize it.

Given the implicit smp barriers in the non-driver code, I consider
"arc_emac_tx_clean on one CPU does not read latest txbd_dirty value written
by previous arc_emac_tx_clean on different CPU" as utter onanism, but
issuing smp_rmb() at the start of arc_emac_tx_clean() nails it as well.

> -		if ((info & FOR_EMAC) || !txbd->data || !skb)
> +		if (*txbd_dirty == priv->txbd_curr)
>  			break;

Ok, it's just the "while (priv->txbd_dirty != priv->txbd_curr) {" loop
in disguise.

>
> +		/* Make sure curr pointer is consistent with info */
> +		rmb();
> +
> +		info = le32_to_cpu(txbd->info);
> +
> +		if (info & FOR_EMAC)
> +			break;

With proper ordering + barrier in arc_emac_tx() you can relax it to smp_rmb().

> +
> +		skb = tx_buff->skb;
> +
>  		if (unlikely(info & (DROP | DEFR | LTCL | UFLO))) {
>  			stats->tx_errors++;
>  			stats->tx_dropped++;
> @@ -195,8 +205,8 @@ static void arc_emac_tx_clean(struct net_device *ndev)
>  		*txbd_dirty = (*txbd_dirty + 1) % TX_BD_NUM;
>  	}
>
> -	/* Ensure that txbd_dirty is visible to tx() before checking
> -	 * for queue stopped.
> +	/* Ensure that txbd_dirty is visible to tx() and we see the most recent
> +	 * value for txbd_curr.
>  	 */
>  	smp_mb();
>
> @@ -680,35 +690,29 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev)
>  	dma_unmap_len_set(&priv->tx_buff[*txbd_curr], len, len);
>
>  	priv->txbd[*txbd_curr].data = cpu_to_le32(addr);
> -
> -	/* Make sure pointer to data buffer is set */
> -	wmb();
> +	priv->tx_buff[*txbd_curr].skb = skb;
>
>  	skb_tx_timestamp(skb);
>
>  	*info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);

No.

You need dma_wmb() after skb_tx_timestamp() to commit skb_tx_timestamp() [*]
and data = cpu_to_le32(addr).

[*] I doubt anyone wants a dma_sync_single_...() here.

>
> -	/* Make sure info word is set */
> +	/* 1. Make sure that with respect to tx_clean everything is set up
> +	 * properly before we advance txbd_curr.
> +	 * 2. Make sure writes to DMA descriptors are completed before we inform
> +	 * the hardware.
> +	 */
>  	wmb();

Yes, either wmb() or smp_wmb() + dma_wmb().

>
> -	priv->tx_buff[*txbd_curr].skb = skb;
> -
>  	/* Increment index to point to the next BD */
>  	*txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;
>
> -	/* Ensure that tx_clean() sees the new txbd_curr before
> -	 * checking the queue status. This prevents an unneeded wake
> -	 * of the queue in tx_clean().
> +	/* Ensure we see the most recent value of txbd_dirty and tx_clean() sees
> +	 * the updated value of txbd_curr.
>  	 */
>  	smp_mb();

Nit: s/the most/a/

"a" as in "arc_emac_tx_clean() _is_ racing with arc_emac_tx"

>
> -	if (!arc_emac_tx_avail(priv)) {
> +	if (!arc_emac_tx_avail(priv))
>  		netif_stop_queue(ndev);
> -		/* Refresh tx_dirty */
> -		smp_mb();
> -		if (arc_emac_tx_avail(priv))
> -			netif_start_queue(ndev);
> -	}

No.

I may sound like an old record but the revalidation part must be kept.

txbd_dirty may change in the arc_emac_tx_avail.. netif_stop_queue window
(the race requires a different CPU as arc_emac_tx() runs in locally
disabled BH context).
On 23.05.2016 00:36, Francois Romieu wrote:
> Lino Sanfilippo <LinoSanfilippo@gmx.de> :
> [...]
>> --- a/drivers/net/ethernet/arc/emac_main.c
>> +++ b/drivers/net/ethernet/arc/emac_main.c
>> @@ -159,12 +159,22 @@ static void arc_emac_tx_clean(struct net_device *ndev)
>>  		unsigned int *txbd_dirty = &priv->txbd_dirty;
>>  		struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty];
>>  		struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty];
>> -		struct sk_buff *skb = tx_buff->skb;
>>  		unsigned int info = le32_to_cpu(txbd->info);
>> +		struct sk_buff *skb;
>>
>
> Insert a smp_rmb() here to close one window for an outdated txbd_dirty value
> (the "arc_emac_tx_clean wrote txbd_curr and issued smp_wmb" one).
>
> Actually, insert smp_rmb() at the start of arc_emac_tx_clean() as it does
> not need to be performed within the loop and would penalize it.

I agree, we should place the barrier before the loop. I also think it is
indeed more appropriate to use the SMP versions for both barriers.

>
> Given the implicit smp barriers in the non-driver code, I consider
> "arc_emac_tx_clean on one CPU does not read latest txbd_dirty value written
> by previous arc_emac_tx_clean on different CPU" as utter onanism, but
> issuing smp_rmb() at the start of arc_emac_tx_clean() nails it as well.
>
>> -		if ((info & FOR_EMAC) || !txbd->data || !skb)
>> +		if (*txbd_dirty == priv->txbd_curr)
>>  			break;
>
> Ok, it's just the "while (priv->txbd_dirty != priv->txbd_curr) {" loop
> in disguise.

I can't deny that :)

>
>>
>> +		/* Make sure curr pointer is consistent with info */
>> +		rmb();
>> +
>> +		info = le32_to_cpu(txbd->info);
>> +
>> +		if (info & FOR_EMAC)
>> +			break;
>
> With proper ordering + barrier in arc_emac_tx() you can relax it to smp_rmb().

Ok.

>> +
>> +		skb = tx_buff->skb;
>> +
>>  		if (unlikely(info & (DROP | DEFR | LTCL | UFLO))) {
>>  			stats->tx_errors++;
>>  			stats->tx_dropped++;
>> @@ -195,8 +205,8 @@ static void arc_emac_tx_clean(struct net_device *ndev)
>>  		*txbd_dirty = (*txbd_dirty + 1) % TX_BD_NUM;
>>  	}
>>
>> -	/* Ensure that txbd_dirty is visible to tx() before checking
>> -	 * for queue stopped.
>> +	/* Ensure that txbd_dirty is visible to tx() and we see the most recent
>> +	 * value for txbd_curr.
>>  	 */
>>  	smp_mb();
>>
>> @@ -680,35 +690,29 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev)
>>  	dma_unmap_len_set(&priv->tx_buff[*txbd_curr], len, len);
>>
>>  	priv->txbd[*txbd_curr].data = cpu_to_le32(addr);
>> -
>> -	/* Make sure pointer to data buffer is set */
>> -	wmb();
>> +	priv->tx_buff[*txbd_curr].skb = skb;
>>
>>  	skb_tx_timestamp(skb);
>>
>>  	*info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
>
> No.
>
> You need dma_wmb() after skb_tx_timestamp() to commit skb_tx_timestamp() [*]
> and data = cpu_to_le32(addr).

I don't agree here. A dma_wmb() would have an effect on "data" and "info",
yes, but it would have absolutely no effect on skb_tx_timestamp(), since
there is no DMA access involved. In fact skb_tx_timestamp() could probably
even be reordered to appear after the dma_wmb().

Anyway, there is the wmb() directly after the assignment to "info". _This_
barrier should ensure that skb_tx_timestamp() (along with a flush of data
and info to DMA) is executed before "txbd_curr" is advanced. This means
that the corresponding skb can't be freed prematurely by tx_clean().

>
> [*] I doubt anyone wants a dma_sync_single_...() here.
>
>>
>> -	/* Make sure info word is set */
>> +	/* 1. Make sure that with respect to tx_clean everything is set up
>> +	 * properly before we advance txbd_curr.
>> +	 * 2. Make sure writes to DMA descriptors are completed before we inform
>> +	 * the hardware.
>> +	 */
>>  	wmb();
>
> Yes, either wmb() or smp_wmb() + dma_wmb().

I really prefer one generic barrier over a combination of two or more
special barriers.

>>
>> -	priv->tx_buff[*txbd_curr].skb = skb;
>> -
>>  	/* Increment index to point to the next BD */
>>  	*txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;
>>
>> -	/* Ensure that tx_clean() sees the new txbd_curr before
>> -	 * checking the queue status. This prevents an unneeded wake
>> -	 * of the queue in tx_clean().
>> +	/* Ensure we see the most recent value of txbd_dirty and tx_clean() sees
>> +	 * the updated value of txbd_curr.
>>  	 */
>>  	smp_mb();
>
> Nit: s/the most/a/
>
> "a" as in "arc_emac_tx_clean() _is_ racing with arc_emac_tx"
>
>>
>> -	if (!arc_emac_tx_avail(priv)) {
>> +	if (!arc_emac_tx_avail(priv))
>>  		netif_stop_queue(ndev);
>> -		/* Refresh tx_dirty */
>> -		smp_mb();
>> -		if (arc_emac_tx_avail(priv))
>> -			netif_start_queue(ndev);
>> -	}
>
> No.
>
> I may sound like an old record but the revalidation part must be kept.
>
> txbd_dirty may change in the arc_emac_tx_avail.. netif_stop_queue window
> (the race requires a different CPU as arc_emac_tx() runs in locally
> disabled BH context).
>

Ok, I can see the race now. So yes, this should be kept.

I will prepare a patch with the discussed changes tomorrow.

Regards,
Lino
Lino Sanfilippo <LinoSanfilippo@gmx.de> :
[...]
> I don't agree here. A dma_wmb() would have an effect on "data" and "info",
> yes, but it would have absolutely no effect on skb_tx_timestamp(), since
> there is no DMA access involved. In fact skb_tx_timestamp() could probably
> even be reordered to appear after the dma_wmb().
>
> Anyway, there is the wmb() directly after the assignment to "info". _This_
> barrier should ensure that skb_tx_timestamp() (along with a flush of data
> and info to DMA) is executed before "txbd_curr" is advanced. This means
> that the corresponding skb can't be freed prematurely by tx_clean().

The concern here is about sending adequate PTP payload on the network.

skb_tx_timestamp() is used for network clock synchronization. Some extra
information must be transmitted. Be it through direct payload change or
through indirect control, it _is_ related to DMA.

Several (most ?) skb_tx_timestamp() misuses blur the picture: CPU vs device
sync is of course way below the radar when the driver crashes because of a
plain use-after-free skb_tx_timestamp() :o/
--- a/drivers/net/ethernet/arc/emac_main.c
+++ b/drivers/net/ethernet/arc/emac_main.c
@@ -159,12 +159,22 @@ static void arc_emac_tx_clean(struct net_device *ndev)
 		unsigned int *txbd_dirty = &priv->txbd_dirty;
 		struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty];
 		struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty];
-		struct sk_buff *skb = tx_buff->skb;
 		unsigned int info = le32_to_cpu(txbd->info);
+		struct sk_buff *skb;
 
-		if ((info & FOR_EMAC) || !txbd->data || !skb)
+		if (*txbd_dirty == priv->txbd_curr)
 			break;
 
+		/* Make sure curr pointer is consistent with info */
+		rmb();
+
+		info = le32_to_cpu(txbd->info);
+
+		if (info & FOR_EMAC)
+			break;
+
+		skb = tx_buff->skb;
+
 		if (unlikely(info & (DROP | DEFR | LTCL | UFLO))) {
 			stats->tx_errors++;
 			stats->tx_dropped++;
@@ -195,8 +205,8 @@ static void arc_emac_tx_clean(struct net_device *ndev)
 		*txbd_dirty = (*txbd_dirty + 1) % TX_BD_NUM;
 	}
 
-	/* Ensure that txbd_dirty is visible to tx() before checking
-	 * for queue stopped.
+	/* Ensure that txbd_dirty is visible to tx() and we see the most recent
+	 * value for txbd_curr.
 	 */
 	smp_mb();
 
@@ -680,35 +690,29 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev)
 	dma_unmap_len_set(&priv->tx_buff[*txbd_curr], len, len);
 
 	priv->txbd[*txbd_curr].data = cpu_to_le32(addr);
-
-	/* Make sure pointer to data buffer is set */
-	wmb();
+	priv->tx_buff[*txbd_curr].skb = skb;
 
 	skb_tx_timestamp(skb);
 
 	*info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
 
-	/* Make sure info word is set */
+	/* 1. Make sure that with respect to tx_clean everything is set up
+	 * properly before we advance txbd_curr.
+	 * 2. Make sure writes to DMA descriptors are completed before we inform
+	 * the hardware.
+	 */
 	wmb();
 
-	priv->tx_buff[*txbd_curr].skb = skb;
-
 	/* Increment index to point to the next BD */
 	*txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;
 
-	/* Ensure that tx_clean() sees the new txbd_curr before
-	 * checking the queue status. This prevents an unneeded wake
-	 * of the queue in tx_clean().
+	/* Ensure we see the most recent value of txbd_dirty and tx_clean() sees
+	 * the updated value of txbd_curr.
 	 */
 	smp_mb();
 
-	if (!arc_emac_tx_avail(priv)) {
+	if (!arc_emac_tx_avail(priv))
 		netif_stop_queue(ndev);
-		/* Refresh tx_dirty */
-		smp_mb();
-		if (arc_emac_tx_avail(priv))
-			netif_start_queue(ndev);
-	}
 
 	arc_reg_set(priv, R_STATUS, TXPL_MASK);