Message ID | 1361814409-6704-1-git-send-email-roland@kernel.org (mailing list archive) |
---|---|
State | Accepted, archived |
Headers | show |
On 25/02/2013 19:46, Roland Dreier wrote: > If IPoIB fails to look up a path record (eg if it tries during an SM > failover when one SM is dead but the new one hasn't taken over yet), the > driver ends up with a neighbour structure but no address handle (AH). > There's no mechanism to recover from this: any further packets sent to > this destination will be silently dumped in ipoib_start_xmit(). Looking on the flow of sending ARP probes, I see that in unicast_arp_send, if there's no AH for the path, a path query is initiated, if (path->ah) { ipoib_dbg(priv, "Send unicast ARP to %04x\n", be16_to_cpu(path->pathrec.dlid)); spin_unlock_irqrestore(&priv->lock, flags); ipoib_send(dev, skb, path->ah, IPOIB_QPN(cb->hwaddr)); return; } else if ((path->query || !path_rec_start(dev, path)) && skb_queue_len(&path->queue) < IPOIB_MAX_PATH_REC_QUEUE) { __skb_queue_tail(&path->queue, skb); } else { ++dev->stats.tx_dropped; dev_kfree_skb_any(skb); } so eventually the traffic should resume once the ND state machine sends a probe, agree? did you only wanted to make that faster? Or. > Fix this by freeing the neighbour structures when a path rec query fails, so that the next packet queued to be sent will trigger a new path record query. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Feb 26, 2013 at 2:32 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote: > Looking on the flow of sending ARP probes, I see that in unicast_arp_send, > if there's no AH for the path, a path query is initiated, > > if (path->ah) { > ipoib_dbg(priv, "Send unicast ARP to %04x\n", > be16_to_cpu(path->pathrec.dlid)); > > spin_unlock_irqrestore(&priv->lock, flags); > ipoib_send(dev, skb, path->ah, IPOIB_QPN(cb->hwaddr)); > return; > } else if ((path->query || !path_rec_start(dev, path)) && > skb_queue_len(&path->queue) < IPOIB_MAX_PATH_REC_QUEUE) { > __skb_queue_tail(&path->queue, skb); > } else { > ++dev->stats.tx_dropped; > dev_kfree_skb_any(skb); > } > > > so eventually the traffic should resume once the ND state machine sends a > probe, agree? did you only wanted to make that faster? Well, look at ipoib_start_xmit(): /* unicast, arrange "switch" according to probability */ switch (header->proto) { case htons(ETH_P_IP): case htons(ETH_P_IPV6): neigh = ipoib_neigh_get(dev, cb->hwaddr); if (unlikely(!neigh)) { neigh_add_path(skb, cb->hwaddr, dev); return NETDEV_TX_OK; } break; case htons(ETH_P_ARP): case htons(ETH_P_RARP): /* for unicast ARP and RARP should always perform path find */ unicast_arp_send(skb, dev, cb); return NETDEV_TX_OK; etc. So although you mention ND, we only call unicast_arp_send() for IPv4 ARP. And in our case we were actually using IPv6, which follows the main flow with neighbours even for neighbour discovery packets. So there's no way to ever recover from having the failed path record. In fact I bet this is why the bug has been there as long as it has been: almost no one is using IPv6 on IPoIB seriously, and IPv4 should work OK as you point out. - R. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 26.02.2013 17:55, Roland Dreier wrote: [...] > In fact I bet this is why the bug has been there as long as it has > been: almost no one is using IPv6 on IPoIB seriously, and IPv4 should > work OK as you point out. Thanks a lot,.... Unfortunately, we are using IPoIB with IPv6 in production for the inter-VM and the internet gateway traffic. We had an issue last Friday were one of our gateway machines wasn't reachable anymore and pings didn't come through. We had to reload the ib_ipoib modules of nearly every server connected to that gateway. I'm so glad that we use SRP - all SRP connections survived. I don't have the time to care for IPoIB as well at ProfitBricks so I will encourage our networking teams to get their hands on IPoIB and linux-rdma communication. How to verify if we've hit this bug? How to reproduce this bug? Cheers, Sebastian -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 66d6da9..8534afd 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -510,6 +510,9 @@ static void path_rec_completion(int status, spin_unlock_irqrestore(&priv->lock, flags); + if (IS_ERR_OR_NULL(ah)) + ipoib_del_neighs_by_gid(dev, path->pathrec.dgid.raw); + if (old_ah) ipoib_put_ah(old_ah);