Message ID | 1465042524-25852-3-git-send-email-leon@kernel.org (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
On Sat, Jun 4, 2016 at 3:15 PM, Leon Romanovsky <leon@kernel.org> wrote: > From: Erez Shitrit <erezsh@mellanox.com> > > ipoib_neigh_get whenever tries to access specific neigh updates its > "alive" variable member, the neigh now considered as updated and won't > be cleaned by the GC process even if it needs to be resolved again. Erez, After reading the above few times, I think you wanted to say something like: Whenever ipoib_neigh_get tries to access a specific neigh, it updates the "alive" variable member. The neigh now considered as updated and won't be cleaned by the GC process even if it needs to be resolved again. correct? > Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path") > Signed-off-by: Erez Shitrit <erezsh@mellanox.com> > Signed-off-by: Leon Romanovsky <leon@kernel.org> > --- > drivers/infiniband/ulp/ipoib/ipoib_main.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c > index 2d7c163..c558c32 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c > @@ -1206,7 +1206,9 @@ struct ipoib_neigh *ipoib_neigh_get(struct net_device *dev, u8 *daddr) > neigh = NULL; > goto out_unlock; > } > - neigh->alive = jiffies; > + > + if (likely(skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE)) > + neigh->alive = jiffies; why testing the queue len makes things right? > goto out_unlock; > } > } -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 6/5/2016 5:10 PM, Or Gerlitz wrote: >> - neigh->alive = jiffies; >> + >> + if (likely(skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE)) >> + neigh->alive = jiffies; > > why testing the queue len makes things right? I'm with Or on this one. This test doesn't make sense unless you assume that the neighbor is being hit with a steady stream of packets. If someone just runs ping -c 1 to a dead neighbor, this test will fail. The idea of the patch is OK, but the test needs to be reworked for situations other than a blasting stream of data.
On Tue, Jun 7, 2016 at 2:59 AM, Doug Ledford <dledford@redhat.com> wrote: > On 6/5/2016 5:10 PM, Or Gerlitz wrote: > >>> - neigh->alive = jiffies; >>> + >>> + if (likely(skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE)) >>> + neigh->alive = jiffies; >> >> why testing the queue len makes things right? > > I'm with Or on this one. This test doesn't make sense unless you assume > that the neighbor is being hit with a steady stream of packets. If > someone just runs ping -c 1 to a dead neighbor, this test will fail. > The idea of the patch is OK, but the test needs to be reworked for > situations other than a blasting stream of data. > I will try to explain the idea behind that test, and why it works (I hope I'll make it clear this time :)) there are 2 objects that are taking place here, the garbage-collector that removes neigh if was not used for defined time, and the path-query for each neigh. if the path-query failed the packets for specific neigh are kept in the neigh queue, but only if there are no more than IPOIB_MAX_PATH_REC_QUEUE, and the neigh is still kept for ever (let's assume CM connected that was stopped without any notification with DREQ etc.) The only way to know that there is an unresolved neigh is by checking its queue, if it is full we can assume that this is an unresolved neigh otherwise it is resolved. So, the test doesn't take any assumption on the shape of the traffic, even in a ping requests once a second it will failed after 3 times the request was sent, and the neigh time will not be updated which will leave it to the garbage-collector to delete it, that gives the driver the possibility to recreate the neigh and to update its ah/pr, otherwise the driver will try over and over to resend to that unresolved neigh. > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 6/7/2016 2:01 AM, Erez Shitrit wrote: > On Tue, Jun 7, 2016 at 2:59 AM, Doug Ledford <dledford@redhat.com> wrote: >> On 6/5/2016 5:10 PM, Or Gerlitz wrote: >> >>>> - neigh->alive = jiffies; >>>> + >>>> + if (likely(skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE)) >>>> + neigh->alive = jiffies; >>> >>> why testing the queue len makes things right? >> >> I'm with Or on this one. This test doesn't make sense unless you assume >> that the neighbor is being hit with a steady stream of packets. If >> someone just runs ping -c 1 to a dead neighbor, this test will fail. >> The idea of the patch is OK, but the test needs to be reworked for >> situations other than a blasting stream of data. >> > > I will try to explain the idea behind that test, and why it works (I > hope I'll make it clear this time :)) > > there are 2 objects that are taking place here, the garbage-collector > that removes neigh if was not used for defined time, and the > path-query for each neigh. > if the path-query failed the packets for specific neigh are kept in > the neigh queue, but only if there are no more than > IPOIB_MAX_PATH_REC_QUEUE, and the neigh is still kept for ever (let's > assume CM connected that was stopped without any notification with > DREQ etc.) > The only way to know that there is an unresolved neigh is by checking > its queue, if it is full we can assume that this is an unresolved > neigh otherwise it is resolved. That's not true. Reread what I wrote above (I was specific when I picked that command). Ping -c 1 will only send one ping. It will not trip this test. As I said, your test relies on a stream of packets, a single packet to a gone neighbor will never cause it to get deleted. You need a timeout on the queue and then if the packet in the queue times out, regardless of whether the queue is full or not, then you mark the neighbor for garbage collection and drop the packet(s). > So, the test doesn't take any assumption on the shape of the traffic, > even in a ping requests once a second it will failed after 3 times the > request was sent, and the neigh time will not be updated which will > leave it to the garbage-collector to delete it, that gives the driver > the possibility to recreate the neigh and to update its ah/pr, > otherwise the driver will try over and over to resend to that > unresolved neigh. > >>
On Tue, Jun 7, 2016 at 4:48 PM, Doug Ledford <dledford@redhat.com> wrote: > On 6/7/2016 2:01 AM, Erez Shitrit wrote: >> On Tue, Jun 7, 2016 at 2:59 AM, Doug Ledford <dledford@redhat.com> wrote: >>> On 6/5/2016 5:10 PM, Or Gerlitz wrote: >>> >>>>> - neigh->alive = jiffies; >>>>> + >>>>> + if (likely(skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE)) >>>>> + neigh->alive = jiffies; >>>> >>>> why testing the queue len makes things right? >>> >>> I'm with Or on this one. This test doesn't make sense unless you assume >>> that the neighbor is being hit with a steady stream of packets. If >>> someone just runs ping -c 1 to a dead neighbor, this test will fail. >>> The idea of the patch is OK, but the test needs to be reworked for >>> situations other than a blasting stream of data. >>> >> >> I will try to explain the idea behind that test, and why it works (I >> hope I'll make it clear this time :)) >> >> there are 2 objects that are taking place here, the garbage-collector >> that removes neigh if was not used for defined time, and the >> path-query for each neigh. >> if the path-query failed the packets for specific neigh are kept in >> the neigh queue, but only if there are no more than >> IPOIB_MAX_PATH_REC_QUEUE, and the neigh is still kept for ever (let's >> assume CM connected that was stopped without any notification with >> DREQ etc.) >> The only way to know that there is an unresolved neigh is by checking >> its queue, if it is full we can assume that this is an unresolved >> neigh otherwise it is resolved. > > That's not true. Reread what I wrote above (I was specific when I > picked that command). Ping -c 1 will only send one ping. It will not > trip this test. As I said, your test relies on a stream of packets, a > single packet to a gone neighbor will never cause it to get deleted. If you ping one time (ping -c 1) there is no problem at all, the neigh will be deleted by the GC anyway, because no (unresolved) packet updates the neigh validity and the GC will check the last time it was refreshed and since only one ping refreshed it long ago, it will be deleted. (the GC always running) the problem is when there are non stop traffic to that neigh. IMHO that test works for that. > You need a timeout on the queue and then if the packet in the queue > times out, regardless of whether the queue is full or not, then you mark > the neighbor for garbage collection and drop the packet(s). > >> So, the test doesn't take any assumption on the shape of the traffic, >> even in a ping requests once a second it will failed after 3 times the >> request was sent, and the neigh time will not be updated which will >> leave it to the garbage-collector to delete it, that gives the driver >> the possibility to recreate the neigh and to update its ah/pr, >> otherwise the driver will try over and over to resend to that >> unresolved neigh. >> >>> > > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 6/7/2016 10:32 AM, Erez Shitrit wrote: > On Tue, Jun 7, 2016 at 4:48 PM, Doug Ledford <dledford@redhat.com> wrote: >> On 6/7/2016 2:01 AM, Erez Shitrit wrote: >>> On Tue, Jun 7, 2016 at 2:59 AM, Doug Ledford <dledford@redhat.com> wrote: >>>> On 6/5/2016 5:10 PM, Or Gerlitz wrote: >>>> >>>>>> - neigh->alive = jiffies; >>>>>> + >>>>>> + if (likely(skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE)) >>>>>> + neigh->alive = jiffies; >>>>> >>>>> why testing the queue len makes things right? >>>> >>>> I'm with Or on this one. This test doesn't make sense unless you assume >>>> that the neighbor is being hit with a steady stream of packets. If >>>> someone just runs ping -c 1 to a dead neighbor, this test will fail. >>>> The idea of the patch is OK, but the test needs to be reworked for >>>> situations other than a blasting stream of data. >>>> >>> >>> I will try to explain the idea behind that test, and why it works (I >>> hope I'll make it clear this time :)) >>> >>> there are 2 objects that are taking place here, the garbage-collector >>> that removes neigh if was not used for defined time, and the >>> path-query for each neigh. >>> if the path-query failed the packets for specific neigh are kept in >>> the neigh queue, but only if there are no more than >>> IPOIB_MAX_PATH_REC_QUEUE, and the neigh is still kept for ever (let's >>> assume CM connected that was stopped without any notification with >>> DREQ etc.) >>> The only way to know that there is an unresolved neigh is by checking >>> its queue, if it is full we can assume that this is an unresolved >>> neigh otherwise it is resolved. >> >> That's not true. Reread what I wrote above (I was specific when I >> picked that command). Ping -c 1 will only send one ping. It will not >> trip this test. As I said, your test relies on a stream of packets, a >> single packet to a gone neighbor will never cause it to get deleted. > > If you ping one time (ping -c 1) there is no problem at all, the neigh > will be deleted by the GC anyway, because no (unresolved) packet > updates the neigh validity and the GC will check the last time it was > refreshed and since only one ping refreshed it long ago, it will be > deleted. (the GC always running) > > the problem is when there are non stop traffic to that neigh. IMHO > that test works for that. Ok, that makes sense. >> You need a timeout on the queue and then if the packet in the queue >> times out, regardless of whether the queue is full or not, then you mark >> the neighbor for garbage collection and drop the packet(s). >> >>> So, the test doesn't take any assumption on the shape of the traffic, >>> even in a ping requests once a second it will failed after 3 times the >>> request was sent, and the neigh time will not be updated which will >>> leave it to the garbage-collector to delete it, that gives the driver >>> the possibility to recreate the neigh and to update its ah/pr, >>> otherwise the driver will try over and over to resend to that >>> unresolved neigh. >>> >>>> >> >>
On 6/7/2016 10:43 AM, Doug Ledford wrote: > On 6/7/2016 10:32 AM, Erez Shitrit wrote: >> On Tue, Jun 7, 2016 at 4:48 PM, Doug Ledford <dledford@redhat.com> wrote: >>> On 6/7/2016 2:01 AM, Erez Shitrit wrote: >>>> On Tue, Jun 7, 2016 at 2:59 AM, Doug Ledford <dledford@redhat.com> wrote: >>>>> On 6/5/2016 5:10 PM, Or Gerlitz wrote: >>>>> >>>>>>> - neigh->alive = jiffies; >>>>>>> + >>>>>>> + if (likely(skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE)) >>>>>>> + neigh->alive = jiffies; >>>>>> >>>>>> why testing the queue len makes things right? >>>>> >>>>> I'm with Or on this one. This test doesn't make sense unless you assume >>>>> that the neighbor is being hit with a steady stream of packets. If >>>>> someone just runs ping -c 1 to a dead neighbor, this test will fail. >>>>> The idea of the patch is OK, but the test needs to be reworked for >>>>> situations other than a blasting stream of data. >>>>> >>>> >>>> I will try to explain the idea behind that test, and why it works (I >>>> hope I'll make it clear this time :)) >>>> >>>> there are 2 objects that are taking place here, the garbage-collector >>>> that removes neigh if was not used for defined time, and the >>>> path-query for each neigh. >>>> if the path-query failed the packets for specific neigh are kept in >>>> the neigh queue, but only if there are no more than >>>> IPOIB_MAX_PATH_REC_QUEUE, and the neigh is still kept for ever (let's >>>> assume CM connected that was stopped without any notification with >>>> DREQ etc.) >>>> The only way to know that there is an unresolved neigh is by checking >>>> its queue, if it is full we can assume that this is an unresolved >>>> neigh otherwise it is resolved. >>> >>> That's not true. Reread what I wrote above (I was specific when I >>> picked that command). Ping -c 1 will only send one ping. It will not >>> trip this test. As I said, your test relies on a stream of packets, a >>> single packet to a gone neighbor will never cause it to get deleted. >> >> If you ping one time (ping -c 1) there is no problem at all, the neigh >> will be deleted by the GC anyway, because no (unresolved) packet >> updates the neigh validity and the GC will check the last time it was >> refreshed and since only one ping refreshed it long ago, it will be >> deleted. (the GC always running) >> >> the problem is when there are non stop traffic to that neigh. IMHO >> that test works for that. > > Ok, that makes sense. Sorry, in case it wasn't clear, I grabbed this patch now (although I reworded the commit message significantly).
On Tue, Jun 07, 2016 at 10:50:54AM -0400, Doug Ledford wrote: > On 6/7/2016 10:43 AM, Doug Ledford wrote: > > On 6/7/2016 10:32 AM, Erez Shitrit wrote: > >> On Tue, Jun 7, 2016 at 4:48 PM, Doug Ledford <dledford@redhat.com> wrote: > >>> On 6/7/2016 2:01 AM, Erez Shitrit wrote: > >>>> On Tue, Jun 7, 2016 at 2:59 AM, Doug Ledford <dledford@redhat.com> wrote: > >>>>> On 6/5/2016 5:10 PM, Or Gerlitz wrote: > >>>>> > >>>>>>> - neigh->alive = jiffies; > >>>>>>> + > >>>>>>> + if (likely(skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE)) > >>>>>>> + neigh->alive = jiffies; > >>>>>> > >>>>>> why testing the queue len makes things right? > >>>>> > >>>>> I'm with Or on this one. This test doesn't make sense unless you assume > >>>>> that the neighbor is being hit with a steady stream of packets. If > >>>>> someone just runs ping -c 1 to a dead neighbor, this test will fail. > >>>>> The idea of the patch is OK, but the test needs to be reworked for > >>>>> situations other than a blasting stream of data. > >>>>> > >>>> > >>>> I will try to explain the idea behind that test, and why it works (I > >>>> hope I'll make it clear this time :)) > >>>> > >>>> there are 2 objects that are taking place here, the garbage-collector > >>>> that removes neigh if was not used for defined time, and the > >>>> path-query for each neigh. > >>>> if the path-query failed the packets for specific neigh are kept in > >>>> the neigh queue, but only if there are no more than > >>>> IPOIB_MAX_PATH_REC_QUEUE, and the neigh is still kept for ever (let's > >>>> assume CM connected that was stopped without any notification with > >>>> DREQ etc.) > >>>> The only way to know that there is an unresolved neigh is by checking > >>>> its queue, if it is full we can assume that this is an unresolved > >>>> neigh otherwise it is resolved. > >>> > >>> That's not true. Reread what I wrote above (I was specific when I > >>> picked that command). Ping -c 1 will only send one ping. It will not > >>> trip this test. As I said, your test relies on a stream of packets, a > >>> single packet to a gone neighbor will never cause it to get deleted. > >> > >> If you ping one time (ping -c 1) there is no problem at all, the neigh > >> will be deleted by the GC anyway, because no (unresolved) packet > >> updates the neigh validity and the GC will check the last time it was > >> refreshed and since only one ping refreshed it long ago, it will be > >> deleted. (the GC always running) > >> > >> the problem is when there are non stop traffic to that neigh. IMHO > >> that test works for that. > > > > Ok, that makes sense. > > Sorry, in case it wasn't clear, I grabbed this patch now (although I > reworded the commit message significantly). Thank you, I truly appreciate your help. > >
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 2d7c163..c558c32 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -1206,7 +1206,9 @@ struct ipoib_neigh *ipoib_neigh_get(struct net_device *dev, u8 *daddr) neigh = NULL; goto out_unlock; } - neigh->alive = jiffies; + + if (likely(skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE)) + neigh->alive = jiffies; goto out_unlock; } }