Message ID | 60b4df0f7349570fe91b94eb8861043f0aba7cf2.camel@oracle.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Jason Gunthorpe |
Headers | show |
Series | [v2,1/1] RDMA/addr: Refresh neighbour entries upon "rdma_resolve_addr" | expand |
On Thu, Jun 16, 2022 at 08:57:14AM -0700, Gerd Rausch wrote: > Unlike with IPv[46], where "ip_finish_output2" triggers > a refresh of STALE neighbour entries via "neigh_output", > "rdma_resolve_addr" never triggers an update. Why? We alread call neigh_event_send() right after this in addr_resolve_neigh(), and this seems to ignore the input resolve_neigh to addr_resolve() ? I think there is more going on here than this commit message is explaining. Jason
Hi Jason, On Fri, 2022-06-24 at 20:11 -0300, Jason Gunthorpe wrote: > On Thu, Jun 16, 2022 at 08:57:14AM -0700, Gerd Rausch wrote: > > Unlike with IPv[46], where "ip_finish_output2" triggers > > a refresh of STALE neighbour entries via "neigh_output", > > "rdma_resolve_addr" never triggers an update. > > Why? We alread call neigh_event_send() right after this in > addr_resolve_neigh(), and this seems to ignore the input resolve_neigh > to addr_resolve() ? > This in "dst_fetch_ha"?: --------%<--------%<--------%<--------%<--------%<-------- if (!(n->nud_state & NUD_VALID)) { neigh_event_send(n, NULL); ret = -ENODATA; --------%<--------%<--------%<--------%<--------%<-------- With: #define NUD_VALID (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE |NUD_STALE|NUD_DELAY) and the knowledge that the ARP entry is NUD_STALE, how would that function be called and perform the necessary refresh? > I think there is more going on here than this commit message is > explaining. > If the intention of the above mentioned "neigh_event_send" was to refresh stale entries, then the "if" condition ought to change, no? Thanks, Gerd
On Fri, Jun 24, 2022 at 05:03:23PM -0700, Gerd Rausch wrote: > Hi Jason, > > On Fri, 2022-06-24 at 20:11 -0300, Jason Gunthorpe wrote: > > On Thu, Jun 16, 2022 at 08:57:14AM -0700, Gerd Rausch wrote: > > > Unlike with IPv[46], where "ip_finish_output2" triggers > > > a refresh of STALE neighbour entries via "neigh_output", > > > "rdma_resolve_addr" never triggers an update. > > > > Why? We alread call neigh_event_send() right after this in > > addr_resolve_neigh(), and this seems to ignore the input resolve_neigh > > to addr_resolve() ? > > > > This in "dst_fetch_ha"?: > --------%<--------%<--------%<--------%<--------%<-------- > if (!(n->nud_state & NUD_VALID)) { > neigh_event_send(n, NULL); > ret = -ENODATA; > --------%<--------%<--------%<--------%<--------%<-------- > > With: > #define NUD_VALID (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE > |NUD_STALE|NUD_DELAY) > > and the knowledge that the ARP entry is NUD_STALE, > how would that function be called and perform the necessary refresh? > > > > I think there is more going on here than this commit message is > > explaining. > > > > If the intention of the above mentioned "neigh_event_send" was to > refresh stale entries, then the "if" condition ought to change, no? Maybe yes. If you are saying with this patch that neigh_event_send() should be called unconditionally for every 'packet' why not remove the test above instead of calling it twice? On the other hand I see many places checking for NUD_VALID before calling it. So, really the commit message needs to explain how neigh_event_send() is supposed to be used and explain that NUD_VALID is OK to drop. I'm having a hard time guessing from a quick look around. Jason
Hi Jason, On Fri, 2022-06-24 at 21:10 -0300, Jason Gunthorpe wrote: > On Fri, Jun 24, 2022 at 05:03:23PM -0700, Gerd Rausch wrote: [...] > > This in "dst_fetch_ha"?: > > --------%<--------%<--------%<--------%<--------%<-------- > > if (!(n->nud_state & NUD_VALID)) { > > neigh_event_send(n, NULL); > > ret = -ENODATA; > > --------%<--------%<--------%<--------%<--------%<-------- > > > > With: > > #define NUD_VALID (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE > > > NUD_STALE|NUD_DELAY) > > > > and the knowledge that the ARP entry is NUD_STALE, > > how would that function be called and perform the necessary refresh? > > > > > > > I think there is more going on here than this commit message is > > > explaining. > > > > > > > If the intention of the above mentioned "neigh_event_send" was to > > refresh stale entries, then the "if" condition ought to change, no? > > Maybe yes. > > If you are saying with this patch that neigh_event_send() should be > called unconditionally for every 'packet' why not remove the test > above instead of calling it twice? > It looks like starting from the very first time when "neigh_event_send" was introduced here: 935ef2d7a291 RDMA/cma: Use neigh_event_send() to start neighbour discovery it was always strictly coupled to the "-ENODATA" case. In other words, I don't see how the STALE case was covered, and I'd have to guess wether the -ENODATA coupling was intentional or accidental. Now it may be perfectly possible to just make the "neigh_event_send" call above unconditional and call it a day. I mean, simpler fixes always win over more complex ones. But, I personally don't know the twists & turns of this code well enough to be able to assert that this won't leave any non-covered conditional corner cases. I certainly hadn't tested that. > On the other hand I see many places checking for NUD_VALID before > calling it. > > So, really the commit message needs to explain how neigh_event_send() > is supposed to be used and explain that NUD_VALID is OK to drop. > Some places do check for NUD_VALID (e.g. "__ip_do_redirect"), many others don't (e.g. "__teql_resolve", or "neigh_resolve_output" itself). Even in the case of "dst_fetch_ha": I can not state whether the check for !NUD_VALUD was done intentionally or not. I can observe though that the side-effect of that check is that STALE entries won't get refreshed. So something is clearly off here. Let's look "__ip_do_redirect": I would have to guess why that check for !NUD_VALID is there. It's been there as far back as the GIT history goes. But I prefer not having to put guess-work into my Commit-Log. That said, I don't mind putting guess-work in this e-mail though: Perhaps the author(s) knew that the code would have to go through "ip_finish_output2" (or whatever the predecessor's name was) eventually, and therefore it was okay to kick off a refresh only if !NUD_VALID. At the end of the day, if the desired behavior is to always refresh STALE entries, we should do so. Calling "neigh_event_send" for !NUD_VALID only won't achieve that. > I'm having a hard time guessing from a quick look around. > Thanks for looking though, Gerd
Hi Jason, On Fri, 2022-06-24 at 17:55 -0700, Gerd Rausch wrote: > In other words, I don't see how the STALE case was covered, and I'd have > to guess wether the -ENODATA coupling was intentional or accidental. > > Now it may be perfectly possible to just make the "neigh_event_send" > call above unconditional and call it a day. > > I mean, simpler fixes always win over more complex ones. > > But, I personally don't know the twists & turns of this code well enough > to be able to assert that this won't leave any non-covered conditional > corner cases. I certainly hadn't tested that. > I finally got around to trying the much simpler fix, and it appears to work just as well: --------%<--------%<--------%<--------%<--------%<--------%<-------- --- drivers/infiniband/core/addr.c- +++ drivers/infiniband/core/addr.c @@ -336,11 +336,11 @@ static int dst_fetch_ha(const struct dst return -ENODATA; if (!(n->nud_state & NUD_VALID)) { - neigh_event_send(n, NULL); ret = -ENODATA; } else { neigh_ha_snapshot(dev_addr->dst_dev_addr, n, dst->dev); } + neigh_event_send(n, NULL); neigh_release(n); --------%<--------%<--------%<--------%<--------%<--------%<-------- Tested with IPv4 only, but this should work just as well with IPv6: STALE neighbor entries transition to DELAY -> REACHABLE with this change, i.e. the entries get updated. Thanks, Gerd
On Thu, Jul 28, 2022 at 10:55:39AM -0700, Gerd Rausch wrote: > Hi Jason, > > On Fri, 2022-06-24 at 17:55 -0700, Gerd Rausch wrote: > > In other words, I don't see how the STALE case was covered, and I'd have > > to guess wether the -ENODATA coupling was intentional or accidental. > > > > Now it may be perfectly possible to just make the "neigh_event_send" > > call above unconditional and call it a day. > > > > I mean, simpler fixes always win over more complex ones. > > > > But, I personally don't know the twists & turns of this code well enough > > to be able to assert that this won't leave any non-covered conditional > > corner cases. I certainly hadn't tested that. > > > > I finally got around to trying the much simpler fix, and it appears to > work just as well: > > --------%<--------%<--------%<--------%<--------%<--------%<-------- > --- drivers/infiniband/core/addr.c- > +++ drivers/infiniband/core/addr.c > @@ -336,11 +336,11 @@ static int dst_fetch_ha(const struct dst > return -ENODATA; > > if (!(n->nud_state & NUD_VALID)) { > - neigh_event_send(n, NULL); > ret = -ENODATA; > } else { > neigh_ha_snapshot(dev_addr->dst_dev_addr, n, dst->dev); > } > + neigh_event_send(n, NULL); > > neigh_release(n); > > --------%<--------%<--------%<--------%<--------%<--------%<-------- > > Tested with IPv4 only, but this should work just as well with IPv6: > > STALE neighbor entries transition to DELAY -> REACHABLE with this > change, i.e. the entries get updated. It seems OK, but I would still like to see an attempt to explain how nud_state and neigh_event_send is supposed to be used.. Maybe futile, but lets try at least. Jason
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index f253295795f0..704dc9cc130e 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -394,6 +394,8 @@ static int addr4_resolve(struct sockaddr *src_sock, __be32 dst_ip = dst_in->sin_addr.s_addr; struct rtable *rt; struct flowi4 fl4; + struct net_device *dev; + struct neighbour *neigh; int ret; memset(&fl4, 0, sizeof(fl4)); @@ -409,6 +411,24 @@ static int addr4_resolve(struct sockaddr *src_sock, addr->hoplimit = ip4_dst_hoplimit(&rt->dst); + /* trigger ARP-entry refresh if necessary, + * the same way "ip_finish_output2" does + */ + if (addr->bound_dev_if) { + dev = dev_get_by_index(addr->net, addr->bound_dev_if); + } else { + dev = rt->dst.dev; + dev_hold(dev); + } + if (dev) { + rcu_read_lock_bh(); + neigh = __ipv4_neigh_lookup_noref(dev, (__force u32)dst_ip); + if (neigh) + neigh_event_send(neigh, NULL); + rcu_read_unlock_bh(); + dev_put(dev); + } + *prt = rt; return 0; } @@ -424,6 +444,8 @@ static int addr6_resolve(struct sockaddr *src_sock, (const struct sockaddr_in6 *)dst_sock; struct flowi6 fl6; struct dst_entry *dst; + struct net_device *dev; + struct neighbour *neigh; memset(&fl6, 0, sizeof fl6); fl6.daddr = dst_in->sin6_addr; @@ -439,6 +461,24 @@ static int addr6_resolve(struct sockaddr *src_sock, addr->hoplimit = ip6_dst_hoplimit(dst); + /* trigger neighbour-entry refresh if necessary, + * the same way "ip6_finish_output2" does + */ + if (addr->bound_dev_if) { + dev = dev_get_by_index(addr->net, addr->bound_dev_if); + } else { + dev = dst->dev; + dev_hold(dev); + } + if (dev) { + rcu_read_lock_bh(); + neigh = __ipv6_neigh_lookup_noref(dst->dev, &dst_in->sin6_addr); + if (neigh) + neigh_event_send(neigh, NULL); + rcu_read_unlock_bh(); + dev_put(dev); + } + *pdst = dst; return 0; }
Unlike with IPv[46], where "ip_finish_output2" triggers a refresh of STALE neighbour entries via "neigh_output", "rdma_resolve_addr" never triggers an update. If a wrong STALE entry ever enters the cache, it'll remain wrong forever (unless refreshed via TCP/IP, or otherwise). Let the cache inconsistency resolve itself by triggering an update from "rdma_resolve_addr". Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> --- v2: Add a "(__force u32)" cast for "__ipv4_neigh_lookup_noref" drivers/infiniband/core/addr.c | 40 ++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+)