diff mbox series

[v2,1/1] RDMA/addr: Refresh neighbour entries upon "rdma_resolve_addr"

Message ID 60b4df0f7349570fe91b94eb8861043f0aba7cf2.camel@oracle.com (mailing list archive)
State Changes Requested
Delegated to: Jason Gunthorpe
Headers show
Series [v2,1/1] RDMA/addr: Refresh neighbour entries upon "rdma_resolve_addr" | expand

Commit Message

Gerd Rausch June 16, 2022, 3:57 p.m. UTC
Unlike with IPv[46], where "ip_finish_output2" triggers
a refresh of STALE neighbour entries via "neigh_output",
"rdma_resolve_addr" never triggers an update.

If a wrong STALE entry ever enters the cache, it'll remain
wrong forever (unless refreshed via TCP/IP, or otherwise).

Let the cache inconsistency resolve itself by triggering
an update from "rdma_resolve_addr".

Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
---

v2: Add a "(__force u32)" cast for "__ipv4_neigh_lookup_noref"

 drivers/infiniband/core/addr.c | 40 ++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

Comments

Jason Gunthorpe June 24, 2022, 11:11 p.m. UTC | #1
On Thu, Jun 16, 2022 at 08:57:14AM -0700, Gerd Rausch wrote:
> Unlike with IPv[46], where "ip_finish_output2" triggers
> a refresh of STALE neighbour entries via "neigh_output",
> "rdma_resolve_addr" never triggers an update.

Why? We alread call neigh_event_send() right after this in
addr_resolve_neigh(), and this seems to ignore the input resolve_neigh
to addr_resolve() ?

I think there is more going on here than this commit message is
explaining.

Jason
Gerd Rausch June 25, 2022, 12:03 a.m. UTC | #2
Hi Jason,

On Fri, 2022-06-24 at 20:11 -0300, Jason Gunthorpe wrote:
> On Thu, Jun 16, 2022 at 08:57:14AM -0700, Gerd Rausch wrote:
> > Unlike with IPv[46], where "ip_finish_output2" triggers
> > a refresh of STALE neighbour entries via "neigh_output",
> > "rdma_resolve_addr" never triggers an update.
> 
> Why? We alread call neigh_event_send() right after this in
> addr_resolve_neigh(), and this seems to ignore the input resolve_neigh
> to addr_resolve() ?
> 

This in "dst_fetch_ha"?:
--------%<--------%<--------%<--------%<--------%<--------
        if (!(n->nud_state & NUD_VALID)) {
                neigh_event_send(n, NULL);
                ret = -ENODATA;
--------%<--------%<--------%<--------%<--------%<--------

With:
#define NUD_VALID	(NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE
|NUD_STALE|NUD_DELAY)

and the knowledge that the ARP entry is NUD_STALE,
how would that function be called and perform the necessary refresh?


> I think there is more going on here than this commit message is
> explaining.
> 

If the intention of the above mentioned "neigh_event_send" was to
refresh stale entries, then the "if" condition ought to change, no?

Thanks,

  Gerd
Jason Gunthorpe June 25, 2022, 12:10 a.m. UTC | #3
On Fri, Jun 24, 2022 at 05:03:23PM -0700, Gerd Rausch wrote:
> Hi Jason,
> 
> On Fri, 2022-06-24 at 20:11 -0300, Jason Gunthorpe wrote:
> > On Thu, Jun 16, 2022 at 08:57:14AM -0700, Gerd Rausch wrote:
> > > Unlike with IPv[46], where "ip_finish_output2" triggers
> > > a refresh of STALE neighbour entries via "neigh_output",
> > > "rdma_resolve_addr" never triggers an update.
> > 
> > Why? We alread call neigh_event_send() right after this in
> > addr_resolve_neigh(), and this seems to ignore the input resolve_neigh
> > to addr_resolve() ?
> > 
> 
> This in "dst_fetch_ha"?:
> --------%<--------%<--------%<--------%<--------%<--------
>         if (!(n->nud_state & NUD_VALID)) {
>                 neigh_event_send(n, NULL);
>                 ret = -ENODATA;
> --------%<--------%<--------%<--------%<--------%<--------
> 
> With:
> #define NUD_VALID	(NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE
> |NUD_STALE|NUD_DELAY)
> 
> and the knowledge that the ARP entry is NUD_STALE,
> how would that function be called and perform the necessary refresh?
> 
> 
> > I think there is more going on here than this commit message is
> > explaining.
> > 
> 
> If the intention of the above mentioned "neigh_event_send" was to
> refresh stale entries, then the "if" condition ought to change, no?

Maybe yes.

If you are saying with this patch that neigh_event_send() should be
called unconditionally for every 'packet' why not remove the test
above instead of calling it twice?

On the other hand I see many places checking for NUD_VALID before
calling it.

So, really the commit message needs to explain how neigh_event_send()
is supposed to be used and explain that NUD_VALID is OK to drop.

I'm having a hard time guessing from a quick look around.

Jason
Gerd Rausch June 25, 2022, 12:55 a.m. UTC | #4
Hi Jason,

On Fri, 2022-06-24 at 21:10 -0300, Jason Gunthorpe wrote:
> On Fri, Jun 24, 2022 at 05:03:23PM -0700, Gerd Rausch wrote:
[...]
> > This in "dst_fetch_ha"?:
> > --------%<--------%<--------%<--------%<--------%<--------
> >         if (!(n->nud_state & NUD_VALID)) {
> >                 neigh_event_send(n, NULL);
> >                 ret = -ENODATA;
> > --------%<--------%<--------%<--------%<--------%<--------
> > 
> > With:
> > #define NUD_VALID	(NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE
> > > NUD_STALE|NUD_DELAY)
> > 
> > and the knowledge that the ARP entry is NUD_STALE,
> > how would that function be called and perform the necessary refresh?
> > 
> > 
> > > I think there is more going on here than this commit message is
> > > explaining.
> > > 
> > 
> > If the intention of the above mentioned "neigh_event_send" was to
> > refresh stale entries, then the "if" condition ought to change, no?
> 
> Maybe yes.
> 
> If you are saying with this patch that neigh_event_send() should be
> called unconditionally for every 'packet' why not remove the test
> above instead of calling it twice?
> 

It looks like starting from the very first time when "neigh_event_send"
was introduced here:

935ef2d7a291 RDMA/cma: Use neigh_event_send() to start neighbour discovery

it was always strictly coupled to the "-ENODATA" case.

In other words, I don't see how the STALE case was covered, and I'd have
to guess wether the -ENODATA coupling was intentional or accidental.

Now it may be perfectly possible to just make the "neigh_event_send"
call above unconditional and call it a day.

I mean, simpler fixes always win over more complex ones.

But, I personally don't know the twists & turns of this code well enough
to be able to assert that this won't leave any non-covered conditional
corner cases. I certainly hadn't tested that.


> On the other hand I see many places checking for NUD_VALID before
> calling it.
> 
> So, really the commit message needs to explain how neigh_event_send()
> is supposed to be used and explain that NUD_VALID is OK to drop.
> 

Some places do check for NUD_VALID (e.g. "__ip_do_redirect"),
many others don't (e.g. "__teql_resolve", or "neigh_resolve_output"
itself).

Even in the case of "dst_fetch_ha":
I can not state whether the check for !NUD_VALUD was done intentionally
or not. I can observe though that the side-effect of that check is that
STALE entries won't get refreshed. So something is clearly off here.

Let's look "__ip_do_redirect":
I would have to guess why that check for !NUD_VALID is there.
It's been there as far back as the GIT history goes.
But I prefer not having to put guess-work into my Commit-Log.

That said, I don't mind putting guess-work in this e-mail though:
Perhaps the author(s) knew that the code would have to go through
"ip_finish_output2" (or whatever the predecessor's name was) eventually,
and therefore it was okay to kick off a refresh only if !NUD_VALID.

At the end of the day, if the desired behavior is to always refresh
STALE entries, we should do so.
Calling "neigh_event_send" for !NUD_VALID only won't achieve that.

> I'm having a hard time guessing from a quick look around.
> 

Thanks for looking though,

  Gerd
Gerd Rausch July 28, 2022, 5:55 p.m. UTC | #5
Hi Jason,

On Fri, 2022-06-24 at 17:55 -0700, Gerd Rausch wrote:
> In other words, I don't see how the STALE case was covered, and I'd have
> to guess wether the -ENODATA coupling was intentional or accidental.
> 
> Now it may be perfectly possible to just make the "neigh_event_send"
> call above unconditional and call it a day.
> 
> I mean, simpler fixes always win over more complex ones.
> 
> But, I personally don't know the twists & turns of this code well enough
> to be able to assert that this won't leave any non-covered conditional
> corner cases. I certainly hadn't tested that.
> 

I finally got around to trying the much simpler fix, and it appears to
work just as well:

--------%<--------%<--------%<--------%<--------%<--------%<--------
--- drivers/infiniband/core/addr.c-
+++ drivers/infiniband/core/addr.c
@@ -336,11 +336,11 @@ static int dst_fetch_ha(const struct dst
 		return -ENODATA;
 
 	if (!(n->nud_state & NUD_VALID)) {
-		neigh_event_send(n, NULL);
 		ret = -ENODATA;
 	} else {
 		neigh_ha_snapshot(dev_addr->dst_dev_addr, n, dst->dev);
 	}
+	neigh_event_send(n, NULL);
 
 	neigh_release(n);
 
--------%<--------%<--------%<--------%<--------%<--------%<--------

Tested with IPv4 only, but this should work just as well with IPv6:

STALE neighbor entries transition to DELAY -> REACHABLE with this
change, i.e. the entries get updated.

Thanks,

  Gerd
Jason Gunthorpe July 28, 2022, 7:24 p.m. UTC | #6
On Thu, Jul 28, 2022 at 10:55:39AM -0700, Gerd Rausch wrote:
> Hi Jason,
> 
> On Fri, 2022-06-24 at 17:55 -0700, Gerd Rausch wrote:
> > In other words, I don't see how the STALE case was covered, and I'd have
> > to guess wether the -ENODATA coupling was intentional or accidental.
> > 
> > Now it may be perfectly possible to just make the "neigh_event_send"
> > call above unconditional and call it a day.
> > 
> > I mean, simpler fixes always win over more complex ones.
> > 
> > But, I personally don't know the twists & turns of this code well enough
> > to be able to assert that this won't leave any non-covered conditional
> > corner cases. I certainly hadn't tested that.
> > 
> 
> I finally got around to trying the much simpler fix, and it appears to
> work just as well:
> 
> --------%<--------%<--------%<--------%<--------%<--------%<--------
> --- drivers/infiniband/core/addr.c-
> +++ drivers/infiniband/core/addr.c
> @@ -336,11 +336,11 @@ static int dst_fetch_ha(const struct dst
>  		return -ENODATA;
>  
>  	if (!(n->nud_state & NUD_VALID)) {
> -		neigh_event_send(n, NULL);
>  		ret = -ENODATA;
>  	} else {
>  		neigh_ha_snapshot(dev_addr->dst_dev_addr, n, dst->dev);
>  	}
> +	neigh_event_send(n, NULL);
>  
>  	neigh_release(n);
>  
> --------%<--------%<--------%<--------%<--------%<--------%<--------
> 
> Tested with IPv4 only, but this should work just as well with IPv6:
> 
> STALE neighbor entries transition to DELAY -> REACHABLE with this
> change, i.e. the entries get updated.

It seems OK, but I would still like to see an attempt to explain how
nud_state and neigh_event_send is supposed to be used.. Maybe futile,
but lets try at least.

Jason
diff mbox series

Patch

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index f253295795f0..704dc9cc130e 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -394,6 +394,8 @@  static int addr4_resolve(struct sockaddr *src_sock,
 	__be32 dst_ip = dst_in->sin_addr.s_addr;
 	struct rtable *rt;
 	struct flowi4 fl4;
+	struct net_device *dev;
+	struct neighbour *neigh;
 	int ret;
 
 	memset(&fl4, 0, sizeof(fl4));
@@ -409,6 +411,24 @@  static int addr4_resolve(struct sockaddr *src_sock,
 
 	addr->hoplimit = ip4_dst_hoplimit(&rt->dst);
 
+	/* trigger ARP-entry refresh if necessary,
+	 * the same way "ip_finish_output2" does
+	 */
+	if (addr->bound_dev_if) {
+		dev = dev_get_by_index(addr->net, addr->bound_dev_if);
+	} else {
+		dev = rt->dst.dev;
+		dev_hold(dev);
+	}
+	if (dev) {
+		rcu_read_lock_bh();
+		neigh = __ipv4_neigh_lookup_noref(dev, (__force u32)dst_ip);
+		if (neigh)
+			neigh_event_send(neigh, NULL);
+		rcu_read_unlock_bh();
+		dev_put(dev);
+	}
+
 	*prt = rt;
 	return 0;
 }
@@ -424,6 +444,8 @@  static int addr6_resolve(struct sockaddr *src_sock,
 				(const struct sockaddr_in6 *)dst_sock;
 	struct flowi6 fl6;
 	struct dst_entry *dst;
+	struct net_device *dev;
+	struct neighbour *neigh;
 
 	memset(&fl6, 0, sizeof fl6);
 	fl6.daddr = dst_in->sin6_addr;
@@ -439,6 +461,24 @@  static int addr6_resolve(struct sockaddr *src_sock,
 
 	addr->hoplimit = ip6_dst_hoplimit(dst);
 
+	/* trigger neighbour-entry refresh if necessary,
+	 * the same way "ip6_finish_output2" does
+	 */
+	if (addr->bound_dev_if) {
+		dev = dev_get_by_index(addr->net, addr->bound_dev_if);
+	} else {
+		dev = dst->dev;
+		dev_hold(dev);
+	}
+	if (dev) {
+		rcu_read_lock_bh();
+		neigh = __ipv6_neigh_lookup_noref(dst->dev, &dst_in->sin6_addr);
+		if (neigh)
+			neigh_event_send(neigh, NULL);
+		rcu_read_unlock_bh();
+		dev_put(dev);
+	}
+
 	*pdst = dst;
 	return 0;
 }