Message ID | 20230817014808.3494465-2-jrife@google.com |
---|---|
State | New, archived |
Series | SUNRPC: Avoid address overwrite with eBPF NAT |
> On Aug 16, 2023, at 9:48 PM, Jordan Rife <jrife@google.com> wrote:
>
> kernel_connect() will modify the rpc_xprt socket address in contexts
> where eBPF programs perform NAT instead of iptables. In these contexts,
> it is common for an NFS share to be mounted using a static virtual IP
> while the server has an ephemeral IP, leading to a problem where the
> virtual IP gets overwritten and forgotten. When the endpoint IP changes,
> reconnect attempts fail and the mount never recovers.
>
> This patch protects addr from being modified in these scenarios, allowing
> NFS reconnects to work as intended.
>
> Link: https://github.com/cilium/cilium/issues/21541#issuecomment-1386857338
> Signed-off-by: Jordan Rife <jrife@google.com>

Hello Jordan,

Since kernel_connect() is used exclusively by the RPC client, I suggest
directing your patch to Trond and Anna:

<trondmy@hammerspace.com>
<anna@kernel.org>

Does the RPC/RDMA client also have this issue? It does not use
kernel_connect().
--
Chuck Lever
On Wed, 2023-08-16 at 20:48 -0500, Jordan Rife wrote:
> kernel_connect() will modify the rpc_xprt socket address in contexts
> where eBPF programs perform NAT instead of iptables. In these contexts,
> it is common for an NFS share to be mounted using a static virtual IP
> while the server has an ephemeral IP, leading to a problem where the
> virtual IP gets overwritten and forgotten. When the endpoint IP changes,
> reconnect attempts fail and the mount never recovers.
>
> This patch protects addr from being modified in these scenarios,
> allowing NFS reconnects to work as intended.

What? No! A connect() call should not be allowed to modify its own call
parameters.
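Trond's objection is about the contract of the connect path: the address belongs to the caller and should come back unchanged. The following is a minimal userspace sketch of that contract, not kernel code. `rewrite_hook()` is a hypothetical stand-in for an eBPF connect program, `connect_with_private_copy()` is a hypothetical name, and the addresses are made up. The function takes a `const` address and copies it into private storage before any rewrite happens, so the caller (here playing the role of `xprt->addr`) never sees the translated destination.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical NAT hook standing in for an eBPF connect program: it
 * rewrites a virtual-IP destination to a backend IP, in place. */
static void rewrite_hook(struct sockaddr_in *sin)
{
	struct in_addr vip;

	inet_pton(AF_INET, "10.96.0.10", &vip);		/* hypothetical VIP */
	if (sin->sin_addr.s_addr == vip.s_addr)
		inet_pton(AF_INET, "10.0.1.5", &sin->sin_addr); /* hypothetical backend */
}

/* The caller's address is const: it is copied into private storage before
 * the hook runs, so a rewrite never reaches caller state.
 * On success, *effective receives the translated destination. */
static int connect_with_private_copy(const struct sockaddr *uaddr,
				     socklen_t len,
				     struct sockaddr_in *effective)
{
	struct sockaddr_storage copy;

	assert(uaddr != NULL && effective != NULL);
	if (len < sizeof(struct sockaddr_in) || len > sizeof(copy) ||
	    uaddr->sa_family != AF_INET)
		return -EINVAL;

	memcpy(&copy, uaddr, len);
	rewrite_hook((struct sockaddr_in *)&copy);
	memcpy(effective, &copy, sizeof(*effective));
	/* A real implementation would now connect to *effective. */
	return 0;
}
```

Compared with the patch, which adds a scratch copy at specific call sites in xprtsock.c, this sketch places the copy inside the connect function itself, so every caller gets the same guarantee.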
On Thu, 2023-08-17 at 02:09 +0000, Trond Myklebust wrote:
> On Wed, 2023-08-16 at 20:48 -0500, Jordan Rife wrote:
> > kernel_connect() will modify the rpc_xprt socket address in contexts
> > where eBPF programs perform NAT instead of iptables. In these contexts,
> > it is common for an NFS share to be mounted using a static virtual IP
> > while the server has an ephemeral IP, leading to a problem where the
> > virtual IP gets overwritten and forgotten. When the endpoint IP changes,
> > reconnect attempts fail and the mount never recovers.
> >
> > This patch protects addr from being modified in these scenarios,
> > allowing NFS reconnects to work as intended.
>
> What? No! A connect() call should not be allowed to modify its own
> call parameters.
>

To put it more succinctly, the struct rpc_xprt is one of many private
kernel structures. Parts of it can be exposed through public APIs, such
as the sysfs API that we're building, but when you use eBPF to hack your
way around those public APIs, then you're on your own. We're not going
to commit to supporting your hacks.
diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index b52411bcfe4e7..ddde79b025c53 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -211,6 +211,7 @@ struct rpc_xprt {
 
 	const struct rpc_timeout *timeout;	/* timeout parms */
 	struct sockaddr_storage	addr;		/* server address */
+	struct sockaddr_storage	m_addr;		/* mutable server address */
 	size_t			addrlen;	/* size of server address */
 	int			prot;		/* IP protocol */
 
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 9f010369100a2..4100e0bf5ebba 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -236,6 +236,18 @@ static inline struct sockaddr *xs_addr(struct rpc_xprt *xprt)
 	return (struct sockaddr *) &xprt->addr;
 }
 
+static inline struct sockaddr *xs_m_addr(struct rpc_xprt *xprt)
+{
+	/* kernel_connect() may modify the address in contexts where NAT is
+	 * performed by eBPF programs instead of iptables. Make a copy to ensure
+	 * that our original address, xprt->addr, is not modified. Without this,
+	 * NFS reconnects may fail if the endpoint address changes.
+	 */
+	memcpy(&xprt->m_addr, &xprt->addr, xprt->addrlen);
+
+	return (struct sockaddr *) &xprt->m_addr;
+}
+
 static inline struct sockaddr_un *xs_addr_un(struct rpc_xprt *xprt)
 {
 	return (struct sockaddr_un *) &xprt->addr;
@@ -1954,7 +1966,7 @@ static int xs_local_finish_connecting(struct rpc_xprt *xprt,
 
 	xs_stream_start_connect(transport);
 
-	return kernel_connect(sock, xs_addr(xprt), xprt->addrlen, 0);
+	return kernel_connect(sock, xs_m_addr(xprt), xprt->addrlen, 0);
 }
 
 /**
@@ -2334,7 +2346,8 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 
 	/* Tell the socket layer to start connecting... */
 	set_bit(XPRT_SOCK_CONNECTING, &transport->sock_state);
-	return kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK);
+
+	return kernel_connect(sock, xs_m_addr(xprt), xprt->addrlen, O_NONBLOCK);
 }
 
 /**
kernel_connect() will modify the rpc_xprt socket address in contexts
where eBPF programs perform NAT instead of iptables. In these contexts,
it is common for an NFS share to be mounted using a static virtual IP
while the server has an ephemeral IP, leading to a problem where the
virtual IP gets overwritten and forgotten. When the endpoint IP changes,
reconnect attempts fail and the mount never recovers.

This patch protects addr from being modified in these scenarios, allowing
NFS reconnects to work as intended.

Link: https://github.com/cilium/cilium/issues/21541#issuecomment-1386857338
Signed-off-by: Jordan Rife <jrife@google.com>
---
 include/linux/sunrpc/xprt.h |  1 +
 net/sunrpc/xprtsock.c       | 17 +++++++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)
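The failure described in the commit message can be reproduced in a small userspace model. This is a sketch under stated assumptions, not kernel code. `nat_connect()` is a hypothetical stand-in for `kernel_connect()` combined with an eBPF connect hook that rewrites the destination in the caller's buffer. `struct xprt_model` models only the `addr`, `m_addr` and `addrlen` fields of `struct rpc_xprt`. `model_connect()` and `run_scenario()` are hypothetical helpers, and all addresses are made up. Without the copy, the first connect leaves the backend IP in `addr`, so a reconnect after the backend moves targets a stale address. With the copy, every connect starts again from the virtual IP.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical addresses: a stable virtual IP and the backend behind it. */
#define SERVICE_VIP "10.96.0.10"
static char live_backend[INET_ADDRSTRLEN] = "10.0.1.5";

/* Hypothetical stand-in for kernel_connect() plus an eBPF connect hook:
 * a VIP destination is rewritten IN PLACE to the live backend, and the
 * "connection" succeeds only if the final destination is that backend. */
static int nat_connect(struct sockaddr *sa, socklen_t len)
{
	struct sockaddr_in *sin = (struct sockaddr_in *)sa;
	struct in_addr vip, live;

	(void)len;
	inet_pton(AF_INET, SERVICE_VIP, &vip);
	inet_pton(AF_INET, live_backend, &live);
	if (sin->sin_addr.s_addr == vip.s_addr)
		sin->sin_addr = live;		/* caller's buffer is modified */
	return sin->sin_addr.s_addr == live.s_addr ? 0 : -1;
}

/* Userspace model of the fields the patch touches. */
struct xprt_model {
	struct sockaddr_storage addr;	/* address the mount was given */
	struct sockaddr_storage m_addr;	/* scratch copy passed to connect */
	socklen_t addrlen;
};

/* use_copy == 0 mirrors the old xs_addr() call, 1 mirrors xs_m_addr(). */
static int model_connect(struct xprt_model *x, int use_copy)
{
	struct sockaddr *target = (struct sockaddr *)&x->addr;

	if (use_copy) {
		assert(x->addrlen <= sizeof(x->m_addr));
		memcpy(&x->m_addr, &x->addr, x->addrlen);
		target = (struct sockaddr *)&x->m_addr;
	}
	return nat_connect(target, x->addrlen);
}

/* Connect, move the backend, then reconnect; returns the reconnect result. */
static int run_scenario(int use_copy)
{
	struct xprt_model x;
	struct sockaddr_in *sin = (struct sockaddr_in *)&x.addr;

	memset(&x, 0, sizeof(x));
	sin->sin_family = AF_INET;
	sin->sin_port = htons(2049);		/* NFS */
	inet_pton(AF_INET, SERVICE_VIP, &sin->sin_addr);
	x.addrlen = sizeof(*sin);

	strcpy(live_backend, "10.0.1.5");
	if (model_connect(&x, use_copy) != 0)
		return -1;

	strcpy(live_backend, "10.0.2.7");	/* server gets a new ephemeral IP */
	return model_connect(&x, use_copy);
}
```

In this model, `run_scenario(0)` returns -1 because the reconnect goes to the stale backend address that overwrote `addr`, and `run_scenario(1)` returns 0 because the hook translates the preserved virtual IP to the new backend.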