diff mbox series

SUNRPC: Avoid address overwrite with eBPF NAT

Message ID 20230817014808.3494465-2-jrife@google.com (mailing list archive)
State New, archived
Headers show
Series SUNRPC: Avoid address overwrite with eBPF NAT | expand

Commit Message

Jordan Rife Aug. 17, 2023, 1:48 a.m. UTC
kernel_connect() will modify the rpc_xprt socket address in contexts
where eBPF programs perform NAT instead of iptables. In these contexts,
it is common for an NFS mount to be mounted to be a static virtual IP
while the server has an ephemeral IP leading to a problem where the
virtual IP gets overwritten and forgotten. When the endpoint IP changes,
reconnect attempts fail and the mount never recovers.

This patch protects addr from being modified in these scenarios, allowing
NFS reconnects to work as intended.

Link: https://github.com/cilium/cilium/issues/21541#issuecomment-1386857338
Signed-off-by: Jordan Rife <jrife@google.com>
---
 include/linux/sunrpc/xprt.h |  1 +
 net/sunrpc/xprtsock.c       | 17 +++++++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)

Comments

Chuck Lever Aug. 17, 2023, 2:07 a.m. UTC | #1
> On Aug 16, 2023, at 9:48 PM, Jordan Rife <jrife@google.com> wrote:
> 
> kernel_connect() will modify the rpc_xprt socket address in contexts
> where eBPF programs perform NAT instead of iptables. In these contexts,
> it is common for an NFS mount to be mounted to be a static virtual IP
> while the server has an ephemeral IP leading to a problem where the
> virtual IP gets overwritten and forgotten. When the endpoint IP changes,
> reconnect attempts fail and the mount never recovers.
> 
> This patch protects addr from being modified in these scenarios, allowing
> NFS reconnects to work as intended.
> 
> Link: https://github.com/cilium/cilium/issues/21541#issuecomment-1386857338
> Signed-off-by: Jordan Rife <jrife@google.com>

Hello Jordan, since kernel_connect() is used exclusively
by the RPC client, I suggest directing your patch to
Trond and Anna.

<trondmy@hammerspace.com <mailto:trondmy@hammerspace.com>>
<anna@kernel.org <mailto:anna@kernel.org>>

Does the RPC/RDMA client also have this issue? It does
not use kernel_connect().


> ---
> include/linux/sunrpc/xprt.h |  1 +
> net/sunrpc/xprtsock.c       | 17 +++++++++++++++--
> 2 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> index b52411bcfe4e7..ddde79b025c53 100644
> --- a/include/linux/sunrpc/xprt.h
> +++ b/include/linux/sunrpc/xprt.h
> @@ -211,6 +211,7 @@ struct rpc_xprt {
> 
> const struct rpc_timeout *timeout; /* timeout parms */
> struct sockaddr_storage addr; /* server address */
> + struct sockaddr_storage m_addr;      /* mutable server address */
> size_t addrlen; /* size of server address */
> int prot; /* IP protocol */
> 
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 9f010369100a2..4100e0bf5ebba 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -236,6 +236,18 @@ static inline struct sockaddr *xs_addr(struct rpc_xprt *xprt)
> return (struct sockaddr *) &xprt->addr;
> }
> 
> +static inline struct sockaddr *xs_m_addr(struct rpc_xprt *xprt)
> +{
> +    /* kernel_connect() may modify the address in contexts where NAT is
> +     * performed by eBPF programs instead of iptables. Make a copy to ensure
> +     * that our original address, xprt->addr, is not modified. Without this,
> +     * NFS reconnects may fail if the endpoint address changes.
> +     */
> + memcpy(&xprt->m_addr, &xprt->addr, xprt->addrlen);
> +
> + return (struct sockaddr *) &xprt->m_addr;
> +}
> +
> static inline struct sockaddr_un *xs_addr_un(struct rpc_xprt *xprt)
> {
> return (struct sockaddr_un *) &xprt->addr;
> @@ -1954,7 +1966,7 @@ static int xs_local_finish_connecting(struct rpc_xprt *xprt,
> 
> xs_stream_start_connect(transport);
> 
> - return kernel_connect(sock, xs_addr(xprt), xprt->addrlen, 0);
> + return kernel_connect(sock, xs_m_addr(xprt), xprt->addrlen, 0);
> }
> 
> /**
> @@ -2334,7 +2346,8 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
> 
> /* Tell the socket layer to start connecting... */
> set_bit(XPRT_SOCK_CONNECTING, &transport->sock_state);
> - return kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK);
> +
> + return kernel_connect(sock, xs_m_addr(xprt), xprt->addrlen, O_NONBLOCK);
> }
> 
> /**
> -- 
> 2.42.0.rc1.204.g551eb34607-goog
> 

--
Chuck Lever
Trond Myklebust Aug. 17, 2023, 2:09 a.m. UTC | #2
On Wed, 2023-08-16 at 20:48 -0500, Jordan Rife wrote:
> [You don't often get email from jrife@google.com. Learn why this is
> important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> kernel_connect() will modify the rpc_xprt socket address in contexts
> where eBPF programs perform NAT instead of iptables. In these
> contexts,
> it is common for an NFS mount to be mounted to be a static virtual IP
> while the server has an ephemeral IP leading to a problem where the
> virtual IP gets overwritten and forgotten. When the endpoint IP
> changes,
> reconnect attempts fail and the mount never recovers.
> 
> This patch protects addr from being modified in these scenarios,
> allowing
> NFS reconnects to work as intended.

What? No! A connect() call should not be allowed to modify its own call
parameters.
Trond Myklebust Aug. 17, 2023, 2:29 a.m. UTC | #3
On Thu, 2023-08-17 at 02:09 +0000, Trond Myklebust wrote:
> On Wed, 2023-08-16 at 20:48 -0500, Jordan Rife wrote:
> > [You don't often get email from jrife@google.com. Learn why this is
> > important at https://aka.ms/LearnAboutSenderIdentification ]
> > 
> > kernel_connect() will modify the rpc_xprt socket address in
> > contexts
> > where eBPF programs perform NAT instead of iptables. In these
> > contexts,
> > it is common for an NFS mount to be mounted to be a static virtual
> > IP
> > while the server has an ephemeral IP leading to a problem where the
> > virtual IP gets overwritten and forgotten. When the endpoint IP
> > changes,
> > reconnect attempts fail and the mount never recovers.
> > 
> > This patch protects addr from being modified in these scenarios,
> > allowing
> > NFS reconnects to work as intended.
> 
> What? No! A connect() call should not be allowed to modify its own
> call
> parameters.
> 

To put it more succinctly, the struct rpc_xprt is one of many private
kernel structures. Parts of it can be exposed through public APIs, such
as the sysfs API that we're building, but when you use eBPF to hack
your way around those public APIs, then you're on your own. We're not
going to commit to support your hacks.
diff mbox series

Patch

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index b52411bcfe4e7..ddde79b025c53 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -211,6 +211,7 @@  struct rpc_xprt {
 
 	const struct rpc_timeout *timeout;	/* timeout parms */
 	struct sockaddr_storage	addr;		/* server address */
+	struct sockaddr_storage m_addr;      /* mutable server address */
 	size_t			addrlen;	/* size of server address */
 	int			prot;		/* IP protocol */
 
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 9f010369100a2..4100e0bf5ebba 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -236,6 +236,18 @@  static inline struct sockaddr *xs_addr(struct rpc_xprt *xprt)
 	return (struct sockaddr *) &xprt->addr;
 }
 
+static inline struct sockaddr *xs_m_addr(struct rpc_xprt *xprt)
+{
+    /* kernel_connect() may modify the address in contexts where NAT is
+     * performed by eBPF programs instead of iptables. Make a copy to ensure
+     * that our original address, xprt->addr, is not modified. Without this,
+     * NFS reconnects may fail if the endpoint address changes.
+     */
+	memcpy(&xprt->m_addr, &xprt->addr, xprt->addrlen);
+
+	return (struct sockaddr *) &xprt->m_addr;
+}
+
 static inline struct sockaddr_un *xs_addr_un(struct rpc_xprt *xprt)
 {
 	return (struct sockaddr_un *) &xprt->addr;
@@ -1954,7 +1966,7 @@  static int xs_local_finish_connecting(struct rpc_xprt *xprt,
 
 	xs_stream_start_connect(transport);
 
-	return kernel_connect(sock, xs_addr(xprt), xprt->addrlen, 0);
+	return kernel_connect(sock, xs_m_addr(xprt), xprt->addrlen, 0);
 }
 
 /**
@@ -2334,7 +2346,8 @@  static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 
 	/* Tell the socket layer to start connecting... */
 	set_bit(XPRT_SOCK_CONNECTING, &transport->sock_state);
-	return kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK);
+
+	return kernel_connect(sock, xs_m_addr(xprt), xprt->addrlen, O_NONBLOCK);
 }
 
 /**