Message ID | 20210722150154.10608-1-fw@strlen.de (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Matthieu Baerts |
Headers | show |
Series | [RFC] mptcp: add MPTCP_INFO getsockopt | expand |
Hi Florian, Thank you for looking at that! On 22/07/2021 17:01, Florian Westphal wrote: > Its not compatible with mptcp.org kernel one: > 1. mptcp.org defines 'struct mptcp_info', but we already have this for > inet_diag. > > 2. 'struct mptcp_info', as defined by mptcp.org has different layout for > 32/64 bit arches. > This is a problem for 32bit binaries on 64bit kernels. > > For those reasons a new 'struct mptcp_info_opt' is added which contains > aligned_u64 fields to store the userspace buffer addresses. Thank you for the explanation, it makes sense! > 'struct mptcp_sub_info' is added as per mptcp.org kernel, but I don't > like this structure. I think 'src' and 'dst' are confusing terms. Do you prefer 'local' and 'remote'? Good time to change! > AFAICS 'src' really is 'local address' (getsockname) and 'dst' is peer > (getpeername). > > Given mptcp-next has IPPROTO_MPTCP, this adds SOL_MPTCP. > This also gives ample space to add mptcp specific sockopts. Very good idea! > In light of this, I wonder if it would make more sense to split the > functionality. Would it not start to be a bit costly to get all details if the userspace has do a few syscalls? - general info - get subflow IDs - for each suflow, call subflow info + peer/sock names With the full structure, we could eventually allow userspace programs to set some "*_len" members to 0 not to get all info, e.g. only get the general info without peer/sock names for all subflows. Or without TCP_INFO for each subflow. No? > > Examples: > > getsockopt(fd, SOL_MPTCP, SUBFLOW_ID, subflow_ids, &count); > > sa.sa_family = subflow_ids[0]; > getsockopt(fd, SOL_MPTCP, SUBFLOW_GETPEERNAME, &sa, &len); > .. > sa.sa_family = subflow_ids[0]; > getsockopt(fd, SOL_MPTCP, SUBFLOW_GETSOCKNAME, &sa, &len); > > tcp_info.tcpi_state = subflow_ids[0]; > getsockopt(fd, SOL_MPTCP, SUBFLOW_TCP_INFO, &tcp_info, &len); > > ... > > and so on. This could be interesting if there is a demand to get info only for some specific subflows :) > > Alternatively one could overload e.g. the upper 8 bit: > > getsockopt(fd, SOL_MPTCP, subflow_ids[0] << 24 | SUBFLOW_FOO, ...); Maybe a "cleaner" workaround to set the ID? :) > In any case, here is a tentative patch to add a mptcp.org-alike > MPTCP_INFO getsockopt. Thank you for sharing that! (I didn't check the code yet) Cheers, Matt
Matthieu Baerts <matthieu.baerts@tessares.net> wrote: > Hi Florian, > > Thank you for looking at that! > > On 22/07/2021 17:01, Florian Westphal wrote: > > Its not compatible with mptcp.org kernel one: > > 1. mptcp.org defines 'struct mptcp_info', but we already have this for > > inet_diag. > > > > 2. 'struct mptcp_info', as defined by mptcp.org has different layout for > > 32/64 bit arches. > > This is a problem for 32bit binaries on 64bit kernels. > > > > For those reasons a new 'struct mptcp_info_opt' is added which contains > > aligned_u64 fields to store the userspace buffer addresses. > > Thank you for the explanation, it makes sense! > > > 'struct mptcp_sub_info' is added as per mptcp.org kernel, but I don't > > like this structure. I think 'src' and 'dst' are confusing terms. > > Do you prefer 'local' and 'remote'? I thought about alternative names, I like your suggestion. > > In light of this, I wonder if it would make more sense to split the > > functionality. > > Would it not start to be a bit costly to get all details if the > userspace has do a few syscalls? > - general info > - get subflow IDs > - for each suflow, call subflow info + peer/sock names Yes, good point. Its unlikely one would ask for a specific subflow endpoint addresses. So perhaps just keep that in mind for later and go with a mptcp.org alike approach. > With the full structure, we could eventually allow userspace programs to > set some "*_len" members to 0 not to get all info, e.g. only get the > general info without peer/sock names for all subflows. Or without > TCP_INFO for each subflow. No? Yes, userspace can set the address field to 0 and the kernel will skip that part. One thing that I'd like to change is the way the length fields are handled. I propose that userspace can set the legnth field to 0 to request the kernel to fill in the required size. This won't work reliably (there could be a new subflow arriving after the query and before the second getsockopt with the "properly sized" fields. At the moment, there is no way to detect this. Perhaps the info struct should contain two size versions, one with the "this is how much I copied to the buffer" and another with "this is how much i would have copied". > > sa.sa_family = subflow_ids[0]; > > getsockopt(fd, SOL_MPTCP, SUBFLOW_GETPEERNAME, &sa, &len); > > .. > > sa.sa_family = subflow_ids[0]; > > getsockopt(fd, SOL_MPTCP, SUBFLOW_GETSOCKNAME, &sa, &len); > > > > tcp_info.tcpi_state = subflow_ids[0]; > > getsockopt(fd, SOL_MPTCP, SUBFLOW_TCP_INFO, &tcp_info, &len); > > > > ... > > > > and so on. > > This could be interesting if there is a demand to get info only for some > specific subflows :) Yes, as I wrote above your are probably right that the "give me the peer socket address for subflow 42" is a bit esoteric. > > Alternatively one could overload e.g. the upper 8 bit: > > > > getsockopt(fd, SOL_MPTCP, subflow_ids[0] << 24 | SUBFLOW_FOO, ...); > > Maybe a "cleaner" workaround to set the ID? :) It would be more consistent, at least. The "piggyback in the buffer" is very ugly... > > In any case, here is a tentative patch to add a mptcp.org-alike > > MPTCP_INFO getsockopt. > > Thank you for sharing that! > > (I didn't check the code yet) No need to have a super close look, its going to change anyway. Its also heavily lifted from the mptcp.org kernel. I will add attributions as needed once a formal patch submission is made.
On Thu, 2021-07-22 at 17:01 +0200, Florian Westphal wrote: > Its not compatible with mptcp.org kernel one: > 1. mptcp.org defines 'struct mptcp_info', but we already have this for > inet_diag. > > 2. 'struct mptcp_info', as defined by mptcp.org has different layout for > 32/64 bit arches. > This is a problem for 32bit binaries on 64bit kernels. I think the lack of binary compatibility with the OoO tree kernel is nto a big deal - and anyhow is unsolvable, so we have to consider it not a big deal ;) I think it could *possibly* be nice to tie togethar mptcp_info and mptcp_info_opt - e.g. pulling the second into the first and unifying the handling, as tcp does for diag and GET_INFO. But perhpas is just too much code/overhead?!? > For those reasons a new 'struct mptcp_info_opt' is added which contains > aligned_u64 fields to store the userspace buffer addresses. > > 'struct mptcp_sub_info' is added as per mptcp.org kernel, but I don't > like this structure. I think 'src' and 'dst' are confusing terms. > > AFAICS 'src' really is 'local address' (getsockname) and 'dst' is peer > (getpeername). +1 for 'local' and 'remote' > Given mptcp-next has IPPROTO_MPTCP, this adds SOL_MPTCP. > This also gives ample space to add mptcp specific sockopts. > > In light of this, I wonder if it would make more sense to split the > functionality. > > Examples: > > getsockopt(fd, SOL_MPTCP, SUBFLOW_ID, subflow_ids, &count); > > sa.sa_family = subflow_ids[0]; > getsockopt(fd, SOL_MPTCP, SUBFLOW_GETPEERNAME, &sa, &len); > .. > sa.sa_family = subflow_ids[0]; > getsockopt(fd, SOL_MPTCP, SUBFLOW_GETSOCKNAME, &sa, &len); > > tcp_info.tcpi_state = subflow_ids[0]; > getsockopt(fd, SOL_MPTCP, SUBFLOW_TCP_INFO, &tcp_info, &len); > > ... > > and so on. +1 to fetch all the subflow info with a single syscall +1 to pass back the user 2 sizes: one with the "this is how much I copied to the buffer" and another with "this is how much i would have copied". > Alternatively one could overload e.g. the upper 8 bit: > > getsockopt(fd, SOL_MPTCP, subflow_ids[0] << 24 | SUBFLOW_FOO, ...); What about using: struct mptcp_subflow_info { u32 id; struct tcp_info info; }; mptcp_subflow_info.id = id; getsockopt(fd, SOL_MPTCP, &mptcp_subflow_info, ...); ? Cheers, Paolo
On 23/07/2021 10:49, Paolo Abeni wrote: > On Thu, 2021-07-22 at 17:01 +0200, Florian Westphal wrote: (...) > +1 to pass back the user 2 sizes: one with > the "this is how much I copied to the buffer" and another with > "this is how much i would have copied". +1 >> Alternatively one could overload e.g. the upper 8 bit: >> >> getsockopt(fd, SOL_MPTCP, subflow_ids[0] << 24 | SUBFLOW_FOO, ...); > > What about using: > > struct mptcp_subflow_info { > u32 id; > struct tcp_info info; > }; > > mptcp_subflow_info.id = id; > getsockopt(fd, SOL_MPTCP, &mptcp_subflow_info, ...); Indeed, clearer if we can do that :) Cheers, Matt
diff --git a/include/linux/socket.h b/include/linux/socket.h index 0d8e3dcb7f88..af853eadadcb 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -360,6 +360,7 @@ struct ucred { #define SOL_KCM 281 #define SOL_TLS 282 #define SOL_XDP 283 +#define SOL_MPTCP 284 /* IPX options */ #define IPX_TYPE 1 diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h index 7b05f7102321..35d90f08b376 100644 --- a/include/uapi/linux/mptcp.h +++ b/include/uapi/linux/mptcp.h @@ -4,6 +4,7 @@ #include <linux/const.h> #include <linux/types.h> +#include <linux/in.h> #define MPTCP_SUBFLOW_FLAG_MCAP_REM _BITUL(0) #define MPTCP_SUBFLOW_FLAG_MCAP_LOC _BITUL(1) @@ -192,4 +193,33 @@ enum mptcp_event_attr { #define MPTCP_RST_EBADPERF 5 #define MPTCP_RST_EMIDDLEBOX 6 +struct mptcp_info_opt { + __u32 tcp_info_len; /* Length of each struct tcp_info in subflows pointer */ + __u32 sub_len; /* Total length of memory pointed to by subflows pointer */ + __u32 meta_info_len; /* Length of memory pointed to by meta_info */ + __u32 sub_info_len; /* Length of each struct mptcp_sub_info in subflow_info pointer */ + __u32 total_sub_info_len; /* Total length of memory pointed to by subflow_info */ + + __aligned_u64 meta_info; + __aligned_u64 initial; + __aligned_u64 subflows; + __aligned_u64 subflow_info; /* mptcp_sub_info[] */ +}; + +struct mptcp_sub_info { + union { + struct sockaddr src; + struct sockaddr_in src_v4; + struct sockaddr_in6 src_v6; + }; + union { + struct sockaddr dst; + struct sockaddr_in dst_v4; + struct sockaddr_in6 dst_v6; + }; +}; + +/* MPTCP socket options */ +#define MPTCP_INFO 1 + #endif /* _UAPI_MPTCP_H */ diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c index 4c68ce14843a..a78302750578 100644 --- a/net/mptcp/sockopt.c +++ b/net/mptcp/sockopt.c @@ -668,6 +668,190 @@ void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info) } EXPORT_SYMBOL_GPL(mptcp_diag_fill_info); +static int mptcp_get_diaginfo(struct mptcp_sock *msk, struct mptcp_info_opt *m_info) +{ + unsigned int user_len, len; + struct mptcp_info __info; + + memset(&__info, 0, sizeof(__info)); + + mptcp_diag_fill_info(msk, &__info); + + user_len = m_info->meta_info_len; + len = min_t(unsigned int, user_len, sizeof(__info)); + m_info->meta_info_len = len; + + if (copy_to_user((void __user *)m_info->meta_info, &__info, len)) + return -EFAULT; + + return 0; +} + +static void mptcp_get_sub_info(const struct sock *sk, struct mptcp_sub_info *info) +{ + struct inet_sock *inet = inet_sk(sk); + + memset(info, 0, sizeof(*info)); + + if (sk->sk_family == AF_INET) { + info->src_v4.sin_family = AF_INET; + info->src_v4.sin_port = inet->inet_sport; + + info->src_v4.sin_addr.s_addr = inet->inet_rcv_saddr; + if (!info->src_v4.sin_addr.s_addr) + info->src_v4.sin_addr.s_addr = inet->inet_saddr; + + info->dst_v4.sin_family = AF_INET; + info->dst_v4.sin_port = inet->inet_dport; + info->dst_v4.sin_addr.s_addr = inet->inet_daddr; +#if IS_ENABLED(CONFIG_IPV6) + } else { + struct ipv6_pinfo *np = inet6_sk(sk); + + info->src_v6.sin6_family = AF_INET6; + info->src_v6.sin6_port = inet->inet_sport; + + if (ipv6_addr_any(&sk->sk_v6_rcv_saddr)) + info->src_v6.sin6_addr = np->saddr; + else + info->src_v6.sin6_addr = sk->sk_v6_rcv_saddr; + + info->dst_v6.sin6_family = AF_INET6; + info->dst_v6.sin6_port = inet->inet_dport; + info->dst_v6.sin6_addr = sk->sk_v6_daddr; +#endif + } +} + +static int mptcp_get_info(struct mptcp_sock *msk, struct mptcp_info_opt *m_info) +{ + unsigned int tcp_info_len; + struct tcp_info info; + int err; + + if (m_info->meta_info) { + err = mptcp_get_diaginfo(msk, m_info); + + if (err) + return err; + } + + tcp_info_len = min_t(unsigned int, m_info->tcp_info_len, sizeof(struct tcp_info)); + m_info->tcp_info_len = tcp_info_len; + + if (m_info->initial) { + struct sock *ssk = msk->first; + + if (ssk) { + tcp_get_info(ssk, &info); + } else { + memset(&info, 0, sizeof(info)); + } + + if (copy_to_user((void __user *)m_info->initial, &info, tcp_info_len)) + return -EFAULT; + } + + if (m_info->subflows) { + const struct mptcp_subflow_context *subflow; + struct sock *sk = &msk->sk.icsk_inet.sk; + unsigned int sub_len = 0, len; + char __user *ptr; + + lock_sock(sk); + + ptr = (char __user *)m_info->subflows; + len = m_info->sub_len; + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + unsigned int tmp_len; + + tcp_get_info(ssk, &info); + + tmp_len = min_t(unsigned int, len, tcp_info_len); + sub_len -= tcp_info_len; + + if (copy_to_user(ptr, &info, tmp_len)) + return -EFAULT; + + ptr += tmp_len; + sub_len += tmp_len; + + if (len == 0) + break; + } + + release_sock(sk); + + m_info->sub_len = sub_len; + } + + if (m_info->subflow_info) { + unsigned int len, sub_info_len, total_sub_info_len = 0; + const struct mptcp_subflow_context *subflow; + char __user *ptr; + + ptr = (char __user *)m_info->subflow_info; + len = m_info->total_sub_info_len; + + sub_info_len = min_t(unsigned int, m_info->sub_info_len, + sizeof(struct mptcp_sub_info)); + m_info->sub_info_len = sub_info_len; + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + struct mptcp_sub_info m_sub_info; + unsigned int tmp_len; + + mptcp_get_sub_info(ssk, &m_sub_info); + + tmp_len = min_t(unsigned int, len, sub_info_len); + len -= tmp_len; + + if (copy_to_user(ptr, &m_sub_info, tmp_len)) + return -EFAULT; + + ptr += tmp_len; + total_sub_info_len += tmp_len; + + if (len == 0) + break; + } + + m_info->total_sub_info_len = total_sub_info_len; + } + + return 0; +} + +static int mptcp_getsockopt_info(struct mptcp_sock *msk, char __user *optval, int __user *_u_optlen) +{ + struct mptcp_info_opt m_info; + int ret, len; + + if (get_user(len, _u_optlen)) + return -EFAULT; + + len = min_t(unsigned int, len, sizeof(struct mptcp_info_opt)); + + memset(&m_info, 0, sizeof(m_info)); + + if (copy_from_user(&m_info, optval, len)) + return -EFAULT; + + ret = mptcp_get_info(msk, &m_info); + if (ret) + return ret; + + if (put_user(len, _u_optlen)) + return -EFAULT; + + if (copy_to_user(optval, &m_info, len)) + return -EFAULT; + return 0; +} + static int mptcp_getsockopt_sol_tcp(struct mptcp_sock *msk, int optname, char __user *optval, int __user *optlen) { @@ -682,6 +866,17 @@ static int mptcp_getsockopt_sol_tcp(struct mptcp_sock *msk, int optname, return -EOPNOTSUPP; } +static int mptcp_getsockopt_sol_mptcp(struct mptcp_sock *msk, int optname, + char __user *optval, int __user *optlen) +{ + switch (optname) { + case MPTCP_INFO: + return mptcp_getsockopt_info(msk, optval, optlen); + } + + return -EOPNOTSUPP; +} + int mptcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, int __user *option) { @@ -704,6 +899,8 @@ int mptcp_getsockopt(struct sock *sk, int level, int optname, if (level == SOL_TCP) return mptcp_getsockopt_sol_tcp(msk, optname, optval, option); + if (level == SOL_MPTCP) + return mptcp_getsockopt_sol_mptcp(msk, optname, optval, option); return -EOPNOTSUPP; }
Its not compatible with mptcp.org kernel one: 1. mptcp.org defines 'struct mptcp_info', but we already have this for inet_diag. 2. 'struct mptcp_info', as defined by mptcp.org has different layout for 32/64 bit arches. This is a problem for 32bit binaries on 64bit kernels. For those reasons a new 'struct mptcp_info_opt' is added which contains aligned_u64 fields to store the userspace buffer addresses. 'struct mptcp_sub_info' is added as per mptcp.org kernel, but I don't like this structure. I think 'src' and 'dst' are confusing terms. AFAICS 'src' really is 'local address' (getsockname) and 'dst' is peer (getpeername). Given mptcp-next has IPPROTO_MPTCP, this adds SOL_MPTCP. This also gives ample space to add mptcp specific sockopts. In light of this, I wonder if it would make more sense to split the functionality. Examples: getsockopt(fd, SOL_MPTCP, SUBFLOW_ID, subflow_ids, &count); sa.sa_family = subflow_ids[0]; getsockopt(fd, SOL_MPTCP, SUBFLOW_GETPEERNAME, &sa, &len); .. sa.sa_family = subflow_ids[0]; getsockopt(fd, SOL_MPTCP, SUBFLOW_GETSOCKNAME, &sa, &len); tcp_info.tcpi_state = subflow_ids[0]; getsockopt(fd, SOL_MPTCP, SUBFLOW_TCP_INFO, &tcp_info, &len); ... and so on. Alternatively one could overload e.g. the upper 8 bit: getsockopt(fd, SOL_MPTCP, subflow_ids[0] << 24 | SUBFLOW_FOO, ...); In any case, here is a tentative patch to add a mptcp.org-alike MPTCP_INFO getsockopt. It depends on a tiny extra patch to move part of inet_diag to the core. Signed-off-by: Florian Westphal <fw@strlen.de> --- include/linux/socket.h | 1 + include/uapi/linux/mptcp.h | 30 ++++++ net/mptcp/sockopt.c | 197 +++++++++++++++++++++++++++++++++++++ 3 files changed, 228 insertions(+)