Message ID | 20180118030634.GY13338@ZenIV.linux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jan 17, 2018 at 7:06 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Similar to the way put_cmsg() handles 32bit case on biarch > targets, introduce a flag telling put_cmsg() to treat > ->msg_control as kernel pointer, using memcpy instead of > copy_to_user(). That allows to avoid the use of kernel_recvmsg() > with its set_fs(). If this is the only case that kernel_recvmsg() exists for, then by all means, that patch certainly looks like a good thing. Linus
On Wed, Jan 17, 2018 at 07:16:02PM -0800, Linus Torvalds wrote: > On Wed, Jan 17, 2018 at 7:06 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Similar to the way put_cmsg() handles 32bit case on biarch > > targets, introduce a flag telling put_cmsg() to treat > > ->msg_control as kernel pointer, using memcpy instead of > > copy_to_user(). That allows to avoid the use of kernel_recvmsg() > > with its set_fs(). > > If this is the only case that kernel_recvmsg() exists for, then by all > means, that patch certainly looks like a good thing. In -next that's the only remaining caller. kernel_recvmsg() is { mm_segment_t oldfs = get_fs(); int result; iov_iter_kvec(&msg->msg_iter, READ | ITER_KVEC, vec, num, size); set_fs(KERNEL_DS); result = sock_recvmsg(sock, msg, flags); set_fs(oldfs); return result; } and iov_iter_kvec(&msg->msg_iter, READ | ITER_KVEC, vec, num, size); result = sock_recvmsg(sock, msg, flags); works just fine for copying the data - that gets handled by copy_to_iter() and copy_page_to_iter(). Those don't care about set_fs(); the trouble with sunrpc call site is that we want to fill msg_control-pointed kernel object. That gets copied by put_cmsg(). We could turn ->msg_control/->msg_controllen into another iov_iter, but seeing that we never do scatter-gather for those IMO that would be a massive overkill. A flag controlling whether ->msg_control is kernel or userland pointer would do, especially since we already have a flag for "do we want a native or compat layout for cmsg" in there. That's the only caller we need it for, but that thing looks cheap enough. Obviously needs to pass testing, including "is it too ugly to live as far as Davem is concerned" test, though...
> We could turn ->msg_control/->msg_controllen into another > iov_iter, but seeing that we never do scatter-gather for those > IMO that would be a massive overkill. A flag controlling whether > ->msg_control is kernel or userland pointer would do, especially > since we already have a flag for "do we want a native or compat > layout for cmsg" in there. While your current hack seems like a nice short term improvement I think we need an iov_iter or iov_iter-light there in the long run. Same for ioctl so that we can pass properly typed kernel or user buffers through without all these set_fs hacks.
On Thu, Jan 18, 2018 at 08:29:57AM -0800, Christoph Hellwig wrote: > > We could turn ->msg_control/->msg_controllen into another > > iov_iter, but seeing that we never do scatter-gather for those > > IMO that would be a massive overkill. A flag controlling whether > > ->msg_control is kernel or userland pointer would do, especially > > since we already have a flag for "do we want a native or compat > > layout for cmsg" in there. > > While your current hack seems like a nice short term improvement > I think we need an iov_iter or iov_iter-light there in the long run. For one caller in the entire history of the kernel? > Same for ioctl so that we can pass properly typed kernel or user > buffers through without all these set_fs hacks. Umm... Most of the PITA with ioctls is due to compat ones being reformatted for native and fed under set_fs(). I actually have a series dealing with most of such places for net ioctls. Sure, there's also ioctl_by_bdev(), but for those we might be better off exposing the things like ->get_last_session() and its ilk to filesystems that want to deal with cdroms... It's kernel_setsockopt() that is the real PITA...
On Thu, Jan 18, 2018 at 04:43:02AM +0000, Al Viro wrote: > We could turn ->msg_control/->msg_controllen into another > iov_iter, but seeing that we never do scatter-gather for those > IMO that would be a massive overkill. A flag controlling whether > ->msg_control is kernel or userland pointer would do, especially > since we already have a flag for "do we want a native or compat > layout for cmsg" in there. > > That's the only caller we need it for, but that thing looks cheap > enough. Obviously needs to pass testing, including "is it too ugly to > live as far as Davem is concerned" test, though... BTW, there's another series of set_fs-removal patches in net ioctls; still needs review, though. With that one we would be down to 11 instances in the entire net/*: * SO_RCVTIMEO/SO_SNDTIMEO handling in compat [sg]etsockopt() * passing SIOC{ADD,DEL}TUNNEL down (ipmr_del_tunnel(),ipmr_new_tunnel(), addrconf_set_dstaddr()) * SIOCGSTAMP/SIOCGSTAMPNS in compat ioctls * SIOCADDRT/SIOCDELRT in compat ioctls * kernel_[gs]etsockopt() * ipv6_renew_options_kern() I don't know if all of that stuff can be realistically done without set_fs(). kernel_setsockopt(), in particular, is unpleasant... The patches need review and testing, obviously; I'll post them in followups, the entire series (on top of net/master) is in git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.net-ioctl
On Thu, Jan 18, 2018 at 07:31:56PM +0000, Al Viro wrote: > * SO_RCVTIMEO/SO_SNDTIMEO handling in compat [sg]etsockopt() > * passing SIOC{ADD,DEL}TUNNEL down (ipmr_del_tunnel(),ipmr_new_tunnel(), > addrconf_set_dstaddr()) > * SIOCGSTAMP/SIOCGSTAMPNS in compat ioctls > * SIOCADDRT/SIOCDELRT in compat ioctls > * kernel_[gs]etsockopt() > * ipv6_renew_options_kern() > > I don't know if all of that stuff can be realistically done without set_fs(). > kernel_setsockopt(), in particular, is unpleasant... Speaking of weird indirect calls: in net/packet/af_packet.c:packet_ioctl() we have this: #ifdef CONFIG_INET case SIOCADDRT: case SIOCDELRT: case SIOCDARP: case SIOCGARP: case SIOCSARP: case SIOCGIFADDR: case SIOCSIFADDR: case SIOCGIFBRDADDR: case SIOCSIFBRDADDR: case SIOCGIFNETMASK: case SIOCSIFNETMASK: case SIOCGIFDSTADDR: case SIOCSIFDSTADDR: case SIOCSIFFLAGS: return inet_dgram_ops.ioctl(sock, cmd, arg); #endif That's inet_ioctl(sock, cmd, arg) disguised by indirect. AFAICS, that line dates back to 2.1.89; back then inet_dgram_ops had been exported and inet_ioctl() had been static. When SCTP went in they'd exported inet_ioctl() rather than playing that kind of games. Is there anything subtle I'm missing here that would make it wrong to replace that with explicit call of inet_ioctl()?
On Thu, Jan 18, 2018 at 07:31:56PM +0000, Al Viro wrote:
> * SIOCADDRT/SIOCDELRT in compat ioctls
To bring back a question I'd asked back in October - what do
we do about SIOC...RT compat?
To recap:
* AF_INET sockets expect struct rtentry; it differs
between 32bit and 64bit, so routing_ioctl() in net/socket.c
is called from compat_sock_ioctl_trans() and does the right
thing. All proto_ops instances with .family = PF_INET (and
only they) have inet_ioctl() as ->ioctl(), and end up with
ip_rt_ioctl() called for native ones. Three of those have
->compat_ioctl() set to inet_compat_ioctl(), the rest have
it NULL. In any case, inet_compat_ioctl() ignores those,
leaving them to compat_sock_ioctl_trans() to pick up.
* for AF_INET6 the situation is similar, except that
they use struct in6_rtmsg. Compat is also dealt with in
routing_ioctl(). inet6_ioctl() for all such proto_ops
(and only those), ipv6_route_ioctl() is what ends up
handling the native ones. No ->compat_ioctl() in any
of those.
* AF_PACKET sockets expect struct rt_entry and
actually bounce the native calls to inet_ioctl(). No
->compat_ioctl() there, but routing_ioctl() in net/socket.c
does the right thing.
* AF_APPLETALK sockets expect struct rt_entry.
Native handled in atrtr_ioctl(); there is ->compat_ioctl(),
but it ignores those ioctls, so we go through the conversion
in net/socket.c. Also happens to work correctly.
* ax25, ipx, netrom, rose and x25 use structures
of their own, and those structures have identical layouts on
32bit and 64bit. x25 has ->compat_ioctl() that does the
right thing (bounces to native), the rest either have
->compat_ioctl() ignoring those ioctls (ipx) or do not
have ->compat_ioctl() at all. That ends up with generic
code picking those and buggering them up - routing_ioctl()
assumes that we want either in6_rtmsg (ipv6) or rtentry
(everything else). Unfortunately, in case of these
protocols we should just leave the suckers alone.
Back then Ralf has verified that the bug exists
and said he'd put together a fix. Looks like that fix
has fallen through the cracks, though.
* all other protocols fail those; usually with
ENOTTY, except for AF_QIPCRTR that fails with EINVAL.
Either way, compat is not an issue.
Note that handling of SIOCADDRT on e.g. raw ipv4
sockets from 32bit process is convoluted as hell. The
call chain is
compat_sys_ioctl()
compat_sock_ioctl()
inet_compat_ioctl()
compat_raw_ioctl()
=> -ENOIOCTLCMD, possibly
by way of ipmr_compat_ioctl()
compat_sock_ioctl_trans()
routing_ioctl() [conversion done here]
sock_do_ioctl()
inet_ioctl()
ip_rt_ioctl()
A lot of those are method calls, BTW, and the overhead on those has
just grown...
Does anybody have objections against the following?
1) Somewhere in net/core (or net/compat.c, for that matter) add
int compat_get_rtentry(struct rtentry *r, struct rtentry32 __user *p);
2) In inet_compat_ioctl() recognize SIOC{ADD,DEL}RT and do
err = compat_get_rtentry(&r, (void __user *)arg);
if (!err)
err = ip_rt_ioctl(...)
return err;
3) Add inet_compat_ioctl() as ->compat_ioctl in all PF_INET proto_ops.
4) Lift copyin from atrtr_ioctl() to atalk_ioctl(), teach
atalk_compat_ioctl() about these ioctls (using compat_get_rtentry()
and atrtr_ioctl(), that is).
5) Add ->compat_ioctl() to AF_PACKET, let it just call inet_compat_ioctl()
for those two.
6) Lift copyin from ipv6_route_ioctl() to inet6_ioctl().
Add inet6_compat_ioctl() that would recognize those two, do compat copyin
and call ipv6_route_ioctl(). Make it ->compat_ioctl for all PF_INET6
proto_ops.
7) Tell compat_sock_ioctl_trans() to move these two into the "just call
sock_do_ioctl()" group of cases. Or, with Ralf's fix, just remove these
two cases from compat_sock_ioctl_trans() completely. Either way,
routing_ioctl() becomes dead code and can be removed.
diff --git a/include/linux/socket.h b/include/linux/socket.h index 9286a5a8c60c..60947da9c806 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -298,6 +298,7 @@ struct ucred { #else #define MSG_CMSG_COMPAT 0 /* We never have 32 bit fixups */ #endif +#define MSG_CMSG_KERNEL 0x10000000 /* Setsockoptions(2) level. Thanks to BSD these must match IPPROTO_xxx */ diff --git a/net/core/scm.c b/net/core/scm.c index b1ff8a441748..1b73b682e441 100644 --- a/net/core/scm.c +++ b/net/core/scm.c @@ -237,10 +237,17 @@ int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data) cmhdr.cmsg_len = cmlen; err = -EFAULT; - if (copy_to_user(cm, &cmhdr, sizeof cmhdr)) - goto out; - if (copy_to_user(CMSG_DATA(cm), data, cmlen - sizeof(struct cmsghdr))) - goto out; + if (unlikely(MSG_CMSG_KERNEL & msg->msg_flags)) { + struct cmsghdr *p = msg->msg_control; + memcpy(p, &cmhdr, sizeof cmhdr); + memcpy(CMSG_DATA(p), data, cmlen - sizeof(struct cmsghdr)); + } else { + if (copy_to_user(cm, &cmhdr, sizeof cmhdr)) + goto out; + if (copy_to_user(CMSG_DATA(cm), data, + cmlen - sizeof(struct cmsghdr))) + goto out; + } cmlen = CMSG_SPACE(len); if (msg->msg_controllen < cmlen) cmlen = msg->msg_controllen; diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 5570719e4787..774904f67c93 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -545,7 +545,7 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp) .msg_name = svc_addr(rqstp), .msg_control = cmh, .msg_controllen = sizeof(buffer), - .msg_flags = MSG_DONTWAIT, + .msg_flags = MSG_DONTWAIT | MSG_CMSG_KERNEL, }; size_t len; int err; @@ -565,8 +565,7 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp) clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); skb = NULL; - err = kernel_recvmsg(svsk->sk_sock, &msg, NULL, - 0, 0, MSG_PEEK | MSG_DONTWAIT); + err = sock_recvmsg(svsk->sk_sock, &msg, MSG_PEEK | MSG_DONTWAIT); if (err >= 0) skb = skb_recv_udp(svsk->sk_sk, 0, 1, &err);