diff mbox series

[CFT,experimental] net/socket.c: use straight fdget/fdput (resend)

Message ID 20240526034506.GZ2118490@ZenIV (mailing list archive)
State New
Headers show
Series [CFT,experimental] net/socket.c: use straight fdget/fdput (resend) | expand

Commit Message

Al Viro May 26, 2024, 3:45 a.m. UTC
[sigh... turns out that my .muttrc had no netdev alias ;-/]

Checking the theory that the important part in sockfd_lookup_light() is
avoiding needless file refcount operations, not the marginal reduction
of the register pressure from not keeping a struct file pointer in the
caller.

If that's true, we should get the same benefits from straight fdget()/fdput().
And AFAICS with sane use of CLASS(fd) we can get a better code generation...

Would be nice if somebody tested it on networking test suites (including
benchmarks)...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---


----- End forwarded message -----

Comments

Linus Torvalds May 26, 2024, 5:07 a.m. UTC | #1
On Sat, 25 May 2024 at 20:45, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Checking the theory that the important part in sockfd_lookup_light() is
> avoiding needless file refcount operations, not the marginal reduction
> of the register pressure from not keeping a struct file pointer in the
> caller.

Yeah, the register pressure thing is not likely an issue. That said, I
think we can fix it.

> If that's true, we should get the same benefits from straight fdget()/fdput().

Agreed.

That said, your patch is just horrendous.

> +static inline bool fd_empty(struct fd f)
> +{
> +       // better code with gcc that way
> +       return unlikely(!(f.flags | (unsigned long)f.file));
> +}

Ok, this is disgusting. I went all "WTF?"

And then looked at why you would say that testing two different fields
in a 'struct fd' would possibly be better than just checking one.

And I see why that's the case - it basically short-circuits the
(inlined) __to_fd() logic, and the compiler can undo it and look at
the original single-word value.

But it's still disgusting. If there is anything else in between, the
compiler wouldn't then notice any more.

What we *really* should do is have 'struct fd' just *be* that
single-word value, never expand it to two words at all, and instead of
doing "fd.file" we'd do "fd_file(fd)" and "fd_flags(fd)".

Maybe it's not too late to do that?

This is *particularly* true for the socket code that doesn't even want
the 'struct file *' at all outside of that "check that it's a socket,
then turn it into a socket pointer". So _particularly_ in that
context, having a "fd_file()" helper to do the (trivial) unpacking
would work very well.

But even without changing 'struct fd', maybe we could just have
"__fdget()" and friends not return a "unsigned long", but a "struct
rawfd".

Which would is a struct with an unsigned long.

There's actually a possible future standard C++ extension for what we
want to do ("C++ tagged pointers") and while it might make it into C
eventually, we'd have to do it manully with ugly inline helpers (LLVM
does it manually in C++ with a PointerIntPair class).

IOW, we could do something like the attached. I think it's actually
almost a cleanup, and now your socket things can use "struct rawfd"
and that fd_empty() becomes

   static inline bool rawfd_empty(struct rawfd raw)
   { return !raw.word; }

instead. Still somewhat disgusting, but now it's a "C doesn't have
tagged pointers, so we do this by hand" _understandable_ disgusting.

Hmm? The attached patch compiles. It looks "ObviouslyCorrect(tm)". But
it might be "TotallyBroken(tm)". You get the idea.

               Linus
Linus Torvalds May 26, 2024, 5:31 a.m. UTC | #2
On Sat, 25 May 2024 at 22:07, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> IOW, we could do something like the attached. I think it's actually
> almost a cleanup [..]

The reason I say that is this part of the patch:

  -static inline struct fd __to_fd(unsigned long v)
  +static inline struct fd __to_fd(struct rawfd raw)
   {
  -       return (struct fd){(struct file *)(v & ~3),v & 3};
  +       return (struct fd){fdfile(raw),fdflags(raw)};
   }

which I think actually improves on our current situation.

No, my fdfile/fdflags functions aren't pretty, but they have more type
safety than the bare "unsigned long", and in that "__to_fd()" case I
think it actually makes that unpacking operation much more obvious.

It might be good to have the reverse "packing" helper inline function
too, so that __fdget() wouldn't do this:

                return (struct rawfd) { FDPUT_FPUT | (unsigned long)file };

but instead have some kind of "mkrawfd()" helper that does the above,
and we'd have

                return mkrawfd(file, FDPUT_FPUT);

instead. That would obviate my EMPTY_RAWFD macro, and we'd just use
"mkrawfd(NULL, 0)"?

Maybe this is bikeshedding. But I'd rather have this kind of explicit
one-word interface with a wrapped tagged pointer than have something
as subtle as your fd_empty() that magically generates better code in
certain very specific circumstances.

                     Linus
Al Viro May 26, 2024, 7:27 p.m. UTC | #3
On Sat, May 25, 2024 at 10:07:39PM -0700, Linus Torvalds wrote:
> On Sat, 25 May 2024 at 20:45, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Checking the theory that the important part in sockfd_lookup_light() is
> > avoiding needless file refcount operations, not the marginal reduction
> > of the register pressure from not keeping a struct file pointer in the
> > caller.
> 
> Yeah, the register pressure thing is not likely an issue. That said, I
> think we can fix it.
> 
> > If that's true, we should get the same benefits from straight fdget()/fdput().
> 
> Agreed.
> 
> That said, your patch is just horrendous.
> 
> > +static inline bool fd_empty(struct fd f)
> > +{
> > +       // better code with gcc that way
> > +       return unlikely(!(f.flags | (unsigned long)f.file));
> > +}
> 
> Ok, this is disgusting. I went all "WTF?"
> 
> And then looked at why you would say that testing two different fields
> in a 'struct fd' would possibly be better than just checking one.
> 
> And I see why that's the case - it basically short-circuits the
> (inlined) __to_fd() logic, and the compiler can undo it and look at
> the original single-word value.

Not really.  The real reason is different - there is a constraint on
possible values of struct fd.  No valid instance can ever have NULL
file and non-zero flags.

The usual pattern is this:

	fd = something;
	if (!fd.file)
		return -EBADF;
	// we know that fd.file != NULL
	....
	fdput(fd);
	return something_else;

Switch to __cleanup would expand into something like
	fd = something;
	if (!fd.file) {
		fdput(fd);
		return -EBADF;
	}
	// we know that fd.file != NULL
	...
	fdput(fd);
	return something_else;

Look at the early exit there in more details:
	if (!fd.file) {
		if (fd.flags & FDPUT_FPUT)
			fput(fd.file);	// would oops
		return -EBADF;
	}
It *is* correct, but only because NULL fd.file can only happen
with zero fd.flags, so this fput() call is a dead code.

Boths your and mine variants are equivalent to this:
	if (!fd.file && !fd.flags) {
		fpdut(fd); 	// compiler will eliminate it
		return -EBADF;
	}
	// we assume that fd.file is non-NULL now
	...
	fdput(fd);
	return something_else;
and IMO both attack the problem at the wrong place.  If we
keep the check as it is (NULL fd.file) and turn fdput() into
	if ((fd.flags & FDPUT_FPUT) && fd.file)
		fput(fd.file);
we will get obviously correct behaviour with good code generation -
pretty much all current users of fdput() are downstream of
the check that fd.file is non-NULL and gcc will be able to optimize
the second part of condition away.

IOW, we don't disturb the existing explicit callers and the
only implicit ones (from __cleanup()) that might get an extra
check are the ones on failure exits between getting fd and
checking it to be empty.  Not a lot of those...

How about the following?

commit b3692a6e34e9dcfb1d9705782bc928bd04d5640d
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sat May 25 23:32:20 2024 -0400

    [experimental] net/socket.c: use straight fdget/fdput
    
    Checking the theory that the important part in sockfd_lookup_light() is
    avoiding needless file refcount operations, not the marginal reduction
    of the register pressure from not keeping a struct file pointer in the
    caller.
    
    If that's true, we should get the same benefits from straight fdget()/fdput().
    And AFAICS with sane use of CLASS(fd) we can get a better code generation...
    
    Would be nice if somebody tested it on networking test suites (including
    benchmarks)...
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/include/linux/file.h b/include/linux/file.h
index 45d0f4800abd..c482c1e39ebc 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -29,12 +29,6 @@ extern struct file *alloc_file_pseudo_noaccount(struct inode *, struct vfsmount
 extern struct file *alloc_file_clone(struct file *, int flags,
 	const struct file_operations *);
 
-static inline void fput_light(struct file *file, int fput_needed)
-{
-	if (fput_needed)
-		fput(file);
-}
-
 struct fd {
 	struct file *file;
 	unsigned int flags;
@@ -44,7 +38,7 @@ struct fd {
 
 static inline void fdput(struct fd fd)
 {
-	if (fd.flags & FDPUT_FPUT)
+	if ((fd.flags & FDPUT_FPUT) && fd.file)
 		fput(fd.file);
 }
 
@@ -76,6 +70,11 @@ static inline struct fd fdget_pos(int fd)
 	return __to_fd(__fdget_pos(fd));
 }
 
+static inline bool fd_empty(struct fd f)
+{
+	return unlikely(!f.file);
+}
+
 static inline void fdput_pos(struct fd f)
 {
 	if (f.flags & FDPUT_POS_UNLOCK)
diff --git a/net/socket.c b/net/socket.c
index e416920e9399..297293797df2 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -510,7 +510,7 @@ static int sock_map_fd(struct socket *sock, int flags)
 
 struct socket *sock_from_file(struct file *file)
 {
-	if (file->f_op == &socket_file_ops)
+	if (likely(file->f_op == &socket_file_ops))
 		return file->private_data;	/* set in sock_alloc_file */
 
 	return NULL;
@@ -550,24 +550,6 @@ struct socket *sockfd_lookup(int fd, int *err)
 }
 EXPORT_SYMBOL(sockfd_lookup);
 
-static struct socket *sockfd_lookup_light(int fd, int *err, int *fput_needed)
-{
-	struct fd f = fdget(fd);
-	struct socket *sock;
-
-	*err = -EBADF;
-	if (f.file) {
-		sock = sock_from_file(f.file);
-		if (likely(sock)) {
-			*fput_needed = f.flags & FDPUT_FPUT;
-			return sock;
-		}
-		*err = -ENOTSOCK;
-		fdput(f);
-	}
-	return NULL;
-}
-
 static ssize_t sockfs_listxattr(struct dentry *dentry, char *buffer,
 				size_t size)
 {
@@ -1834,23 +1816,25 @@ int __sys_bind(int fd, struct sockaddr __user *umyaddr, int addrlen)
 {
 	struct socket *sock;
 	struct sockaddr_storage address;
-	int err, fput_needed;
-
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (sock) {
-		err = move_addr_to_kernel(umyaddr, addrlen, &address);
-		if (!err) {
-			err = security_socket_bind(sock,
-						   (struct sockaddr *)&address,
-						   addrlen);
-			if (!err)
-				err = READ_ONCE(sock->ops)->bind(sock,
-						      (struct sockaddr *)
-						      &address, addrlen);
-		}
-		fput_light(sock->file, fput_needed);
-	}
-	return err;
+	CLASS(fd, f)(fd);
+	int err;
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
+
+	err = move_addr_to_kernel(umyaddr, addrlen, &address);
+	if (unlikely(err))
+		return err;
+
+	err = security_socket_bind(sock, (struct sockaddr *)&address, addrlen);
+	if (unlikely(err))
+		return err;
+
+	return READ_ONCE(sock->ops)->bind(sock,
+					  (struct sockaddr *)&address, addrlen);
 }
 
 SYSCALL_DEFINE3(bind, int, fd, struct sockaddr __user *, umyaddr, int, addrlen)
@@ -1867,21 +1851,24 @@ SYSCALL_DEFINE3(bind, int, fd, struct sockaddr __user *, umyaddr, int, addrlen)
 int __sys_listen(int fd, int backlog)
 {
 	struct socket *sock;
-	int err, fput_needed;
+	CLASS(fd, f)(fd);
 	int somaxconn;
+	int err;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (sock) {
-		somaxconn = READ_ONCE(sock_net(sock->sk)->core.sysctl_somaxconn);
-		if ((unsigned int)backlog > somaxconn)
-			backlog = somaxconn;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-		err = security_socket_listen(sock, backlog);
-		if (!err)
-			err = READ_ONCE(sock->ops)->listen(sock, backlog);
+	somaxconn = READ_ONCE(sock_net(sock->sk)->core.sysctl_somaxconn);
+	if ((unsigned int)backlog > somaxconn)
+		backlog = somaxconn;
+
+	err = security_socket_listen(sock, backlog);
+	if (!err)
+		err = READ_ONCE(sock->ops)->listen(sock, backlog);
 
-		fput_light(sock->file, fput_needed);
-	}
 	return err;
 }
 
@@ -1992,17 +1979,12 @@ static int __sys_accept4_file(struct file *file, struct sockaddr __user *upeer_s
 int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,
 		  int __user *upeer_addrlen, int flags)
 {
-	int ret = -EBADF;
-	struct fd f;
+	CLASS(fd, f)(fd);
 
-	f = fdget(fd);
-	if (f.file) {
-		ret = __sys_accept4_file(f.file, upeer_sockaddr,
+	if (fd_empty(f))
+		return -EBADF;
+	return __sys_accept4_file(f.file, upeer_sockaddr,
 					 upeer_addrlen, flags);
-		fdput(f);
-	}
-
-	return ret;
 }
 
 SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr,
@@ -2054,20 +2036,18 @@ int __sys_connect_file(struct file *file, struct sockaddr_storage *address,
 
 int __sys_connect(int fd, struct sockaddr __user *uservaddr, int addrlen)
 {
-	int ret = -EBADF;
-	struct fd f;
+	struct sockaddr_storage address;
+	CLASS(fd, f)(fd);
+	int ret;
 
-	f = fdget(fd);
-	if (f.file) {
-		struct sockaddr_storage address;
+	if (fd_empty(f))
+		return -EBADF;
 
-		ret = move_addr_to_kernel(uservaddr, addrlen, &address);
-		if (!ret)
-			ret = __sys_connect_file(f.file, &address, addrlen, 0);
-		fdput(f);
-	}
+	ret = move_addr_to_kernel(uservaddr, addrlen, &address);
+	if (ret)
+		return ret;
 
-	return ret;
+	return __sys_connect_file(f.file, &address, addrlen, 0);
 }
 
 SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr,
@@ -2086,26 +2066,25 @@ int __sys_getsockname(int fd, struct sockaddr __user *usockaddr,
 {
 	struct socket *sock;
 	struct sockaddr_storage address;
-	int err, fput_needed;
+	CLASS(fd, f)(fd);
+	int err;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	err = security_socket_getsockname(sock);
 	if (err)
-		goto out_put;
+		return err;
 
 	err = READ_ONCE(sock->ops)->getname(sock, (struct sockaddr *)&address, 0);
 	if (err < 0)
-		goto out_put;
-	/* "err" is actually length in this case */
-	err = move_addr_to_user(&address, err, usockaddr, usockaddr_len);
+		return err;
 
-out_put:
-	fput_light(sock->file, fput_needed);
-out:
-	return err;
+	/* "err" is actually length in this case */
+	return move_addr_to_user(&address, err, usockaddr, usockaddr_len);
 }
 
 SYSCALL_DEFINE3(getsockname, int, fd, struct sockaddr __user *, usockaddr,
@@ -2124,26 +2103,25 @@ int __sys_getpeername(int fd, struct sockaddr __user *usockaddr,
 {
 	struct socket *sock;
 	struct sockaddr_storage address;
-	int err, fput_needed;
+	CLASS(fd, f)(fd);
+	int err;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (sock != NULL) {
-		const struct proto_ops *ops = READ_ONCE(sock->ops);
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-		err = security_socket_getpeername(sock);
-		if (err) {
-			fput_light(sock->file, fput_needed);
-			return err;
-		}
+	err = security_socket_getpeername(sock);
+	if (err)
+		return err;
 
-		err = ops->getname(sock, (struct sockaddr *)&address, 1);
-		if (err >= 0)
-			/* "err" is actually length in this case */
-			err = move_addr_to_user(&address, err, usockaddr,
-						usockaddr_len);
-		fput_light(sock->file, fput_needed);
-	}
-	return err;
+	err = READ_ONCE(sock->ops)->getname(sock, (struct sockaddr *)&address, 1);
+	if (err < 0)
+		return err;
+
+	/* "err" is actually length in this case */
+	return move_addr_to_user(&address, err, usockaddr, usockaddr_len);
 }
 
 SYSCALL_DEFINE3(getpeername, int, fd, struct sockaddr __user *, usockaddr,
@@ -2162,16 +2140,13 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
 {
 	struct socket *sock;
 	struct sockaddr_storage address;
-	int err;
 	struct msghdr msg;
-	int fput_needed;
+	CLASS(fd, f)(fd);
+	int err;
 
 	err = import_ubuf(ITER_SOURCE, buff, len, &msg.msg_iter);
 	if (unlikely(err))
 		return err;
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
 
 	msg.msg_name = NULL;
 	msg.msg_control = NULL;
@@ -2181,20 +2156,22 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
 	if (addr) {
 		err = move_addr_to_kernel(addr, addr_len, &address);
 		if (err < 0)
-			goto out_put;
+			return err;
 		msg.msg_name = (struct sockaddr *)&address;
 		msg.msg_namelen = addr_len;
 	}
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
+
 	flags &= ~MSG_INTERNAL_SENDMSG_FLAGS;
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 	msg.msg_flags = flags;
-	err = __sock_sendmsg(sock, &msg);
-
-out_put:
-	fput_light(sock->file, fput_needed);
-out:
-	return err;
+	return __sock_sendmsg(sock, &msg);
 }
 
 SYSCALL_DEFINE6(sendto, int, fd, void __user *, buff, size_t, len,
@@ -2229,14 +2206,17 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags,
 	};
 	struct socket *sock;
 	int err, err2;
-	int fput_needed;
+	CLASS(fd, f)(fd);
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	err = import_ubuf(ITER_DEST, ubuf, size, &msg.msg_iter);
 	if (unlikely(err))
 		return err;
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
 
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
@@ -2248,9 +2228,6 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags,
 		if (err2 < 0)
 			err = err2;
 	}
-
-	fput_light(sock->file, fput_needed);
-out:
 	return err;
 }
 
@@ -2325,17 +2302,16 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
 {
 	sockptr_t optval = USER_SOCKPTR(user_optval);
 	bool compat = in_compat_syscall();
-	int err, fput_needed;
 	struct socket *sock;
+	CLASS(fd, f)(fd);
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		return err;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-	err = do_sock_setsockopt(sock, compat, level, optname, optval, optlen);
-
-	fput_light(sock->file, fput_needed);
-	return err;
+	return do_sock_setsockopt(sock, compat, level, optname, optval, optlen);
 }
 
 SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
@@ -2391,20 +2367,17 @@ EXPORT_SYMBOL(do_sock_getsockopt);
 int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
 		int __user *optlen)
 {
-	int err, fput_needed;
 	struct socket *sock;
-	bool compat;
+	CLASS(fd, f)(fd);
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		return err;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-	compat = in_compat_syscall();
-	err = do_sock_getsockopt(sock, compat, level, optname,
+	return do_sock_getsockopt(sock, in_compat_syscall(), level, optname,
 				 USER_SOCKPTR(optval), USER_SOCKPTR(optlen));
-
-	fput_light(sock->file, fput_needed);
-	return err;
 }
 
 SYSCALL_DEFINE5(getsockopt, int, fd, int, level, int, optname,
@@ -2430,15 +2403,16 @@ int __sys_shutdown_sock(struct socket *sock, int how)
 
 int __sys_shutdown(int fd, int how)
 {
-	int err, fput_needed;
 	struct socket *sock;
+	CLASS(fd, f)(fd);
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (sock != NULL) {
-		err = __sys_shutdown_sock(sock, how);
-		fput_light(sock->file, fput_needed);
-	}
-	return err;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
+
+	return __sys_shutdown_sock(sock, how);
 }
 
 SYSCALL_DEFINE2(shutdown, int, fd, int, how)
@@ -2654,22 +2628,20 @@ long __sys_sendmsg_sock(struct socket *sock, struct msghdr *msg,
 long __sys_sendmsg(int fd, struct user_msghdr __user *msg, unsigned int flags,
 		   bool forbid_cmsg_compat)
 {
-	int fput_needed, err;
 	struct msghdr msg_sys;
 	struct socket *sock;
+	CLASS(fd, f)(fd);
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT))
 		return -EINVAL;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
-
-	err = ___sys_sendmsg(sock, msg, &msg_sys, flags, NULL, 0);
-
-	fput_light(sock->file, fput_needed);
-out:
-	return err;
+	return ___sys_sendmsg(sock, msg, &msg_sys, flags, NULL, 0);
 }
 
 SYSCALL_DEFINE3(sendmsg, int, fd, struct user_msghdr __user *, msg, unsigned int, flags)
@@ -2684,13 +2656,20 @@ SYSCALL_DEFINE3(sendmsg, int, fd, struct user_msghdr __user *, msg, unsigned int
 int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 		   unsigned int flags, bool forbid_cmsg_compat)
 {
-	int fput_needed, err, datagrams;
+	int err, datagrams;
 	struct socket *sock;
 	struct mmsghdr __user *entry;
 	struct compat_mmsghdr __user *compat_entry;
 	struct msghdr msg_sys;
 	struct used_address used_address;
 	unsigned int oflags = flags;
+	CLASS(fd, f)(fd);
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT))
 		return -EINVAL;
@@ -2700,10 +2679,6 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 
 	datagrams = 0;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		return err;
-
 	used_address.name_len = UINT_MAX;
 	entry = mmsg;
 	compat_entry = (struct compat_mmsghdr __user *)mmsg;
@@ -2739,8 +2714,6 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 		cond_resched();
 	}
 
-	fput_light(sock->file, fput_needed);
-
 	/* We only return an error if no datagrams were able to be sent */
 	if (datagrams != 0)
 		return datagrams;
@@ -2862,22 +2835,20 @@ long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg,
 long __sys_recvmsg(int fd, struct user_msghdr __user *msg, unsigned int flags,
 		   bool forbid_cmsg_compat)
 {
-	int fput_needed, err;
 	struct msghdr msg_sys;
 	struct socket *sock;
+	CLASS(fd, f)(fd);
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT))
 		return -EINVAL;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
-
-	err = ___sys_recvmsg(sock, msg, &msg_sys, flags, 0);
-
-	fput_light(sock->file, fput_needed);
-out:
-	return err;
+	return ___sys_recvmsg(sock, msg, &msg_sys, flags, 0);
 }
 
 SYSCALL_DEFINE3(recvmsg, int, fd, struct user_msghdr __user *, msg,
@@ -2894,31 +2865,32 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
 			  unsigned int vlen, unsigned int flags,
 			  struct timespec64 *timeout)
 {
-	int fput_needed, err, datagrams;
+	int err, datagrams;
 	struct socket *sock;
 	struct mmsghdr __user *entry;
 	struct compat_mmsghdr __user *compat_entry;
 	struct msghdr msg_sys;
 	struct timespec64 end_time;
 	struct timespec64 timeout64;
+	CLASS(fd, f)(fd);
 
 	if (timeout &&
 	    poll_select_set_timeout(&end_time, timeout->tv_sec,
 				    timeout->tv_nsec))
 		return -EINVAL;
 
-	datagrams = 0;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		return err;
+	datagrams = 0;
 
 	if (likely(!(flags & MSG_ERRQUEUE))) {
 		err = sock_error(sock->sk);
-		if (err) {
-			datagrams = err;
-			goto out_put;
-		}
+		if (err)
+			return err;
 	}
 
 	entry = mmsg;
@@ -2975,12 +2947,10 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
 	}
 
 	if (err == 0)
-		goto out_put;
+		return datagrams;
 
-	if (datagrams == 0) {
-		datagrams = err;
-		goto out_put;
-	}
+	if (datagrams == 0)
+		return err;
 
 	/*
 	 * We may return less entries than requested (vlen) if the
@@ -2995,9 +2965,6 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
 		 */
 		WRITE_ONCE(sock->sk->sk_err, -err);
 	}
-out_put:
-	fput_light(sock->file, fput_needed);
-
 	return datagrams;
 }
Linus Torvalds May 26, 2024, 10:16 p.m. UTC | #4
On Sun, 26 May 2024 at 12:27, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Not really.  The real reason is different - there is a constraint on
> possible values of struct fd.  No valid instance can ever have NULL
> file and non-zero flags.
>
> The usual pattern is this:

[ snip snip ]

Ugh. I still hate it, including your new version. I suspect it will
easily generate the extra test at fd_empty() time, and your new
version would instead just move that extra test at fdput() time
instead.

Hopefully in most cases the compiler sees the previous test for
fd.file, realizes the new test is unnecessary and optimizes it away.

Except we most definitely pass around 'struct fd *' in some places (at
least ovlfs), so I doubt that  will be the case everywhere.

What would make more sense is if you make the "fd_empty()" test be
about the _flags_, and then both the fp_empty() test and the test
inside fdput() would be testing the same things.

Sadly, we'd need another bit in the flags. One option is easy enough -
we'd just have to make 'struct file' always be 8-byte aligned, which
it effectively always is.

Or we'd need to make the rule be that FDPUT_POS_UNLOCK only gets set
if FDPUT_FPUT is set.

Because I think we could have even a two-bit tag value have that "no fd" case:

 00 - no fd
 01 - fd but no need for fput
 10 - fd needs fput
 11 - fd needs pos unlock and fput

but as it is, that's not what we have. Right now we have

  00 - no fd or fd with no need for fput ("look at fd.file to decide")
  01 - fd needs fput
  10 - fd pos unlock but no fput
  11 - fd pos unlock and fput

but that 10 case looks odd to me. Why would we ever need a pos unlock
but no fput? The reason we don't need an fput is that we're the only
thread that has access to the file pointer, but that also implies that
we shouldn't need to lock the position.

So now I've just confused myself. Why *do* we have that 10 pattern?

Adding a separate bit would certainly avoid any complexity, and then
you'd have "flags==0 means no file pointer" and the "fd_empty()" test
would then make the fdput) test obviously unnecessary in the usual
pattern.

             Linus
Al Viro May 26, 2024, 11:16 p.m. UTC | #5
On Sun, May 26, 2024 at 03:16:37PM -0700, Linus Torvalds wrote:

> Hopefully in most cases the compiler sees the previous test for
> fd.file, realizes the new test is unnecessary and optimizes it away.

It does.

> Except we most definitely pass around 'struct fd *' in some places (at
> least ovlfs), so I doubt that  will be the case everywhere.

... and those places need care, really.  I've done that review
quite recently.  I don't think you'd been on Cc in related thread...
<checks> nope.

|| Another is overlayfs ovl_real_fdget() and ovl_real_fdget_meta().
|| Those two are arguably too clever for their readers' good.
|| It's still "borrow or clone, and remember which one had it been",
|| but that's mixed with returning errors:
||         err = ovl_read_fdget(file, &fd);
|| may construct and store a junk value in fd if err is non-zero.
|| Not a bug, but only because all callers remember to ignore that
|| value in such case.

Junk value as in e.g. <ERR_PTR(-EIO), FDPUT_FPUT>; it's trivial
to avoid (
		file = ovl_open_realfile(file, &realpath);
		if (IS_ERR(file))
			return ERR_PTR(file);
                real->flags = FDPUT_FPUT;
                real->file = file;
		return 0;
instead of
                real->flags = FDPUT_FPUT;
                real->file = ovl_open_realfile(file, &realpath);

                return PTR_ERR_OR_ZERO(real->file);
we have there right now and similar for <ERR_PTR(-EPERM), 0> pairs),
but note that anything like your switch to struct-wrapped unsigned long
would need similar massage there anyway.

> What would make more sense is if you make the "fd_empty()" test be
> about the _flags_, and then both the fp_empty() test and the test
> inside fdput() would be testing the same things.

Umm...  What encoding would you use?

> Sadly, we'd need another bit in the flags. One option is easy enough -
> we'd just have to make 'struct file' always be 8-byte aligned, which
> it effectively always is.
> 
> Or we'd need to make the rule be that FDPUT_POS_UNLOCK only gets set
> if FDPUT_FPUT is set.

Huh?  That makes no sense - open a file, fork, and there you go;
same file reference in unshared descriptor tables in child and parent.
You definitely don't want to bump refcount on fdget_pos() in either
and you don't want to lose read/write/lseek serialization.

FDPUT_FPUT is *not* a property of file; it's about the original
reference to file not being guaranteed to stay pinned for the lifetime
of struct fd in question...
Linus Torvalds May 27, 2024, 3:23 p.m. UTC | #6
On Sun, 26 May 2024 at 16:16, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> FDPUT_FPUT is *not* a property of file; it's about the original
> reference to file not being guaranteed to stay pinned for the lifetime
> of struct fd in question...

Yup. I forgot the rules and thought we set FDPUT_PUT for cases where
we had exclusive access to 'struct file *', but it's for cases where
we have exclusive access to the 'int fd'.

               Linus
Al Viro May 27, 2024, 4:31 p.m. UTC | #7
On Mon, May 27, 2024 at 12:16:41AM +0100, Al Viro wrote:
> > What would make more sense is if you make the "fd_empty()" test be
> > about the _flags_, and then both the fp_empty() test and the test
> > inside fdput() would be testing the same things.
> 
> Umm...  What encoding would you use?

FWIW, _if_ we go for replacement of struct fd with struct-wrapped
unsigned long and actually steal another bit from pointer, we could
actually use that to encode errors as well...

If bit 0 is clear for file reference and set for an error, we could
use bit 1 to represent borrowed vs. cloned and bit 2 for need to
unlock...

overlayfs would be happier that way - we could have those functions
return struct fd directly, and pass the error value that way.
fdget() would report EBADF explicitly on empty slot - not a big
deal.  Hmm...

Let me play with that a bit and see if I can come up with something
tolerable.

We could add that in parallel to existing struct fd; we'd need a name
for replacement, though - anything like "rawfd" is asking for confusion
around fdget_raw(), and we'd need replacements for fdget() et.al.
that would yield those.

Alternatively, with a sane set of helpers we could actually _replace_
struct fd definition without a huge flagday commit - start with adding
#define fd_file(fd) ((fd).file)
#define fd_valid(fd) (fd_file(fd) != NULL)
convert the current users, one by one (fairly mechanical patches),
then change the definition of struct fd.  The last step would affect
	* definitions of helpers
	* adding fd-constructing macros (ERR_FD, etc.)
	* converting the few places that really construct those
suckers (fdget() et.al., overlayfs, a couple of explicit
initializers to {NULL, 0}).

It's not a huge amount of work and the flagday step would be fairly
small, but we'd probably need to split that over a couple of cycles -
helpers in one, along with renaming fd.file to e.g.
fd.__file_use_the_damn_helpers, then in the next cycle do the
switchover.

Hell knows; let me play with that a bit and let's see what falls out...
Linus Torvalds May 27, 2024, 7:20 p.m. UTC | #8
On Mon, 27 May 2024 at 09:31, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Alternatively, with a sane set of helpers we could actually _replace_
> struct fd definition without a huge flagday commit

It wouldn't even be all that huge. We've got less than 200 of those
'struct fd' users, and some of them hardly even look at the result,
but just pass it off to other helpers (ie the kernel/bpf/syscalls.c
pattern).

With just a couple of helpers it would be mostly a cleanup.

               Linus
Linus Torvalds May 27, 2024, 7:35 p.m. UTC | #9
On Mon, 27 May 2024 at 12:20, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> With just a couple of helpers it would be mostly a cleanup.

Looking around, I think we could even take four bits. Without any
debugging, 'struct file' is 160 bytes on 32-bit, it would not hurt at
all to just force a 16-byte alignment.

In fact, in practice it is aligned more than that - we already use
SLAB_HWCACHE_ALIGN, which in most cases means that it's 64-byte
aligned.

But to be safe, we should specify the 16 bytes in the
kmem_cache_create() call, and we should just make this all very
explicit:

  --- a/fs/file_table.c
  +++ b/fs/file_table.c
  @@ -512,7 +512,7 @@ EXPORT_SYMBOL(__fput_sync);

   void __init files_init(void)
   {
  -     filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
  +     filp_cachep = kmem_cache_create("filp", sizeof(struct file), 16,
                                SLAB_TYPESAFE_BY_RCU | SLAB_HWCACHE_ALIGN |
                                SLAB_PANIC | SLAB_ACCOUNT, NULL);
        percpu_counter_init(&nr_files, 0, GFP_KERNEL);
  --- a/include/linux/fs.h
  +++ b/include/linux/fs.h
  @@ -1025,7 +1025,7 @@ struct file {
        errseq_t                f_wb_err;
        errseq_t                f_sb_err; /* for syncfs */
   } __randomize_layout
  -  __attribute__((aligned(4)));       /* lest something weird
decides that 2 is OK */
  +  __attribute__((aligned(16))); /* Up to four tag bits */

   struct file_handle {
        __u32 handle_bytes;

and while four tag bits isn't something to waste, it looks pretty
reasonable here.

               Linus
Al Viro May 27, 2024, 9:25 p.m. UTC | #10
On Mon, May 27, 2024 at 05:31:16PM +0100, Al Viro wrote:

> Hell knows; let me play with that a bit and let's see what falls out...

gcc optimizer is... interesting:

if (v & 1)
        if (!(v & 3))
                foo();

is *NOT* collapsed into a no-op.  Not even

if (v & 1)
        if (!(v & 1) && !(v & 2))
                foo();

is good enough for it.

if (v & 1)
        if (!(v & 1))
                if (!(v & 2))
                        foo();

is recognized, thankfully, but... WTF?  As far as I can tell,
        if (!(v & 1) && !(v & 2))
gets turned into if (!(v & 3)), and then gcc gets stuck on that.
        if (!(v & 1)) if (!(v & 2))
is turned into the same, but apparently only after it actually
looks at both parts in context of what it knows about the earlier
checks.

clang handles all forms just fine, but gcc does not ;-/
diff mbox series

Patch

diff --git a/include/linux/file.h b/include/linux/file.h
index 45d0f4800abd..831fda778e23 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -29,12 +29,6 @@  extern struct file *alloc_file_pseudo_noaccount(struct inode *, struct vfsmount
 extern struct file *alloc_file_clone(struct file *, int flags,
 	const struct file_operations *);
 
-static inline void fput_light(struct file *file, int fput_needed)
-{
-	if (fput_needed)
-		fput(file);
-}
-
 struct fd {
 	struct file *file;
 	unsigned int flags;
@@ -76,6 +70,12 @@  static inline struct fd fdget_pos(int fd)
 	return __to_fd(__fdget_pos(fd));
 }
 
+static inline bool fd_empty(struct fd f)
+{
+	// better code with gcc that way
+	return unlikely(!(f.flags | (unsigned long)f.file));
+}
+
 static inline void fdput_pos(struct fd f)
 {
 	if (f.flags & FDPUT_POS_UNLOCK)
diff --git a/net/socket.c b/net/socket.c
index e416920e9399..297293797df2 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -510,7 +510,7 @@  static int sock_map_fd(struct socket *sock, int flags)
 
 struct socket *sock_from_file(struct file *file)
 {
-	if (file->f_op == &socket_file_ops)
+	if (likely(file->f_op == &socket_file_ops))
 		return file->private_data;	/* set in sock_alloc_file */
 
 	return NULL;
@@ -550,24 +550,6 @@  struct socket *sockfd_lookup(int fd, int *err)
 }
 EXPORT_SYMBOL(sockfd_lookup);
 
-static struct socket *sockfd_lookup_light(int fd, int *err, int *fput_needed)
-{
-	struct fd f = fdget(fd);
-	struct socket *sock;
-
-	*err = -EBADF;
-	if (f.file) {
-		sock = sock_from_file(f.file);
-		if (likely(sock)) {
-			*fput_needed = f.flags & FDPUT_FPUT;
-			return sock;
-		}
-		*err = -ENOTSOCK;
-		fdput(f);
-	}
-	return NULL;
-}
-
 static ssize_t sockfs_listxattr(struct dentry *dentry, char *buffer,
 				size_t size)
 {
@@ -1834,23 +1816,25 @@  int __sys_bind(int fd, struct sockaddr __user *umyaddr, int addrlen)
 {
 	struct socket *sock;
 	struct sockaddr_storage address;
-	int err, fput_needed;
-
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (sock) {
-		err = move_addr_to_kernel(umyaddr, addrlen, &address);
-		if (!err) {
-			err = security_socket_bind(sock,
-						   (struct sockaddr *)&address,
-						   addrlen);
-			if (!err)
-				err = READ_ONCE(sock->ops)->bind(sock,
-						      (struct sockaddr *)
-						      &address, addrlen);
-		}
-		fput_light(sock->file, fput_needed);
-	}
-	return err;
+	CLASS(fd, f)(fd);
+	int err;
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
+
+	err = move_addr_to_kernel(umyaddr, addrlen, &address);
+	if (unlikely(err))
+		return err;
+
+	err = security_socket_bind(sock, (struct sockaddr *)&address, addrlen);
+	if (unlikely(err))
+		return err;
+
+	return READ_ONCE(sock->ops)->bind(sock,
+					  (struct sockaddr *)&address, addrlen);
 }
 
 SYSCALL_DEFINE3(bind, int, fd, struct sockaddr __user *, umyaddr, int, addrlen)
@@ -1867,21 +1851,24 @@  SYSCALL_DEFINE3(bind, int, fd, struct sockaddr __user *, umyaddr, int, addrlen)
 int __sys_listen(int fd, int backlog)
 {
 	struct socket *sock;
-	int err, fput_needed;
+	CLASS(fd, f)(fd);
 	int somaxconn;
+	int err;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (sock) {
-		somaxconn = READ_ONCE(sock_net(sock->sk)->core.sysctl_somaxconn);
-		if ((unsigned int)backlog > somaxconn)
-			backlog = somaxconn;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-		err = security_socket_listen(sock, backlog);
-		if (!err)
-			err = READ_ONCE(sock->ops)->listen(sock, backlog);
+	somaxconn = READ_ONCE(sock_net(sock->sk)->core.sysctl_somaxconn);
+	if ((unsigned int)backlog > somaxconn)
+		backlog = somaxconn;
+
+	err = security_socket_listen(sock, backlog);
+	if (!err)
+		err = READ_ONCE(sock->ops)->listen(sock, backlog);
 
-		fput_light(sock->file, fput_needed);
-	}
 	return err;
 }
 
@@ -1992,17 +1979,12 @@  static int __sys_accept4_file(struct file *file, struct sockaddr __user *upeer_s
 int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,
 		  int __user *upeer_addrlen, int flags)
 {
-	int ret = -EBADF;
-	struct fd f;
+	CLASS(fd, f)(fd);
 
-	f = fdget(fd);
-	if (f.file) {
-		ret = __sys_accept4_file(f.file, upeer_sockaddr,
+	if (fd_empty(f))
+		return -EBADF;
+	return __sys_accept4_file(f.file, upeer_sockaddr,
 					 upeer_addrlen, flags);
-		fdput(f);
-	}
-
-	return ret;
 }
 
 SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr,
@@ -2054,20 +2036,18 @@  int __sys_connect_file(struct file *file, struct sockaddr_storage *address,
 
 int __sys_connect(int fd, struct sockaddr __user *uservaddr, int addrlen)
 {
-	int ret = -EBADF;
-	struct fd f;
+	struct sockaddr_storage address;
+	CLASS(fd, f)(fd);
+	int ret;
 
-	f = fdget(fd);
-	if (f.file) {
-		struct sockaddr_storage address;
+	if (fd_empty(f))
+		return -EBADF;
 
-		ret = move_addr_to_kernel(uservaddr, addrlen, &address);
-		if (!ret)
-			ret = __sys_connect_file(f.file, &address, addrlen, 0);
-		fdput(f);
-	}
+	ret = move_addr_to_kernel(uservaddr, addrlen, &address);
+	if (ret)
+		return ret;
 
-	return ret;
+	return __sys_connect_file(f.file, &address, addrlen, 0);
 }
 
 SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr,
@@ -2086,26 +2066,25 @@  int __sys_getsockname(int fd, struct sockaddr __user *usockaddr,
 {
 	struct socket *sock;
 	struct sockaddr_storage address;
-	int err, fput_needed;
+	CLASS(fd, f)(fd);
+	int err;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	err = security_socket_getsockname(sock);
 	if (err)
-		goto out_put;
+		return err;
 
 	err = READ_ONCE(sock->ops)->getname(sock, (struct sockaddr *)&address, 0);
 	if (err < 0)
-		goto out_put;
-	/* "err" is actually length in this case */
-	err = move_addr_to_user(&address, err, usockaddr, usockaddr_len);
+		return err;
 
-out_put:
-	fput_light(sock->file, fput_needed);
-out:
-	return err;
+	/* "err" is actually length in this case */
+	return move_addr_to_user(&address, err, usockaddr, usockaddr_len);
 }
 
 SYSCALL_DEFINE3(getsockname, int, fd, struct sockaddr __user *, usockaddr,
@@ -2124,26 +2103,25 @@  int __sys_getpeername(int fd, struct sockaddr __user *usockaddr,
 {
 	struct socket *sock;
 	struct sockaddr_storage address;
-	int err, fput_needed;
+	CLASS(fd, f)(fd);
+	int err;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (sock != NULL) {
-		const struct proto_ops *ops = READ_ONCE(sock->ops);
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-		err = security_socket_getpeername(sock);
-		if (err) {
-			fput_light(sock->file, fput_needed);
-			return err;
-		}
+	err = security_socket_getpeername(sock);
+	if (err)
+		return err;
 
-		err = ops->getname(sock, (struct sockaddr *)&address, 1);
-		if (err >= 0)
-			/* "err" is actually length in this case */
-			err = move_addr_to_user(&address, err, usockaddr,
-						usockaddr_len);
-		fput_light(sock->file, fput_needed);
-	}
-	return err;
+	err = READ_ONCE(sock->ops)->getname(sock, (struct sockaddr *)&address, 1);
+	if (err < 0)
+		return err;
+
+	/* "err" is actually length in this case */
+	return move_addr_to_user(&address, err, usockaddr, usockaddr_len);
 }
 
 SYSCALL_DEFINE3(getpeername, int, fd, struct sockaddr __user *, usockaddr,
@@ -2162,16 +2140,13 @@  int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
 {
 	struct socket *sock;
 	struct sockaddr_storage address;
-	int err;
 	struct msghdr msg;
-	int fput_needed;
+	CLASS(fd, f)(fd);
+	int err;
 
 	err = import_ubuf(ITER_SOURCE, buff, len, &msg.msg_iter);
 	if (unlikely(err))
 		return err;
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
 
 	msg.msg_name = NULL;
 	msg.msg_control = NULL;
@@ -2181,20 +2156,22 @@  int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
 	if (addr) {
 		err = move_addr_to_kernel(addr, addr_len, &address);
 		if (err < 0)
-			goto out_put;
+			return err;
 		msg.msg_name = (struct sockaddr *)&address;
 		msg.msg_namelen = addr_len;
 	}
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
+
 	flags &= ~MSG_INTERNAL_SENDMSG_FLAGS;
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 	msg.msg_flags = flags;
-	err = __sock_sendmsg(sock, &msg);
-
-out_put:
-	fput_light(sock->file, fput_needed);
-out:
-	return err;
+	return __sock_sendmsg(sock, &msg);
 }
 
 SYSCALL_DEFINE6(sendto, int, fd, void __user *, buff, size_t, len,
@@ -2229,14 +2206,17 @@  int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags,
 	};
 	struct socket *sock;
 	int err, err2;
-	int fput_needed;
+	CLASS(fd, f)(fd);
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	err = import_ubuf(ITER_DEST, ubuf, size, &msg.msg_iter);
 	if (unlikely(err))
 		return err;
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
 
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
@@ -2248,9 +2228,6 @@  int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags,
 		if (err2 < 0)
 			err = err2;
 	}
-
-	fput_light(sock->file, fput_needed);
-out:
 	return err;
 }
 
@@ -2325,17 +2302,16 @@  int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
 {
 	sockptr_t optval = USER_SOCKPTR(user_optval);
 	bool compat = in_compat_syscall();
-	int err, fput_needed;
 	struct socket *sock;
+	CLASS(fd, f)(fd);
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		return err;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-	err = do_sock_setsockopt(sock, compat, level, optname, optval, optlen);
-
-	fput_light(sock->file, fput_needed);
-	return err;
+	return do_sock_setsockopt(sock, compat, level, optname, optval, optlen);
 }
 
 SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
@@ -2391,20 +2367,17 @@  EXPORT_SYMBOL(do_sock_getsockopt);
 int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
 		int __user *optlen)
 {
-	int err, fput_needed;
 	struct socket *sock;
-	bool compat;
+	CLASS(fd, f)(fd);
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		return err;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-	compat = in_compat_syscall();
-	err = do_sock_getsockopt(sock, compat, level, optname,
+	return do_sock_getsockopt(sock, in_compat_syscall(), level, optname,
 				 USER_SOCKPTR(optval), USER_SOCKPTR(optlen));
-
-	fput_light(sock->file, fput_needed);
-	return err;
 }
 
 SYSCALL_DEFINE5(getsockopt, int, fd, int, level, int, optname,
@@ -2430,15 +2403,16 @@  int __sys_shutdown_sock(struct socket *sock, int how)
 
 int __sys_shutdown(int fd, int how)
 {
-	int err, fput_needed;
 	struct socket *sock;
+	CLASS(fd, f)(fd);
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (sock != NULL) {
-		err = __sys_shutdown_sock(sock, how);
-		fput_light(sock->file, fput_needed);
-	}
-	return err;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
+
+	return __sys_shutdown_sock(sock, how);
 }
 
 SYSCALL_DEFINE2(shutdown, int, fd, int, how)
@@ -2654,22 +2628,20 @@  long __sys_sendmsg_sock(struct socket *sock, struct msghdr *msg,
 long __sys_sendmsg(int fd, struct user_msghdr __user *msg, unsigned int flags,
 		   bool forbid_cmsg_compat)
 {
-	int fput_needed, err;
 	struct msghdr msg_sys;
 	struct socket *sock;
+	CLASS(fd, f)(fd);
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT))
 		return -EINVAL;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
-
-	err = ___sys_sendmsg(sock, msg, &msg_sys, flags, NULL, 0);
-
-	fput_light(sock->file, fput_needed);
-out:
-	return err;
+	return ___sys_sendmsg(sock, msg, &msg_sys, flags, NULL, 0);
 }
 
 SYSCALL_DEFINE3(sendmsg, int, fd, struct user_msghdr __user *, msg, unsigned int, flags)
@@ -2684,13 +2656,20 @@  SYSCALL_DEFINE3(sendmsg, int, fd, struct user_msghdr __user *, msg, unsigned int
 int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 		   unsigned int flags, bool forbid_cmsg_compat)
 {
-	int fput_needed, err, datagrams;
+	int err, datagrams;
 	struct socket *sock;
 	struct mmsghdr __user *entry;
 	struct compat_mmsghdr __user *compat_entry;
 	struct msghdr msg_sys;
 	struct used_address used_address;
 	unsigned int oflags = flags;
+	CLASS(fd, f)(fd);
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT))
 		return -EINVAL;
@@ -2700,10 +2679,6 @@  int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 
 	datagrams = 0;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		return err;
-
 	used_address.name_len = UINT_MAX;
 	entry = mmsg;
 	compat_entry = (struct compat_mmsghdr __user *)mmsg;
@@ -2739,8 +2714,6 @@  int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 		cond_resched();
 	}
 
-	fput_light(sock->file, fput_needed);
-
 	/* We only return an error if no datagrams were able to be sent */
 	if (datagrams != 0)
 		return datagrams;
@@ -2862,22 +2835,20 @@  long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg,
 long __sys_recvmsg(int fd, struct user_msghdr __user *msg, unsigned int flags,
 		   bool forbid_cmsg_compat)
 {
-	int fput_needed, err;
 	struct msghdr msg_sys;
 	struct socket *sock;
+	CLASS(fd, f)(fd);
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT))
 		return -EINVAL;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
-
-	err = ___sys_recvmsg(sock, msg, &msg_sys, flags, 0);
-
-	fput_light(sock->file, fput_needed);
-out:
-	return err;
+	return ___sys_recvmsg(sock, msg, &msg_sys, flags, 0);
 }
 
 SYSCALL_DEFINE3(recvmsg, int, fd, struct user_msghdr __user *, msg,
@@ -2894,31 +2865,32 @@  static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
 			  unsigned int vlen, unsigned int flags,
 			  struct timespec64 *timeout)
 {
-	int fput_needed, err, datagrams;
+	int err, datagrams;
 	struct socket *sock;
 	struct mmsghdr __user *entry;
 	struct compat_mmsghdr __user *compat_entry;
 	struct msghdr msg_sys;
 	struct timespec64 end_time;
 	struct timespec64 timeout64;
+	CLASS(fd, f)(fd);
 
 	if (timeout &&
 	    poll_select_set_timeout(&end_time, timeout->tv_sec,
 				    timeout->tv_nsec))
 		return -EINVAL;
 
-	datagrams = 0;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(f.file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		return err;
+	datagrams = 0;
 
 	if (likely(!(flags & MSG_ERRQUEUE))) {
 		err = sock_error(sock->sk);
-		if (err) {
-			datagrams = err;
-			goto out_put;
-		}
+		if (err)
+			return err;
 	}
 
 	entry = mmsg;
@@ -2975,12 +2947,10 @@  static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
 	}
 
 	if (err == 0)
-		goto out_put;
+		return datagrams;
 
-	if (datagrams == 0) {
-		datagrams = err;
-		goto out_put;
-	}
+	if (datagrams == 0)
+		return err;
 
 	/*
 	 * We may return less entries than requested (vlen) if the
@@ -2995,9 +2965,6 @@  static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
 		 */
 		WRITE_ONCE(sock->sk->sk_err, -err);
 	}
-out_put:
-	fput_light(sock->file, fput_needed);
-
 	return datagrams;
 }