[v2,0/8] io_uring: Initial support for {s,g}etsockopt commands

Message ID 20230808134049.1407498-1-leitao@debian.org (mailing list archive)

Message

Breno Leitao Aug. 8, 2023, 1:40 p.m. UTC
This patchset adds support for getsockopt (SOCKET_URING_OP_GETSOCKOPT)
and setsockopt (SOCKET_URING_OP_SETSOCKOPT) in io_uring commands.
SOCKET_URING_OP_SETSOCKOPT implements the generic case, covering all
levels and optnames. SOCKET_URING_OP_GETSOCKOPT, on the other hand,
implements only the SOL_SOCKET level, which seems to be the most common
level parameter for get/setsockopt(2).

struct proto_ops->setsockopt() uses sockptr instead of userspace
pointers, which makes it easy to bind to io_uring. Unfortunately, the
proto_ops->getsockopt() callback uses userspace pointers, except for
SOL_SOCKET, which is handled by sk_getsockopt(). Thus, this patchset
leverages sk_getsockopt() to implement the SOCKET_URING_OP_GETSOCKOPT
case.

In order to support BPF hooks, I modified the hooks to use sockptr, so
they are flexible enough to accept user or kernel pointers for
optval/optlen.
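
To make the sockptr idea concrete, here is a minimal userspace sketch of
a handle that can wrap either a kernel or a user pointer, so a hook
written once against it serves both callers. It is only a model: every
model_* name is hypothetical, and the copy_from_user() stand-in is a
plain memcpy(), whereas the real kernel helper validates the access and
can fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model of the sockptr idea (illustration only): one handle
 * that records whether it wraps a kernel or a user pointer.
 * All model_* names are hypothetical.
 */
typedef struct {
	union {
		void *kernel;
		void *user;	/* would be "void __user *" in the kernel */
	};
	bool is_kernel;
} model_sockptr_t;

static inline model_sockptr_t model_kernel_sockptr(void *p)
{
	return (model_sockptr_t){ .kernel = p, .is_kernel = true };
}

static inline model_sockptr_t model_user_sockptr(void *p)
{
	return (model_sockptr_t){ .user = p, .is_kernel = false };
}

/* Stand-in for copy_from_user(); here it is just memcpy() and never fails. */
static inline int model_copy_from_user(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	return 0;
}

/* Copy len bytes out of either kind of pointer; returns 0 on success. */
static inline int model_copy_from_sockptr(void *dst, model_sockptr_t src,
					  size_t len)
{
	assert(dst != NULL);
	if (src.is_kernel) {
		memcpy(dst, src.kernel, len);
		return 0;
	}
	return model_copy_from_user(dst, src.user, len);
}

/* A hook written against the handle does not care who the caller is. */
static inline int model_hook_read_int(model_sockptr_t optval, int optlen,
				      int *out)
{
	if (optlen < (int)sizeof(*out))
		return -1;	/* buffer too small to hold an int */
	return model_copy_from_sockptr(out, optval, sizeof(*out));
}
```

A syscall-style caller would pass model_user_sockptr(ptr), while an
io_uring-style caller that already holds the data in kernel memory would
pass model_kernel_sockptr(ptr); the hook body stays the same.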

PS1: For the getsockopt command, the optlen field is not a userspace
pointer but an absolute value, so this differs slightly from
getsockopt(2) behaviour. The new optlen value is returned in cqe->res.
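
The difference in calling convention can be sketched in plain userspace
C. The helper below is hypothetical and not part of this series: it
wraps getsockopt(2) so that optlen goes in by value and the updated
length comes back as the return value, mirroring how the command reports
it in cqe->res. Returning -errno on failure is an assumption based on
the usual io_uring completion convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (illustration only): optlen is passed by value and
 * the resulting length is the return value, as PS1 describes for
 * cqe->res. Failures are reported as -errno (assumed convention).
 */
static inline int model_getsockopt_by_value(int fd, int level, int optname,
					    void *optval, int optlen)
{
	socklen_t len;

	assert(optval != NULL);
	if (optlen < 0)
		return -EINVAL;
	len = (socklen_t)optlen;

	/* getsockopt(2) itself takes optlen as an in/out pointer. */
	if (getsockopt(fd, level, optname, optval, &len) < 0)
		return -errno;

	return (int)len;	/* the value a completion would carry */
}
```

For example, querying SO_TYPE at SOL_SOCKET with an int-sized buffer
would return sizeof(int) on success, with no separate optlen variable
for the caller to read back.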

PS2: The userspace pointers must remain valid until the operation is
completed.

These changes were tested with a new test[1] in liburing. On the BPF
side, I checked that no regression was introduced by running the
"sockopt" test case of the "test_progs" selftest.

[1] Link: https://github.com/leitao/liburing/blob/getsock/test/socket-getsetsock-cmd.c

RFC -> V1:
	* Copy user memory at io_uring subsystem, and call proto_ops
	  callbacks using kernel memory
	* Implement all the cases for SOCKET_URING_OP_SETSOCKOPT
V1 -> V2:
	* Implement the BPF part
	* Use user pointers from optval to avoid kmalloc in the io_uring part

Breno Leitao (8):
  net: expose sock_use_custom_sol_socket
  io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
  io_uring/cmd: Extend support beyond SOL_SOCKET
  bpf: Leverage sockptr_t in BPF getsockopt hook
  bpf: Leverage sockptr_t in BPF setsockopt hook
  io_uring/cmd: BPF hook for getsockopt cmd
  io_uring/cmd: BPF hook for setsockopt cmd

 include/linux/bpf-cgroup.h    |  7 +--
 include/linux/net.h           |  5 +++
 include/uapi/linux/io_uring.h |  8 ++++
 io_uring/uring_cmd.c          | 82 +++++++++++++++++++++++++++++++++++
 kernel/bpf/cgroup.c           | 25 ++++++-----
 net/socket.c                  | 12 ++---
 6 files changed, 117 insertions(+), 22 deletions(-)

Comments

Stanislav Fomichev Aug. 8, 2023, 5:35 p.m. UTC | #1
On 08/08, Breno Leitao wrote:
> [...]

I did a quick pass, will take a close look later today. So far everything makes
sense to me.

Should we properly test it as well?
We have tools/testing/selftests/bpf/prog_tests/sockopt.c which does
most of the sanity checks, but it uses regular socket/{g,s}etsockopt
syscalls. Seems like it should be pretty easy to extend this with
io_uring path? tools/testing/selftests/net/io_uring_zerocopy_tx.c
already implements minimal wrappers which we can most likely borrow.

Breno Leitao Aug. 9, 2023, 9:40 a.m. UTC | #2
On Tue, Aug 08, 2023 at 10:35:08AM -0700, Stanislav Fomichev wrote:
> [...]
> 
> I did a quick pass, will take a close look later today. So far everything makes
> sense to me.
> 
> Should we properly test it as well?
> We have tools/testing/selftests/bpf/prog_tests/sockopt.c which does
> most of the sanity checks, but it uses regular socket/{g,s}etsockopt
> syscalls.

Right, that is what I've been using to test the changes.

> Seems like it should be pretty easy to extend this with
> io_uring path? tools/testing/selftests/net/io_uring_zerocopy_tx.c
> already implements minimal wrappers which we can most likely borrow.

Sure, I can definitely do it. Do you want to see the new tests in this
patchset, or in follow-up patches?

Stanislav Fomichev Aug. 9, 2023, 4:26 p.m. UTC | #3
On Wed, Aug 9, 2023 at 2:41 AM Breno Leitao <leitao@debian.org> wrote:
>
> [...]
>
> > Seems like it should be pretty easy to extend this with
> > io_uring path? tools/testing/selftests/net/io_uring_zerocopy_tx.c
> > already implements minimal wrappers which we can most likely borrow.
>
> Sure, I can definitely do it. Do you want to see the new tests in this
> patchset, or in follow-up patches?

Let's keep it in the same series if possible?