[RFC,0/3] io_uring: add restrictions to support untrusted applications and guests

Message ID 20200710141945.129329-1-sgarzare@redhat.com (mailing list archive)

Message

Stefano Garzarella July 10, 2020, 2:19 p.m. UTC
Following the proposal about restrictions that I sent [1], I wrote a PoC with
the main changes. It is still WiP, so I left some TODOs in the code.

I also wrote helpers in liburing and a test case (test/register-restrictions.c)
available in this repository:
https://github.com/stefano-garzarella/liburing (branch: io_uring_restrictions)

Just to recap the proposal, the idea is to add some restrictions to the
operations (sqe, register, fixed file) to safely allow untrusted applications
or guests to use io_uring queues.

The first patch changes the io_uring_register(2) opcodes into an enumeration to
keep track of the last available opcode.
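
For reference, a rough sketch of what that conversion might look like (the
opcode values stay ABI-compatible; middle entries elided for brevity):

    enum {
            IORING_REGISTER_BUFFERS         = 0,
            IORING_UNREGISTER_BUFFERS       = 1,
            IORING_REGISTER_FILES           = 2,
            /* ... existing opcodes keep their values ... */
            IORING_REGISTER_PERSONALITY     = 9,
            IORING_UNREGISTER_PERSONALITY   = 10,

            /* this goes last */
            IORING_REGISTER_LAST
    };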

The second patch adds the IOURING_REGISTER_RESTRICTIONS opcode and the code to
handle restrictions.
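
From userspace, registering restrictions could look roughly like this (a
minimal sketch: the struct io_uring_restriction layout and the
IORING_RESTRICTION_* constants follow the RFC patches and may still change;
ring_fd is an fd returned by io_uring_setup(2), the usual <linux/io_uring.h>
and <sys/syscall.h> includes are assumed, error handling omitted):

    struct io_uring_restriction res[2];

    memset(res, 0, sizeof(res));

    /* allow only one io_uring_register(2) opcode: buffer registration */
    res[0].opcode = IORING_RESTRICTION_REGISTER_OP;
    res[0].register_op = IORING_REGISTER_BUFFERS;

    /* allow only one SQE opcode: fixed-buffer reads */
    res[1].opcode = IORING_RESTRICTION_SQE_OP;
    res[1].sqe_op = IORING_OP_READ_FIXED;

    ret = syscall(__NR_io_uring_register, ring_fd,
                  IOURING_REGISTER_RESTRICTIONS, res, 2);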

The third patch adds the IORING_SETUP_R_DISABLED flag to start the rings
disabled, allowing the user to register restrictions, buffers, and files before
starting to process SQEs (see the sketch below).
I'm not sure whether this helps the seccomp use case. An alternative pointed
out by Jann Horn could be to register restrictions during io_uring_setup(2),
but this requires some intrusive changes (there is no space in struct
io_uring_params to pass a pointer to the restriction array; maybe we can add a
flag and append the pointer at the end of struct io_uring_params).
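
A minimal sketch of the intended flow (error handling omitted; res as in the
sketch above):

    struct io_uring_params p;
    int ring_fd;

    memset(&p, 0, sizeof(p));
    p.flags = IORING_SETUP_R_DISABLED;      /* rings start disabled */
    ring_fd = syscall(__NR_io_uring_setup, 8, &p);

    /* while disabled: register restrictions, buffers, files, ... */
    syscall(__NR_io_uring_register, ring_fd,
            IOURING_REGISTER_RESTRICTIONS, res, 2);

    /* finally, enable SQE processing */
    syscall(__NR_io_uring_register, ring_fd,
            IORING_REGISTER_ENABLE_RINGS, NULL, 0);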

Another current limitation is that I always need to allow
IORING_REGISTER_ENABLE_RINGS in the restrictions to be able to start the rings;
I'm not sure whether we should treat it as an exception.

Maybe registering restrictions during io_uring_setup(2) could solve both issues
(seccomp integration and the IORING_REGISTER_ENABLE_RINGS registration), but I
need some suggestions on how to properly extend io_uring_setup(2).
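
For example (purely hypothetical, just to make the idea concrete; the new
field names and the IORING_SETUP_RESTRICTIONS flag are invented here), the
restriction array could be appended to struct io_uring_params and parsed only
when a new setup flag is set:

    struct io_uring_params {
            /* ... existing fields unchanged ... */

            /* hypothetical extension, only parsed when a new flag,
             * e.g. IORING_SETUP_RESTRICTIONS, is set in p.flags: */
            __u64 restrictions;     /* ptr to struct io_uring_restriction[] */
            __u32 nr_restrictions;
            __u32 resv;
    };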

Comments and suggestions are very welcome.

Thank you in advance,
Stefano

[1] https://lore.kernel.org/io-uring/20200609142406.upuwpfmgqjeji4lc@steredhat/

Stefano Garzarella (3):
  io_uring: use an enumeration for io_uring_register(2) opcodes
  io_uring: add IOURING_REGISTER_RESTRICTIONS opcode
  io_uring: allow disabling rings during the creation

 fs/io_uring.c                 | 155 ++++++++++++++++++++++++++++++++--
 include/uapi/linux/io_uring.h |  59 ++++++++++---
 2 files changed, 194 insertions(+), 20 deletions(-)

Comments

Konrad Rzeszutek Wilk July 10, 2020, 3:33 p.m. UTC | #1
.snip..
> Just to recap the proposal, the idea is to add some restrictions to the
> operations (sqe, register, fixed file) to safely allow untrusted applications
> or guests to use io_uring queues.

Hi!

This is neat and quite cool - but one thing that keeps nagging me is
how much overhead this cuts from the existing setup when you use
virtio (with guests, obviously)? That is, from a high-level view, the
beauty of io_uring being passed into the guest is that you don't have the
virtio ring -> io_uring processing, right?

Thanks!
Stefano Garzarella July 10, 2020, 4:20 p.m. UTC | #2
Hi Konrad,

On Fri, Jul 10, 2020 at 11:33:09AM -0400, Konrad Rzeszutek Wilk wrote:
> .snip..
> > Just to recap the proposal, the idea is to add some restrictions to the
> > operations (sqe, register, fixed file) to safely allow untrusted applications
> > or guests to use io_uring queues.
> 
> Hi!
> 
> This is neat and quite cool - but one thing that keeps nagging me is
> how much overhead this cuts from the existing setup when you use
> virtio (with guests, obviously)?

I need to do more tests, but the preliminary results that I reported in
the original proposal [1] show an overhead of ~4.17 µs per request (with
iodepth=1; i.e. the latency difference implied by the two throughput
figures below) when the virtio ring is processed in a dedicated iothread:

  - 73 kIOPS using virtio-blk + QEMU iothread + io_uring backend
  - 104 kIOPS using io_uring passthrough

>                                 That is, from a high-level view, the
> beauty of io_uring being passed into the guest is that you don't have the
> virtio ring -> io_uring processing, right?

Right, and potentially we can share the io_uring queues directly with the
guest userspace applications, cutting down the cost of the Linux block
layer in the guest.

Thanks for your feedback,
Stefano

[1] https://lore.kernel.org/io-uring/20200609142406.upuwpfmgqjeji4lc@steredhat/
Stefan Hajnoczi July 13, 2020, 9:24 a.m. UTC | #3
On Fri, Jul 10, 2020 at 06:20:17PM +0200, Stefano Garzarella wrote:
> On Fri, Jul 10, 2020 at 11:33:09AM -0400, Konrad Rzeszutek Wilk wrote:
> > .snip..
> > > Just to recap the proposal, the idea is to add some restrictions to the
> > > operations (sqe, register, fixed file) to safely allow untrusted applications
> > > or guests to use io_uring queues.
> > 
> > Hi!
> > 
> > This is neat and quite cool - but one thing that keeps nagging me is
> > how much overhead this cuts from the existing setup when you use
> > virtio (with guests, obviously)?
> 
> I need to do more tests, but the preliminary results that I reported in
> the original proposal [1] show an overhead of ~4.17 µs per request (with
> iodepth=1; i.e. the latency difference implied by the two throughput
> figures below) when the virtio ring is processed in a dedicated iothread:
> 
>   - 73 kIOPS using virtio-blk + QEMU iothread + io_uring backend
>   - 104 kIOPS using io_uring passthrough
> 
> >                                 That is, from a high-level view, the
> > beauty of io_uring being passed into the guest is that you don't have the
> > virtio ring -> io_uring processing, right?
> 
> Right, and potentially we can share the io_uring queues directly with the
> guest userspace applications, cutting down the cost of the Linux block
> layer in the guest.

Another factor is that the guest submits requests directly to the host
kernel sqpoll thread. When a virtqueue is used, the sqpoll thread cannot
poll it directly, so another host thread (QEMU) needs to poll the
virtqueue. The same applies to the completion code path.
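
To make this concrete, a minimal SQPOLL setup sketch (illustrative only, not
part of this series; error handling omitted):

    struct io_uring_params p;
    int ring_fd;

    memset(&p, 0, sizeof(p));
    p.flags = IORING_SETUP_SQPOLL;  /* kernel thread polls the SQ ring */
    p.sq_thread_idle = 2000;        /* ms of idle before the thread sleeps */
    ring_fd = syscall(__NR_io_uring_setup, 8, &p);

    /* the submitter only writes SQEs into the mmap'ed SQ ring and bumps
     * the tail; io_uring_enter(2) is needed just to wake the thread
     * (IORING_ENTER_SQ_WAKEUP) after it has gone idle */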

Stefan