From patchwork Fri Oct 25 09:31:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 11211871 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 29B521390 for ; Fri, 25 Oct 2019 09:31:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 07E9E21D7F for ; Fri, 25 Oct 2019 09:31:52 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="hGAvYoL9" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2438630AbfJYJbr (ORCPT ); Fri, 25 Oct 2019 05:31:47 -0400 Received: from mail-wm1-f68.google.com ([209.85.128.68]:33378 "EHLO mail-wm1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405781AbfJYJbr (ORCPT ); Fri, 25 Oct 2019 05:31:47 -0400 Received: by mail-wm1-f68.google.com with SMTP id 6so3855258wmf.0; Fri, 25 Oct 2019 02:31:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=wwqcelQMPV3TFsQAotbZk75eULqBsmycuG/dLtg7Yog=; b=hGAvYoL9HlckdcxCq/B0lI4kZnv3IL9Q00dHXsLic5/VBaDL+t2P4VcKB4MJWa79jf XvVXXT2OMocS6fFDBneO2sBJwtVMiV3wB5QFB898Lo27wkalRISHJeRLvmdeoOSIMGWo nnlT6t1NjqdtNCe9B2b3cDhvBP6AVwHlhcPH2saIkSx6ACoIDTUHVE+MO6M2jiYas64p iDNBNDJWLVvK8M8MwLJ6eprJ8M4bRLn9H6979q9ugPE6ARKKlWvmUXG0UpCTcbJeAT7e rVjYsCdSk8xKQTt2OJ5/iYx2rFr33vMfMO8HpFWt4zkvtlm6mATOCO5xcZDNzAN+c+Rb n1Sw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=wwqcelQMPV3TFsQAotbZk75eULqBsmycuG/dLtg7Yog=; 
b=J9v0SxkGb2r3m/hi28XhVbHxRaWdmD8UtT4D6w41TGwKEDVZDb88CkwefAbv60UFI2 KSavfaq6UL4LrreTbuc5tdZGSzHoME2pitEdvD8gfl6Rk67T1Fy7TvLlxIrqxkJlNJqc jh0/Bq+I5TCBDtWIqC1fDJ9HCgGfE36xKZIU2Mv9K+dH/xr5ZZVWtZZnzjtiReivDDgR zWrlB3Ztg3dHV3zo9psgnG0AYLPTKabT9u4a9GqgDUFQg78ddCGov/eJk1pJS1/Xjq6O T6nIsMvkzeLBMb+D8rOTs+zGCzw+3GgQ0+e56gGvcIM57S2/5CdqU/+iFpVvLoavAkey 9xNg== X-Gm-Message-State: APjAAAX16w6NoicP9TnNzvZiH274Cncj7NYyofMUHufacuXYGbLHbH33 bS90HggPQRQNX0mi2girGnU= X-Google-Smtp-Source: APXvYqwNZP/V/RZ92ydEBeugcb4gGumsxf8/uGiAm1wVppKOsokzxxHDMYgBVw/2WfZiVtqepzrDIg== X-Received: by 2002:a1c:10b:: with SMTP id 11mr2497983wmb.118.1571995905298; Fri, 25 Oct 2019 02:31:45 -0700 (PDT) Received: from localhost.localdomain ([109.126.132.16]) by smtp.gmail.com with ESMTPSA id l7sm2054551wro.17.2019.10.25.02.31.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 25 Oct 2019 02:31:44 -0700 (PDT) From: "Pavel Begunkov (Silence)" To: Jens Axboe , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Pavel Begunkov Subject: [PATCH 1/3] io_uring: Fix corrupted user_data Date: Fri, 25 Oct 2019 12:31:29 +0300 Message-Id: <53e2b76c28b82c98973efca2126d71ecf62e3fdb.1571991701.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: References: MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Pavel Begunkov There is a bug, where failed linked requests are returned not with specified @user_data, but with garbage from a kernel stack. The reason is that io_fail_links() uses req->user_data, which is uninitialised when called from io_queue_sqe() on fail path. 
Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/io_uring.c b/fs/io_uring.c index 1b46c72f8975..0e141d905a5b 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2448,6 +2448,8 @@ static void io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s, return; } + req->user_data = s->sqe->user_data; + /* * If we already have a head request, queue this one for async * submittal once the head completes. If we don't have a head but From patchwork Fri Oct 25 09:31:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 11211875 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D2DF41390 for ; Fri, 25 Oct 2019 09:31:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A380E21D7F for ; Fri, 25 Oct 2019 09:31:58 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="vhiyo7wP" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2406341AbfJYJbv (ORCPT ); Fri, 25 Oct 2019 05:31:51 -0400 Received: from mail-wr1-f65.google.com ([209.85.221.65]:42161 "EHLO mail-wr1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2438652AbfJYJbu (ORCPT ); Fri, 25 Oct 2019 05:31:50 -0400 Received: by mail-wr1-f65.google.com with SMTP id r1so1475900wrs.9; Fri, 25 Oct 2019 02:31:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=dmRjCgPUzKkYP2MIaB/azsfLQhHwn5qPJdajE425Fu0=; b=vhiyo7wPudEr0Gu9924BQzkHVXmfKXKsh/khTCJDNwrJ5yXAg0DspSYohuQKnpUSiQ 
MGXeLBCGAXaAZHfEa4achL9hFzwmUC8OILeHVB7ocw72eTrmLGybuWd7ZaUmFZYJGmB7 B9aJY9ASAEDSejh/I6+p/q58w5i/vOaYmy16zRwlcjAkLgGafzd5PY3q/R6X8aTzPOdm fqK624csERn5RSU60LPENQBd2Q7URvauPFJVFYuTBqakM5/F5AKvdkedillCnl7q9ib5 IDaIFsMAnA8HCs65LhFkY+hdK1DbDAip0D5GwJCkhr7/OqW6SqH7hdkj5qvWADpjAjO2 T7eQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=dmRjCgPUzKkYP2MIaB/azsfLQhHwn5qPJdajE425Fu0=; b=adLvcB46N+TGslMC2L+DEzVsKLKXloDdrFYKiZRIHR/KDzUF7YJLFyxR+eiHih0Vsm FwMthpip7J20+nfq5riFSGJx8LfFRR/zU+skOgO/ro9+qzdCGTz/O/uKJU0LbrH7OVxz L5yGXVWEb+Bg14BZ9itg3Y1SLKhjYu+zdI3CUDJqZGkO4/dqVxPjPB6wf3wAN9pBDCcw cFtWv5VaOwUuAvPYN2y1JsuwESVlFp+sHME7G+PMEcfOgRLoNs2rc0ozxvWtZNpUlQ1U aVA1XuLzXzCO5zg/EE3xNn2Q1y2g1toFBhRjXE7m8X66YRRpvKolVJHl4bWpl8yvLXOi 1sEQ== X-Gm-Message-State: APjAAAU6YNeCPx52II4wTWldYN70Elwh716M2XDkQbFk7gel0ykG3D7p OqBqauDlaibYY8ryKdZ51tjosoZP X-Google-Smtp-Source: APXvYqy8SWNGaNVLczSgaGmiXvj46hbOdxX3pT2leyUhfQJAVmGMu6Lb3bSDi2YqHzXIam/JMo3Xtg== X-Received: by 2002:adf:eb0f:: with SMTP id s15mr1911075wrn.97.1571995907880; Fri, 25 Oct 2019 02:31:47 -0700 (PDT) Received: from localhost.localdomain ([109.126.132.16]) by smtp.gmail.com with ESMTPSA id l7sm2054551wro.17.2019.10.25.02.31.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 25 Oct 2019 02:31:47 -0700 (PDT) From: "Pavel Begunkov (Silence)" To: Jens Axboe , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Pavel Begunkov Subject: [PATCH 2/3] io_uring: Fix broken links with offloading Date: Fri, 25 Oct 2019 12:31:30 +0300 Message-Id: X-Mailer: git-send-email 2.23.0 In-Reply-To: References: MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Pavel Begunkov io_sq_thread() processes sqes by 8 without considering links. 
As a result, links will be randomly subdivided. The easiest way to fix it is to call io_get_sqring() inside io_submit_sqes(), as io_ring_submit() does. Downsides: 1. This removes the optimisation of not grabbing mm_struct for fixed files. 2. It submits all sqes in one go, without finer-grained scheduling with cq processing. Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 62 +++++++++++++++++++++++++++------------------------ 1 file changed, 33 insertions(+), 29 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 0e141d905a5b..949c82a40d16 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -735,6 +735,14 @@ static unsigned io_cqring_events(struct io_rings *rings) return READ_ONCE(rings->cq.tail) - READ_ONCE(rings->cq.head); } +static inline unsigned int io_sqring_entries(struct io_ring_ctx *ctx) +{ + struct io_rings *rings = ctx->rings; + + /* make sure SQ entry isn't read before tail */ + return smp_load_acquire(&rings->sq.tail) - ctx->cached_sq_head; +} + /* * Find and free completed poll iocbs */ @@ -2560,8 +2568,8 @@ static bool io_get_sqring(struct io_ring_ctx *ctx, struct sqe_submit *s) return false; } -static int io_submit_sqes(struct io_ring_ctx *ctx, struct sqe_submit *sqes, - unsigned int nr, bool has_user, bool mm_fault) +static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr, + bool has_user, bool mm_fault) { struct io_submit_state state, *statep = NULL; struct io_kiocb *link = NULL; @@ -2575,6 +2583,11 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, struct sqe_submit *sqes, } for (i = 0; i < nr; i++) { + struct sqe_submit s; + + if (!io_get_sqring(ctx, &s)) + break; + /* * If previous wasn't linked and we have a linked command, * that's the end of the chain. Submit the previous link. 
@@ -2584,9 +2597,9 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, struct sqe_submit *sqes, link = NULL; shadow_req = NULL; } - prev_was_link = (sqes[i].sqe->flags & IOSQE_IO_LINK) != 0; + prev_was_link = (s.sqe->flags & IOSQE_IO_LINK) != 0; - if (link && (sqes[i].sqe->flags & IOSQE_IO_DRAIN)) { + if (link && (s.sqe->flags & IOSQE_IO_DRAIN)) { if (!shadow_req) { shadow_req = io_get_req(ctx, NULL); if (unlikely(!shadow_req)) @@ -2594,18 +2607,18 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, struct sqe_submit *sqes, shadow_req->flags |= (REQ_F_IO_DRAIN | REQ_F_SHADOW_DRAIN); refcount_dec(&shadow_req->refs); } - shadow_req->sequence = sqes[i].sequence; + shadow_req->sequence = s.sequence; } out: if (unlikely(mm_fault)) { - io_cqring_add_event(ctx, sqes[i].sqe->user_data, + io_cqring_add_event(ctx, s.sqe->user_data, -EFAULT); } else { - sqes[i].has_user = has_user; - sqes[i].needs_lock = true; - sqes[i].needs_fixed_file = true; - io_submit_sqe(ctx, &sqes[i], statep, &link); + s.has_user = has_user; + s.needs_lock = true; + s.needs_fixed_file = true; + io_submit_sqe(ctx, &s, statep, &link); submitted++; } } @@ -2620,7 +2633,6 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, struct sqe_submit *sqes, static int io_sq_thread(void *data) { - struct sqe_submit sqes[IO_IOPOLL_BATCH]; struct io_ring_ctx *ctx = data; struct mm_struct *cur_mm = NULL; mm_segment_t old_fs; @@ -2635,8 +2647,8 @@ static int io_sq_thread(void *data) timeout = inflight = 0; while (!kthread_should_park()) { - bool all_fixed, mm_fault = false; - int i; + bool mm_fault = false; + unsigned int to_submit; if (inflight) { unsigned nr_events = 0; @@ -2656,7 +2668,8 @@ static int io_sq_thread(void *data) timeout = jiffies + ctx->sq_thread_idle; } - if (!io_get_sqring(ctx, &sqes[0])) { + to_submit = io_sqring_entries(ctx); + if (!to_submit) { /* * We're polling. 
If we're within the defined idle * period, then let us spin without work before going @@ -2687,7 +2700,8 @@ static int io_sq_thread(void *data) /* make sure to read SQ tail after writing flags */ smp_mb(); - if (!io_get_sqring(ctx, &sqes[0])) { + to_submit = io_sqring_entries(ctx); + if (!to_submit) { if (kthread_should_park()) { finish_wait(&ctx->sqo_wait, &wait); break; @@ -2705,19 +2719,8 @@ static int io_sq_thread(void *data) ctx->rings->sq_flags &= ~IORING_SQ_NEED_WAKEUP; } - i = 0; - all_fixed = true; - do { - if (all_fixed && io_sqe_needs_user(sqes[i].sqe)) - all_fixed = false; - - i++; - if (i == ARRAY_SIZE(sqes)) - break; - } while (io_get_sqring(ctx, &sqes[i])); - /* Unless all new commands are FIXED regions, grab mm */ - if (!all_fixed && !cur_mm) { + if (!cur_mm) { mm_fault = !mmget_not_zero(ctx->sqo_mm); if (!mm_fault) { use_mm(ctx->sqo_mm); @@ -2725,8 +2728,9 @@ static int io_sq_thread(void *data) } } - inflight += io_submit_sqes(ctx, sqes, i, cur_mm != NULL, - mm_fault); + to_submit = min(to_submit, ctx->sq_entries); + inflight += io_submit_sqes(ctx, to_submit, cur_mm != NULL, + mm_fault); /* Commit SQ ring head once we've consumed all SQEs */ io_commit_sqring(ctx); From patchwork Fri Oct 25 09:31:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 11211873 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CFC2D1709 for ; Fri, 25 Oct 2019 09:31:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id AD3E121E6F for ; Fri, 25 Oct 2019 09:31:52 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Q/bVBEeB" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2408835AbfJYJbv 
(ORCPT ); Fri, 25 Oct 2019 05:31:51 -0400 Received: from mail-wr1-f65.google.com ([209.85.221.65]:33669 "EHLO mail-wr1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2438658AbfJYJbv (ORCPT ); Fri, 25 Oct 2019 05:31:51 -0400 Received: by mail-wr1-f65.google.com with SMTP id s1so1525406wro.0; Fri, 25 Oct 2019 02:31:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=77A9mrU+GvtRghygf59d/zr0EDg/QRoehlw2HaK8WGw=; b=Q/bVBEeBy5tn//kvTzIw8xbzOxwDulRXcCUWsPpI8xQ6ggu0HmedYpLkER8xsKbfjV AFpOxBm+U+vwz+IIZpRK2UwME7D/Rp56Zi6J+MHzhnLufJMwYsinSvFdTL8fNS1Fiu8h ppc3Fke/XsQ5VDC/mYoNXf2OI5oWftNfG27SNvXdgRJu2uEpEOERnbLjEXun37HwOn2z il/O+0xNfYM8n+vgcQLtJWP1L/C3FhBbFlqMEQ/pFlSCra2R/C6JNJWjH0PKrcNRR7aq U+jvbZRK+/psfuBjSU+PgdVM1tfi5aBLh4hzualXLH5rggJjwbhe2JRr5LaboKvjwOnD 28NA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=77A9mrU+GvtRghygf59d/zr0EDg/QRoehlw2HaK8WGw=; b=tHo0r/1/HcHrKrowyzemEfv1kuRedSh2Rdine4jJ1gUx48t3MinA/ro50on/yAZEas p5Q/uHAZ83h8G1tH4OD1y3+aqZLIsmrsIlV1lhWcL7qLr/GtYokoU8TgdPLbZ448IEYZ 7mhlJeAPi2koHEYFsjXgQffdLitP+B4xDX77C2OAp5Sn2FNFWPU9wftPqCmAR+ilnHH8 yc1A1Q9/7SqCoD1zl9KWInB5IEW+FIpLKp/U7myq3S7OXqusYj5Q63ThNj23R0xToC3f UVJJcdxASjvX8kOswuc/K2s1pJ0FnPQWGHm9Xt8nWp8Vm40O1x+Z9tyPO0YIciCdxeHt m+dQ== X-Gm-Message-State: APjAAAVfpn6Ijgb05dr2Hi/531FmeP/qD0o0eLbRJVzlx1rSqpQXIvnL 0pKeGzlKGWvgdIB3+JKjQOo= X-Google-Smtp-Source: APXvYqwrEopp9G/Q+1sxJJWFLmASvDAv315NCV9IEEkbnpTD+PMY12Z8Jo4hHk0tAxrx6iy/pjdIGg== X-Received: by 2002:adf:ee03:: with SMTP id y3mr2026446wrn.116.1571995909648; Fri, 25 Oct 2019 02:31:49 -0700 (PDT) Received: from localhost.localdomain ([109.126.132.16]) by smtp.gmail.com with ESMTPSA id l7sm2054551wro.17.2019.10.25.02.31.48 
(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 25 Oct 2019 02:31:49 -0700 (PDT) From: "Pavel Begunkov (Silence)" To: Jens Axboe , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Pavel Begunkov Subject: [PATCH 3/3] io_uring: Fix race for sqes with userspace Date: Fri, 25 Oct 2019 12:31:31 +0300 Message-Id: X-Mailer: git-send-email 2.23.0 In-Reply-To: References: MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Pavel Begunkov io_ring_submit() finalises with 1. io_commit_sqring(), which releases sqes to userspace 2. a call to io_queue_link_head(), which accesses the released head's sqe. Reorder them so that the sqes are released only after their last use. Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 949c82a40d16..32f6598ecae9 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2795,13 +2795,14 @@ static int io_ring_submit(struct io_ring_ctx *ctx, unsigned int to_submit) submit++; io_submit_sqe(ctx, &s, statep, &link); } - io_commit_sqring(ctx); if (link) io_queue_link_head(ctx, link, &link->submit, shadow_req); if (statep) io_submit_state_end(statep); + io_commit_sqring(ctx); + return submit; }
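The ordering problem described in patch 3/3 can be illustrated with a small userspace model. This is only a sketch, not kernel code: toy_ring, toy_sqe, toy_get(), toy_commit() and toy_produce() are hypothetical names made up for the illustration, the ring size of 4 is arbitrary, and the "userspace" producer is simulated in the same thread rather than running concurrently. The point it tries to show is that once the consumer publishes its head, the producer may overwrite a slot the consumer still holds a pointer to.

```c
#include <assert.h>
#include <stdint.h>

#define RING_ENTRIES 4u /* hypothetical ring size, must be a power of two */

/* Hypothetical stand-in for an sqe: only user_data matters here. */
struct toy_sqe {
	uint64_t user_data;
};

struct toy_ring {
	struct toy_sqe sqes[RING_ENTRIES];
	unsigned int head;        /* published head, visible to the producer */
	unsigned int cached_head; /* consumer-private cursor */
	unsigned int tail;        /* producer cursor */
};

/* Producer side: write a new entry if a slot is free. Returns 1 on success. */
static int toy_produce(struct toy_ring *r, uint64_t user_data)
{
	if (r->tail - r->head == RING_ENTRIES)
		return 0; /* ring is full from the producer's point of view */
	r->sqes[r->tail++ & (RING_ENTRIES - 1)].user_data = user_data;
	return 1;
}

/* Consumer side: fetch the next entry without publishing the new head. */
static const struct toy_sqe *toy_get(struct toy_ring *r)
{
	if (r->cached_head == r->tail)
		return 0;
	return &r->sqes[r->cached_head++ & (RING_ENTRIES - 1)];
}

/* Consumer side: publish the head, releasing consumed slots to the producer. */
static void toy_commit(struct toy_ring *r)
{
	r->head = r->cached_head;
}

/* Fill the ring completely with user_data values 100..103. */
static void toy_fill(struct toy_ring *r)
{
	for (uint64_t v = 100; toy_produce(r, v); v++)
		;
	assert(r->tail - r->head == RING_ENTRIES);
}

/* Old order: commit first, then touch the link head's sqe. */
static uint64_t submit_commit_first(void)
{
	struct toy_ring r = { .head = 0 };
	const struct toy_sqe *link;

	toy_fill(&r);
	link = toy_get(&r);     /* points at slot 0, user_data 100 */
	toy_commit(&r);         /* slot 0 now belongs to the producer again */
	toy_produce(&r, 999);   /* producer reuses slot 0 */
	return link->user_data; /* reads the overwritten value */
}

/* New order: touch the link head's sqe first, commit last. */
static uint64_t submit_commit_last(void)
{
	struct toy_ring r = { .head = 0 };
	const struct toy_sqe *link;
	uint64_t seen;

	toy_fill(&r);
	link = toy_get(&r);
	toy_produce(&r, 999);   /* fails: slot 0 has not been released yet */
	seen = link->user_data; /* still the original value */
	toy_commit(&r);
	return seen;
}
```

In this model submit_commit_first() returns 999, the value the producer wrote after the early commit, while submit_commit_last() returns the original 100. The real code additionally relies on memory barriers around the head and tail updates, which this single-threaded sketch leaves out.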