From patchwork Thu Mar 14 20:44:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10853555 From: Jens Axboe To: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: viro@ZenIV.linux.org.uk, Jens Axboe Subject: [PATCH 1/8] io_uring: use regular request ref counts Date: Thu, 14 Mar 2019 14:44:28 -0600 Message-Id: <20190314204435.7692-2-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190314204435.7692-1-axboe@kernel.dk> References: <20190314204435.7692-1-axboe@kernel.dk> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Get rid of the special casing of "normal" requests not having any references to the
io_kiocb. We initialize the ref count to 2, one for the submission side, and one for the completion side. Signed-off-by: Jens Axboe --- fs/io_uring.c | 48 ++++++++++++++++++++++++++++++------------------ 1 file changed, 30 insertions(+), 18 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 5d99376d2369..c54af70c72fd 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -411,7 +411,8 @@ static struct io_kiocb *io_get_req(struct io_ring_ctx *ctx, req->ctx = ctx; req->flags = 0; - refcount_set(&req->refs, 0); + /* one is dropped after submission, the other at completion */ + refcount_set(&req->refs, 2); return req; out: io_ring_drop_ctx_refs(ctx, 1); @@ -429,10 +430,14 @@ static void io_free_req_many(struct io_ring_ctx *ctx, void **reqs, int *nr) static void io_free_req(struct io_kiocb *req) { - if (!refcount_read(&req->refs) || refcount_dec_and_test(&req->refs)) { - io_ring_drop_ctx_refs(req->ctx, 1); - kmem_cache_free(req_cachep, req); - } + io_ring_drop_ctx_refs(req->ctx, 1); + kmem_cache_free(req_cachep, req); +} + +static void io_put_req(struct io_kiocb *req) +{ + if (refcount_dec_and_test(&req->refs)) + io_free_req(req); } /* @@ -453,7 +458,8 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events, io_cqring_fill_event(ctx, req->user_data, req->error, 0); - reqs[to_free++] = req; + if (refcount_dec_and_test(&req->refs)) + reqs[to_free++] = req; (*nr_events)++; /* @@ -616,7 +622,7 @@ static void io_complete_rw(struct kiocb *kiocb, long res, long res2) io_fput(req); io_cqring_add_event(req->ctx, req->user_data, res, 0); - io_free_req(req); + io_put_req(req); } static void io_complete_rw_iopoll(struct kiocb *kiocb, long res, long res2) @@ -1083,7 +1089,7 @@ static int io_nop(struct io_kiocb *req, u64 user_data) io_fput(req); } io_cqring_add_event(ctx, user_data, err, 0); - io_free_req(req); + io_put_req(req); return 0; } @@ -1146,7 +1152,7 @@ static int io_fsync(struct io_kiocb *req, const struct io_uring_sqe *sqe, io_fput(req);
io_cqring_add_event(req->ctx, sqe->user_data, ret, 0); - io_free_req(req); + io_put_req(req); return 0; } @@ -1204,7 +1210,7 @@ static int io_poll_remove(struct io_kiocb *req, const struct io_uring_sqe *sqe) spin_unlock_irq(&ctx->completion_lock); io_cqring_add_event(req->ctx, sqe->user_data, ret, 0); - io_free_req(req); + io_put_req(req); return 0; } @@ -1212,7 +1218,7 @@ static void io_poll_complete(struct io_kiocb *req, __poll_t mask) { io_cqring_add_event(req->ctx, req->user_data, mangle_poll(mask), 0); io_fput(req); - io_free_req(req); + io_put_req(req); } static void io_poll_complete_work(struct work_struct *work) @@ -1346,9 +1352,6 @@ static int io_poll_add(struct io_kiocb *req, const struct io_uring_sqe *sqe) INIT_LIST_HEAD(&poll->wait.entry); init_waitqueue_func_entry(&poll->wait, io_poll_wake); - /* one for removal from waitqueue, one for this function */ - refcount_set(&req->refs, 2); - mask = vfs_poll(poll->file, &ipt.pt) & poll->events; if (unlikely(!poll->head)) { /* we did not manage to set up a waitqueue, done */ @@ -1380,13 +1383,12 @@ static int io_poll_add(struct io_kiocb *req, const struct io_uring_sqe *sqe) * Drop one of our refs to this req, __io_submit_sqe() will * drop the other one since we're returning an error. 
*/ - io_free_req(req); + io_put_req(req); return ipt.error; } if (mask) io_poll_complete(req, mask); - io_free_req(req); return 0; } @@ -1524,10 +1526,13 @@ static void io_sq_wq_submit_work(struct work_struct *work) break; cond_resched(); } while (1); + + /* drop submission reference */ + io_put_req(req); } if (ret) { io_cqring_add_event(ctx, sqe->user_data, ret, 0); - io_free_req(req); + io_put_req(req); } /* async context always use a copy of the sqe */ @@ -1617,6 +1622,7 @@ static bool io_add_to_prev_work(struct async_list *list, struct io_kiocb *req) static int io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s, struct io_submit_state *state) { + bool did_submit = true; struct io_kiocb *req; ssize_t ret; @@ -1650,10 +1656,16 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s, queue_work(ctx->sqo_wq, &req->work); } ret = 0; + did_submit = false; } } + + /* drop submission reference, if we did the submit */ + if (did_submit) + io_put_req(req); + if (ret) - io_free_req(req); + io_put_req(req); return ret; }
From patchwork Thu Mar 14 20:44:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10853559 From: Jens Axboe To: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: viro@ZenIV.linux.org.uk, Jens Axboe Subject: [PATCH 2/8] io_uring: make io_read/write return an integer Date: Thu, 14 Mar 2019 14:44:29 -0600 Message-Id: <20190314204435.7692-3-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190314204435.7692-1-axboe@kernel.dk> References: <20190314204435.7692-1-axboe@kernel.dk> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The callers all convert to an integer, and we only return 0/-ERROR anyway.
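
As a rough user-space illustration (not kernel code and not part of the patch itself): a helper whose only results are 0 or a negative errno loses nothing when its return type narrows from ssize_t to int. The names prep_status_wide(), prep_status() and submit_one() are hypothetical stand-ins, not functions from fs/io_uring.c.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical helper with the old, wider return type.
 * It only ever reports 0 or a negative errno, never a byte count. */
static ssize_t prep_status_wide(int valid)
{
	return valid ? 0 : -EINVAL;
}

/* Same contract with the narrower type; every possible value fits in an int. */
static int prep_status(int valid)
{
	return valid ? 0 : -EINVAL;
}

/* Hypothetical caller that stores the result in an int, as the callers
 * described above do; converting 0 or -errno to int cannot truncate. */
static int submit_one(int valid)
{
	int ret = (int)prep_status_wide(valid);

	return ret;
}
```

Under these assumptions both variants report the same values to the caller, so the wider type buys nothing once no byte count is ever returned.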
Signed-off-by: Jens Axboe --- fs/io_uring.c | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index c54af70c72fd..ecb93bb9d84e 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -893,7 +893,7 @@ static int io_import_iovec(struct io_ring_ctx *ctx, int rw, opcode = READ_ONCE(sqe->opcode); if (opcode == IORING_OP_READ_FIXED || opcode == IORING_OP_WRITE_FIXED) { - ssize_t ret = io_import_fixed(ctx, rw, sqe, iter); + int ret = io_import_fixed(ctx, rw, sqe, iter); *iovec = NULL; return ret; } @@ -951,15 +951,15 @@ static void io_async_list_note(int rw, struct io_kiocb *req, size_t len) async_list->io_end = io_end; } -static ssize_t io_read(struct io_kiocb *req, const struct sqe_submit *s, - bool force_nonblock, struct io_submit_state *state) +static int io_read(struct io_kiocb *req, const struct sqe_submit *s, + bool force_nonblock, struct io_submit_state *state) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct kiocb *kiocb = &req->rw; struct iov_iter iter; struct file *file; size_t iov_count; - ssize_t ret; + int ret; ret = io_prep_rw(req, s, force_nonblock, state); if (ret) @@ -1004,15 +1004,15 @@ static ssize_t io_read(struct io_kiocb *req, const struct sqe_submit *s, return ret; } -static ssize_t io_write(struct io_kiocb *req, const struct sqe_submit *s, - bool force_nonblock, struct io_submit_state *state) +static int io_write(struct io_kiocb *req, const struct sqe_submit *s, + bool force_nonblock, struct io_submit_state *state) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct kiocb *kiocb = &req->rw; struct iov_iter iter; struct file *file; size_t iov_count; - ssize_t ret; + int ret; ret = io_prep_rw(req, s, force_nonblock, state); if (ret) @@ -1396,8 +1396,7 @@ static int __io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req, const struct sqe_submit *s, bool force_nonblock, struct io_submit_state *state) { - ssize_t ret; - int opcode; + int ret, 
opcode; if (unlikely(s->index >= ctx->sq_entries)) return -EINVAL; @@ -1624,7 +1623,7 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s, { bool did_submit = true; struct io_kiocb *req; - ssize_t ret; + int ret; /* enforce forwards compatibility on users */ if (unlikely(s->sqe->flags & ~IOSQE_FIXED_FILE))
From patchwork Thu Mar 14 20:44:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10853563 From: Jens Axboe To: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: viro@ZenIV.linux.org.uk, Jens Axboe Subject: [PATCH 3/8] io_uring: add prepped flag Date: Thu, 14 Mar 2019 14:44:30 -0600 Message-Id: <20190314204435.7692-4-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190314204435.7692-1-axboe@kernel.dk> References:
<20190314204435.7692-1-axboe@kernel.dk> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP We currently use the fact that if ->ki_filp is already set, then we've done the prep. In preparation for moving the file assignment earlier, use a separate flag to tell whether the request has been prepped for IO or not. Signed-off-by: Jens Axboe --- fs/io_uring.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index ecb93bb9d84e..d8468594a483 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -214,6 +214,7 @@ struct io_kiocb { #define REQ_F_IOPOLL_COMPLETED 2 /* polled IO has completed */ #define REQ_F_FIXED_FILE 4 /* ctx owns file */ #define REQ_F_SEQ_PREV 8 /* sequential with previous */ +#define REQ_F_PREPPED 16 /* prep already done */ u64 user_data; u64 error; @@ -741,7 +742,7 @@ static int io_prep_rw(struct io_kiocb *req, const struct sqe_submit *s, int fd, ret; /* For -EAGAIN retry, everything is already prepped */ - if (kiocb->ki_filp) + if (req->flags & REQ_F_PREPPED) return 0; flags = READ_ONCE(sqe->flags); @@ -799,6 +800,7 @@ static int io_prep_rw(struct io_kiocb *req, const struct sqe_submit *s, } kiocb->ki_complete = io_complete_rw; } + req->flags |= REQ_F_PREPPED; return 0; out_fput: if (!(flags & IOSQE_FIXED_FILE)) { @@ -1099,8 +1101,8 @@ static int io_prep_fsync(struct io_kiocb *req, const struct io_uring_sqe *sqe) unsigned flags; int fd; - /* Prep already done */ - if (req->rw.ki_filp) + /* Prep already done (EAGAIN retry) */ + if (req->flags & REQ_F_PREPPED) return 0; if (unlikely(ctx->flags & IORING_SETUP_IOPOLL)) @@ -1122,6 +1124,7 @@ static int io_prep_fsync(struct io_kiocb *req, const struct io_uring_sqe *sqe) return -EBADF; } + req->flags |= REQ_F_PREPPED; return 0; } @@ -1633,8 +1636,6 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s, if (unlikely(!req)) return 
-EAGAIN; - req->rw.ki_filp = NULL; - ret = __io_submit_sqe(ctx, req, s, true, state); if (ret == -EAGAIN) { struct io_uring_sqe *sqe_copy;
From patchwork Thu Mar 14 20:44:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10853565 From: Jens Axboe To: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: viro@ZenIV.linux.org.uk, Jens Axboe Subject: [PATCH 4/8] io_uring: fix fget/fput handling Date: Thu, 14 Mar 2019 14:44:31 -0600 Message-Id: <20190314204435.7692-5-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190314204435.7692-1-axboe@kernel.dk> References: <20190314204435.7692-1-axboe@kernel.dk> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP This isn't a straight port of commit 84c4e1f89fef for aio.c, since io_uring doesn't use files in exactly the same way. But it's pretty close. See the commit message for that commit. This essentially fixes a use-after-free with the poll command handling, but it takes a cue from Linus's approach of just simplifying the file handling. We move the setup of the file into a higher level location, so the individual commands don't have to deal with it. And then we release the reference when we free the associated io_kiocb. Fixes: 221c5eb23382 ("io_uring: add support for IORING_OP_POLL") Signed-off-by: Jens Axboe --- fs/io_uring.c | 201 +++++++++++++++++--------------------------------- 1 file changed, 67 insertions(+), 134 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index d8468594a483..f4fe9dce38ee 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -189,6 +189,10 @@ struct sqe_submit { bool needs_fixed_file; }; +/* + * First field must be the file pointer in all the + * iocb unions! See also 'struct kiocb' in + */ struct io_poll_iocb { struct file *file; struct wait_queue_head *head; @@ -198,8 +202,15 @@ struct io_poll_iocb { struct wait_queue_entry wait; }; +/* + * NOTE! Each of the iocb union members has the file pointer + * as the first entry in their struct definition. So you can + * access the file pointer through any of the sub-structs, + * or directly as just 'ki_filp' in this struct.
+ */ struct io_kiocb { union { + struct file *file; struct kiocb rw; struct io_poll_iocb poll; }; @@ -429,8 +440,16 @@ static void io_free_req_many(struct io_ring_ctx *ctx, void **reqs, int *nr) } } +static void io_fput(struct io_kiocb *req) +{ + if (!(req->flags & REQ_F_FIXED_FILE)) + fput(req->file); +} + static void io_free_req(struct io_kiocb *req) { + if (req->file) + io_fput(req); io_ring_drop_ctx_refs(req->ctx, 1); kmem_cache_free(req_cachep, req); } @@ -448,45 +467,34 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events, struct list_head *done) { void *reqs[IO_IOPOLL_BATCH]; - int file_count, to_free; - struct file *file = NULL; struct io_kiocb *req; + int to_free; - file_count = to_free = 0; + to_free = 0; while (!list_empty(done)) { req = list_first_entry(done, struct io_kiocb, list); list_del(&req->list); io_cqring_fill_event(ctx, req->user_data, req->error, 0); - - if (refcount_dec_and_test(&req->refs)) - reqs[to_free++] = req; (*nr_events)++; - /* - * Batched puts of the same file, to avoid dirtying the - * file usage count multiple times, if avoidable. - */ - if (!(req->flags & REQ_F_FIXED_FILE)) { - if (!file) { - file = req->rw.ki_filp; - file_count = 1; - } else if (file == req->rw.ki_filp) { - file_count++; + if (refcount_dec_and_test(&req->refs)) { + /* If we're not using fixed files, we have to pair the + * completion part with the file put. Use regular + * completions for those, only batch free for fixed + * file. 
+ */ + if (req->flags & REQ_F_FIXED_FILE) { + reqs[to_free++] = req; + if (to_free == ARRAY_SIZE(reqs)) + io_free_req_many(ctx, reqs, &to_free); } else { - fput_many(file, file_count); - file = req->rw.ki_filp; - file_count = 1; + io_free_req(req); } } - - if (to_free == ARRAY_SIZE(reqs)) - io_free_req_many(ctx, reqs, &to_free); } - io_commit_cqring(ctx); - if (file) - fput_many(file, file_count); + io_commit_cqring(ctx); io_free_req_many(ctx, reqs, &to_free); } @@ -609,19 +617,12 @@ static void kiocb_end_write(struct kiocb *kiocb) } } -static void io_fput(struct io_kiocb *req) -{ - if (!(req->flags & REQ_F_FIXED_FILE)) - fput(req->rw.ki_filp); -} - static void io_complete_rw(struct kiocb *kiocb, long res, long res2) { struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw); kiocb_end_write(kiocb); - io_fput(req); io_cqring_add_event(req->ctx, req->user_data, res, 0); io_put_req(req); } @@ -738,31 +739,16 @@ static int io_prep_rw(struct io_kiocb *req, const struct sqe_submit *s, const struct io_uring_sqe *sqe = s->sqe; struct io_ring_ctx *ctx = req->ctx; struct kiocb *kiocb = &req->rw; - unsigned ioprio, flags; - int fd, ret; + unsigned ioprio; + int ret; /* For -EAGAIN retry, everything is already prepped */ if (req->flags & REQ_F_PREPPED) return 0; - flags = READ_ONCE(sqe->flags); - fd = READ_ONCE(sqe->fd); + if (force_nonblock && !io_file_supports_async(req->file)) + force_nonblock = false; - if (flags & IOSQE_FIXED_FILE) { - if (unlikely(!ctx->user_files || - (unsigned) fd >= ctx->nr_user_files)) - return -EBADF; - kiocb->ki_filp = ctx->user_files[fd]; - req->flags |= REQ_F_FIXED_FILE; - } else { - if (s->needs_fixed_file) - return -EBADF; - kiocb->ki_filp = io_file_get(state, fd); - if (unlikely(!kiocb->ki_filp)) - return -EBADF; - if (force_nonblock && !io_file_supports_async(kiocb->ki_filp)) - force_nonblock = false; - } kiocb->ki_pos = READ_ONCE(sqe->off); kiocb->ki_flags = iocb_flags(kiocb->ki_filp); kiocb->ki_hint = 
ki_hint_validate(file_write_hint(kiocb->ki_filp));
@@ -771,7 +757,7 @@ static int io_prep_rw(struct io_kiocb *req, const struct sqe_submit *s,
 	if (ioprio) {
 		ret = ioprio_check_cap(ioprio);
 		if (ret)
-			goto out_fput;
+			return ret;

 		kiocb->ki_ioprio = ioprio;
 	} else
@@ -779,39 +765,26 @@ static int io_prep_rw(struct io_kiocb *req, const struct sqe_submit *s,

 	ret = kiocb_set_rw_flags(kiocb, READ_ONCE(sqe->rw_flags));
 	if (unlikely(ret))
-		goto out_fput;
+		return ret;
 	if (force_nonblock) {
 		kiocb->ki_flags |= IOCB_NOWAIT;
 		req->flags |= REQ_F_FORCE_NONBLOCK;
 	}
 	if (ctx->flags & IORING_SETUP_IOPOLL) {
-		ret = -EOPNOTSUPP;
 		if (!(kiocb->ki_flags & IOCB_DIRECT) ||
 		    !kiocb->ki_filp->f_op->iopoll)
-			goto out_fput;
+			return -EOPNOTSUPP;

 		req->error = 0;
 		kiocb->ki_flags |= IOCB_HIPRI;
 		kiocb->ki_complete = io_complete_rw_iopoll;
 	} else {
-		if (kiocb->ki_flags & IOCB_HIPRI) {
-			ret = -EINVAL;
-			goto out_fput;
-		}
+		if (kiocb->ki_flags & IOCB_HIPRI)
+			return -EINVAL;
 		kiocb->ki_complete = io_complete_rw;
 	}
 	req->flags |= REQ_F_PREPPED;
 	return 0;
-out_fput:
-	if (!(flags & IOSQE_FIXED_FILE)) {
-		/*
-		 * in case of error, we didn't use this file reference. drop it.
-		 */
-		if (state)
-			state->used_refs--;
-		io_file_put(state, kiocb->ki_filp);
-	}
-	return ret;
 }

 static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
@@ -968,16 +941,14 @@ static int io_read(struct io_kiocb *req, const struct sqe_submit *s,
 		return ret;
 	file = kiocb->ki_filp;

-	ret = -EBADF;
 	if (unlikely(!(file->f_mode & FMODE_READ)))
-		goto out_fput;
-	ret = -EINVAL;
+		return -EBADF;
 	if (unlikely(!file->f_op->read_iter))
-		goto out_fput;
+		return -EINVAL;

 	ret = io_import_iovec(req->ctx, READ, s, &iovec, &iter);
 	if (ret)
-		goto out_fput;
+		return ret;

 	iov_count = iov_iter_count(&iter);
 	ret = rw_verify_area(READ, file, &kiocb->ki_pos, iov_count);
@@ -999,10 +970,6 @@ static int io_read(struct io_kiocb *req, const struct sqe_submit *s,
 		}
 	}
 	kfree(iovec);
-out_fput:
-	/* Hold on to the file for -EAGAIN */
-	if (unlikely(ret && ret != -EAGAIN))
-		io_fput(req);
 	return ret;
 }

@@ -1020,17 +987,15 @@ static int io_write(struct io_kiocb *req, const struct sqe_submit *s,
 	if (ret)
 		return ret;

-	ret = -EBADF;
 	file = kiocb->ki_filp;
 	if (unlikely(!(file->f_mode & FMODE_WRITE)))
-		goto out_fput;
-	ret = -EINVAL;
+		return -EBADF;
 	if (unlikely(!file->f_op->write_iter))
-		goto out_fput;
+		return -EINVAL;

 	ret = io_import_iovec(req->ctx, WRITE, s, &iovec, &iter);
 	if (ret)
-		goto out_fput;
+		return ret;

 	iov_count = iov_iter_count(&iter);

@@ -1062,10 +1027,6 @@ static int io_write(struct io_kiocb *req, const struct sqe_submit *s,
 	}
 out_free:
 	kfree(iovec);
-out_fput:
-	/* Hold on to the file for -EAGAIN */
-	if (unlikely(ret && ret != -EAGAIN))
-		io_fput(req);
 	return ret;
 }

@@ -1080,16 +1041,6 @@ static int io_nop(struct io_kiocb *req, u64 user_data)
 	if (unlikely(ctx->flags & IORING_SETUP_IOPOLL))
 		return -EINVAL;

-	/*
-	 * Twilight zone - it's possible that someone issued an opcode that
-	 * has a file attached, then got -EAGAIN on submission, and changed
-	 * the sqe before we retried it from async context. Avoid dropping
-	 * a file reference for this malicious case, and flag the error.
-	 */
-	if (req->rw.ki_filp) {
-		err = -EBADF;
-		io_fput(req);
-	}
 	io_cqring_add_event(ctx, user_data, err, 0);
 	io_put_req(req);
 	return 0;
@@ -1098,8 +1049,6 @@ static int io_nop(struct io_kiocb *req, u64 user_data)
 static int io_prep_fsync(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_ring_ctx *ctx = req->ctx;
-	unsigned flags;
-	int fd;

 	/* Prep already done (EAGAIN retry) */
 	if (req->flags & REQ_F_PREPPED)
@@ -1110,20 +1059,6 @@ static int io_prep_fsync(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (unlikely(sqe->addr || sqe->ioprio || sqe->buf_index))
 		return -EINVAL;

-	fd = READ_ONCE(sqe->fd);
-	flags = READ_ONCE(sqe->flags);
-
-	if (flags & IOSQE_FIXED_FILE) {
-		if (unlikely(!ctx->user_files || fd >= ctx->nr_user_files))
-			return -EBADF;
-		req->rw.ki_filp = ctx->user_files[fd];
-		req->flags |= REQ_F_FIXED_FILE;
-	} else {
-		req->rw.ki_filp = fget(fd);
-		if (unlikely(!req->rw.ki_filp))
-			return -EBADF;
-	}
-
 	req->flags |= REQ_F_PREPPED;
 	return 0;
 }
@@ -1153,7 +1088,6 @@ static int io_fsync(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 				end > 0 ? end : LLONG_MAX,
 				fsync_flags & IORING_FSYNC_DATASYNC);

-	io_fput(req);
 	io_cqring_add_event(req->ctx, sqe->user_data, ret, 0);
 	io_put_req(req);
 	return 0;
@@ -1220,7 +1154,6 @@ static int io_poll_remove(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 static void io_poll_complete(struct io_kiocb *req, __poll_t mask)
 {
 	io_cqring_add_event(req->ctx, req->user_data, mangle_poll(mask), 0);
-	io_fput(req);
 	io_put_req(req);
 }

@@ -1314,10 +1247,8 @@ static int io_poll_add(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	struct io_poll_iocb *poll = &req->poll;
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_poll_table ipt;
-	unsigned flags;
 	__poll_t mask;
 	u16 events;
-	int fd;

 	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
 		return -EINVAL;
@@ -1328,20 +1259,6 @@ static int io_poll_add(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	events = READ_ONCE(sqe->poll_events);
 	poll->events = demangle_poll(events) | EPOLLERR | EPOLLHUP;

-	flags = READ_ONCE(sqe->flags);
-	fd = READ_ONCE(sqe->fd);
-
-	if (flags & IOSQE_FIXED_FILE) {
-		if (unlikely(!ctx->user_files || fd >= ctx->nr_user_files))
-			return -EBADF;
-		poll->file = ctx->user_files[fd];
-		req->flags |= REQ_F_FIXED_FILE;
-	} else {
-		poll->file = fget(fd);
-	}
-	if (unlikely(!poll->file))
-		return -EBADF;
-
 	poll->head = NULL;
 	poll->woken = false;
 	poll->canceled = false;
@@ -1380,8 +1297,6 @@ static int io_poll_add(struct io_kiocb *req, const struct io_uring_sqe *sqe)

 out:
 	if (unlikely(ipt.error)) {
-		if (!(flags & IOSQE_FIXED_FILE))
-			fput(poll->file);
 		/*
 		 * Drop one of our refs to this req, __io_submit_sqe() will
 		 * drop the other one since we're returning an error.
@@ -1626,7 +1541,8 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s,
 {
 	bool did_submit = true;
 	struct io_kiocb *req;
-	int ret;
+	unsigned flags;
+	int ret, fd;

 	/* enforce forwards compatibility on users */
 	if (unlikely(s->sqe->flags & ~IOSQE_FIXED_FILE))
@@ -1636,6 +1552,23 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s,
 	if (unlikely(!req))
 		return -EAGAIN;

+	flags = READ_ONCE(s->sqe->flags);
+	fd = READ_ONCE(s->sqe->fd);
+
+	if (flags & IOSQE_FIXED_FILE) {
+		if (unlikely(!ctx->user_files ||
+		    (unsigned) fd >= ctx->nr_user_files))
+			return -EBADF;
+		req->file = ctx->user_files[fd];
+		req->flags |= REQ_F_FIXED_FILE;
+	} else {
+		if (s->needs_fixed_file)
+			return -EBADF;
+		req->file = io_file_get(state, fd);
+		if (unlikely(!req->file))
+			return -EBADF;
+	}
+
 	ret = __io_submit_sqe(ctx, req, s, true, state);
 	if (ret == -EAGAIN) {
 		struct io_uring_sqe *sqe_copy;

From patchwork Thu Mar 14 20:44:32 2019
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10853571
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org
Cc: viro@ZenIV.linux.org.uk, Jens Axboe
Subject: [PATCH 5/8] io_uring: fix poll races
Date: Thu, 14 Mar 2019 14:44:32 -0600
Message-Id: <20190314204435.7692-6-axboe@kernel.dk>
In-Reply-To: <20190314204435.7692-1-axboe@kernel.dk>
References: <20190314204435.7692-1-axboe@kernel.dk>

This is a straight port of Al's fix for the aio poll implementation,
since the io_uring version is heavily based on that. The below
description is almost straight from that patch, just modified to fit
the io_uring situation.

io_poll() has to cope with several unpleasant problems:
* requests that might stay around indefinitely need to be made visible
  for io_cancel(2); that must not be done to a request already
  completed, though.
* in cases when ->poll() has placed us on a waitqueue, wakeup might
  have happened (and request completed) before ->poll() returns.
* worse, in some early wakeup cases request might end up re-added into
  the queue later - we can't treat "woken up and currently not in the
  queue" as "it's not going to stick around indefinitely"
* ... moreover, ->poll() might have decided not to put it on any queues
  to start with, and that needs to be distinguished from the previous
  case
* ->poll() might have tried to put us on more than one queue. Only the
  first will succeed for io poll, so we might end up missing wakeups.
  OTOH, we might very well notice that only after the wakeup hits and
  request gets completed (all before ->poll() gets around to the second
  poll_wait()).
  In that case it's too late to decide that we have an error.

req->woken was an attempt to deal with that. Unfortunately, it was
broken. What we need to keep track of is not that wakeup has happened -
the thing might come back after that. It's that async reference is
already gone and won't come back, so we can't (and needn't) put the
request on the list of cancellables.

The easiest case is "request hadn't been put on any waitqueues"; we can
tell by seeing a NULL poll->head, and in that case there won't be
anything async. We should either complete the request ourselves (if
vfs_poll() reports anything of interest) or return an error.

In all other cases we get exclusion with wakeups by grabbing the queue
lock.

If request is currently on queue and we have something interesting from
vfs_poll(), we can steal it and complete the request ourselves.

If it's on queue and vfs_poll() has not reported anything interesting,
we either put it on the cancellable list, or, if we know that it hadn't
been put on all queues ->poll() wanted it on, we steal it and return an
error.

If it's _not_ on queue, it's either been already dealt with (in which
case we do nothing), or there's io_poll_complete_work() about to be
executed. In that case we either put it on the cancellable list, or, if
we know it hadn't been put on all queues ->poll() wanted it on,
simulate what cancel would've done.
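
For illustration only, the cases above can be modeled as a small
user-space decision function. This is a sketch, not the kernel code:
the enum, struct and function names are made up here, locking is
omitted, and "error" stands for "->poll() wanted us on more queues than
we got onto".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; this models the description, not fs/io_uring.c. */
enum poll_action {
	POLL_COMPLETE_INLINE,	/* steal the request and complete it ourselves */
	POLL_RETURN_ERROR,	/* steal it (if queued) and return an error */
	POLL_ADD_CANCELLABLE,	/* leave it pending on the cancellable list */
	POLL_MARK_CANCELED,	/* async work pending: simulate a cancel */
	POLL_NOTHING,		/* already dealt with by the wakeup side */
};

struct poll_snapshot {
	bool has_head;	/* ->poll() put us on at least one waitqueue */
	bool on_queue;	/* our wait entry is still linked on that queue */
	bool done;	/* a completion has already been posted */
	unsigned mask;	/* events of interest reported by vfs_poll() */
	int error;	/* nonzero: not on every queue ->poll() wanted */
};

enum poll_action poll_decide(struct poll_snapshot s)
{
	bool cancel = false;

	/* Never queued, so nothing async can happen behind our back. */
	if (!s.has_head)
		return s.mask ? POLL_COMPLETE_INLINE : POLL_RETURN_ERROR;

	/* From here on the real code holds the waitqueue lock. */
	if (!s.on_queue) {
		/* A wakeup got there first and owns the completion. */
		cancel = s.error != 0;
		s.error = 0;
		s.mask = 0;
	}
	if (s.mask)
		return POLL_COMPLETE_INLINE;
	if (s.error)
		return POLL_RETURN_ERROR;
	if (cancel)
		return POLL_MARK_CANCELED;
	if (!s.done)
		return POLL_ADD_CANCELLABLE;
	return POLL_NOTHING;
}

/* Usage: a queued request with nothing to report stays cancellable. */
void poll_decide_selfcheck(void)
{
	struct poll_snapshot s = { .has_head = true, .on_queue = true };

	assert(poll_decide(s) == POLL_ADD_CANCELLABLE);
}
```

The ordering of the checks mirrors the description: an early wakeup
clears both the mask and the error before anything else is decided, so
only the wakeup side ever completes such a request.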
Fixes: 221c5eb23382 ("io_uring: add support for IORING_OP_POLL")
Signed-off-by: Jens Axboe
---
 fs/io_uring.c | 110 +++++++++++++++++++++++++-------------------------
 1 file changed, 54 insertions(+), 56 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f4fe9dce38ee..46cf38b8d863 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -197,7 +197,7 @@ struct io_poll_iocb {
 	struct file *file;
 	struct wait_queue_head *head;
 	__poll_t events;
-	bool woken;
+	bool done;
 	bool canceled;
 	struct wait_queue_entry wait;
 };
@@ -367,20 +367,25 @@ static void io_cqring_fill_event(struct io_ring_ctx *ctx, u64 ki_user_data,
 	}
 }

-static void io_cqring_add_event(struct io_ring_ctx *ctx, u64 ki_user_data,
+static void io_cqring_ev_posted(struct io_ring_ctx *ctx)
+{
+	if (waitqueue_active(&ctx->wait))
+		wake_up(&ctx->wait);
+	if (waitqueue_active(&ctx->sqo_wait))
+		wake_up(&ctx->sqo_wait);
+}
+
+static void io_cqring_add_event(struct io_ring_ctx *ctx, u64 user_data,
 				long res, unsigned ev_flags)
 {
 	unsigned long flags;

 	spin_lock_irqsave(&ctx->completion_lock, flags);
-	io_cqring_fill_event(ctx, ki_user_data, res, ev_flags);
+	io_cqring_fill_event(ctx, user_data, res, ev_flags);
 	io_commit_cqring(ctx);
 	spin_unlock_irqrestore(&ctx->completion_lock, flags);

-	if (waitqueue_active(&ctx->wait))
-		wake_up(&ctx->wait);
-	if (waitqueue_active(&ctx->sqo_wait))
-		wake_up(&ctx->sqo_wait);
+	io_cqring_ev_posted(ctx);
 }

 static void io_ring_drop_ctx_refs(struct io_ring_ctx *ctx, unsigned refs)
@@ -1151,10 +1156,13 @@ static int io_poll_remove(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }

-static void io_poll_complete(struct io_kiocb *req, __poll_t mask)
+static void io_poll_complete(struct io_ring_ctx *ctx, struct io_kiocb *req,
+			     __poll_t mask)
 {
-	io_cqring_add_event(req->ctx, req->user_data, mangle_poll(mask), 0);
-	io_put_req(req);
+	req->poll.done = true;
+	io_cqring_fill_event(ctx, req->user_data, mangle_poll(mask), 0);
+	io_commit_cqring(ctx);
+	io_cqring_ev_posted(ctx);
 }

 static void io_poll_complete_work(struct work_struct *work)
@@ -1182,9 +1190,10 @@ static void io_poll_complete_work(struct work_struct *work)
 		return;
 	}
 	list_del_init(&req->list);
+	io_poll_complete(ctx, req, mask);
 	spin_unlock_irq(&ctx->completion_lock);

-	io_poll_complete(req, mask);
+	io_put_req(req);
 }

 static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
@@ -1195,29 +1204,23 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 	struct io_kiocb *req = container_of(poll, struct io_kiocb, poll);
 	struct io_ring_ctx *ctx = req->ctx;
 	__poll_t mask = key_to_poll(key);
-
-	poll->woken = true;
+	unsigned long flags;

 	/* for instances that support it check for an event match first: */
-	if (mask) {
-		unsigned long flags;
-
-		if (!(mask & poll->events))
-			return 0;
+	if (mask && !(mask & poll->events))
+		return 0;

-		/* try to complete the iocb inline if we can: */
-		if (spin_trylock_irqsave(&ctx->completion_lock, flags)) {
-			list_del(&req->list);
-			spin_unlock_irqrestore(&ctx->completion_lock, flags);
+	list_del_init(&poll->wait.entry);

-			list_del_init(&poll->wait.entry);
-			io_poll_complete(req, mask);
-			return 1;
-		}
+	if (mask && spin_trylock_irqsave(&ctx->completion_lock, flags)) {
+		list_del(&req->list);
+		io_poll_complete(ctx, req, mask);
+		spin_unlock_irqrestore(&ctx->completion_lock, flags);
+		io_put_req(req);
+	} else {
+		queue_work(ctx->sqo_wq, &req->work);
 	}

-	list_del_init(&poll->wait.entry);
-	queue_work(ctx->sqo_wq, &req->work);
 	return 1;
 }

@@ -1247,6 +1250,7 @@ static int io_poll_add(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	struct io_poll_iocb *poll = &req->poll;
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_poll_table ipt;
+	bool cancel = false;
 	__poll_t mask;
 	u16 events;

@@ -1260,7 +1264,7 @@ static int io_poll_add(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	poll->events = demangle_poll(events) | EPOLLERR | EPOLLHUP;

 	poll->head = NULL;
-	poll->woken = false;
+	poll->done = false;
 	poll->canceled = false;

 	ipt.pt._qproc = io_poll_queue_proc;
@@ -1273,41 +1277,35 @@ static int io_poll_add(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	init_waitqueue_func_entry(&poll->wait, io_poll_wake);

 	mask = vfs_poll(poll->file, &ipt.pt) & poll->events;
-	if (unlikely(!poll->head)) {
-		/* we did not manage to set up a waitqueue, done */
-		goto out;
-	}

 	spin_lock_irq(&ctx->completion_lock);
-	spin_lock(&poll->head->lock);
-	if (poll->woken) {
-		/* wake_up context handles the rest */
-		mask = 0;
+	if (likely(poll->head)) {
+		spin_lock(&poll->head->lock);
+		if (unlikely(list_empty(&poll->wait.entry))) {
+			if (ipt.error)
+				cancel = true;
+			ipt.error = 0;
+			mask = 0;
+		}
+		if (mask || ipt.error)
+			list_del_init(&poll->wait.entry);
+		else if (cancel)
+			WRITE_ONCE(poll->canceled, true);
+		else if (!poll->done) /* actually waiting for an event */
+			list_add_tail(&req->list, &ctx->cancel_list);
+		spin_unlock(&poll->head->lock);
+	}
+	if (mask) { /* no async, we'd stolen it */
+		req->error = mangle_poll(mask);
 		ipt.error = 0;
-	} else if (mask || ipt.error) {
-		/* if we get an error or a mask we are done */
-		WARN_ON_ONCE(list_empty(&poll->wait.entry));
-		list_del_init(&poll->wait.entry);
-	} else {
-		/* actually waiting for an event */
-		list_add_tail(&req->list, &ctx->cancel_list);
 	}
-	spin_unlock(&poll->head->lock);
 	spin_unlock_irq(&ctx->completion_lock);

-out:
-	if (unlikely(ipt.error)) {
-		/*
-		 * Drop one of our refs to this req, __io_submit_sqe() will
-		 * drop the other one since we're returning an error.
-		 */
+	if (mask) {
+		io_poll_complete(ctx, req, mask);
 		io_put_req(req);
-		return ipt.error;
 	}
-
-	if (mask)
-		io_poll_complete(req, mask);
-	return 0;
+	return ipt.error;
 }

 static int __io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,

From patchwork Thu Mar 14 20:44:33 2019
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10853575
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org
Cc: viro@ZenIV.linux.org.uk, Jens Axboe
Subject: [PATCH 6/8] iov_iter: add ITER_BVEC_FLAG_NO_REF flag
Date: Thu, 14 Mar 2019 14:44:33 -0600
Message-Id: <20190314204435.7692-7-axboe@kernel.dk>
In-Reply-To: <20190314204435.7692-1-axboe@kernel.dk>
References: <20190314204435.7692-1-axboe@kernel.dk>

For ITER_BVEC, if we're holding on to kernel pages, the caller doesn't
need to grab a reference to the bvec pages, and drop that same
reference on IO completion. This is essentially safe for any ITER_BVEC,
but some use cases end up reusing pages and unconditionally dropping a
page reference on completion. An example of that is sendfile(2), which
ends up being a splice_in + splice_out on the pipe pages.

Add a flag that tells us it's fine to not grab a page reference to the
bvec pages, since that caller knows not to drop a reference when it's
done with the pages.

Signed-off-by: Jens Axboe
---
 fs/io_uring.c       |  3 +++
 include/linux/uio.h | 24 +++++++++++++++++++-----
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 46cf38b8d863..b89f91ab6e32 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -850,6 +850,9 @@ static int io_import_fixed(struct io_ring_ctx *ctx, int rw,
 	iov_iter_bvec(iter, rw, imu->bvec, imu->nr_bvecs, offset + len);
 	if (offset)
 		iov_iter_advance(iter, offset);
+
+	/* don't drop a reference to these pages */
+	iter->type |= ITER_BVEC_FLAG_NO_REF;
 	return 0;
 }

diff --git a/include/linux/uio.h b/include/linux/uio.h
index ecf584f6b82d..4e926641fa80 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -23,14 +23,23 @@ struct kvec {
 };

 enum iter_type {
-	ITER_IOVEC = 0,
-	ITER_KVEC = 2,
-	ITER_BVEC = 4,
-	ITER_PIPE = 8,
-	ITER_DISCARD = 16,
+	/* set if ITER_BVEC doesn't hold a bv_page ref */
+	ITER_BVEC_FLAG_NO_REF = 2,
+
+	/* iter types */
+	ITER_IOVEC = 4,
+	ITER_KVEC = 8,
+	ITER_BVEC = 16,
+	ITER_PIPE = 32,
+	ITER_DISCARD = 64,
 };

 struct iov_iter {
+	/*
+	 * Bit 0 is the read/write bit, set if we're writing.
+	 * Bit 1 is the BVEC_FLAG_NO_REF bit, set if type is a bvec and
+	 * the caller isn't expecting to drop a page reference when done.
+	 */
 	unsigned int type;
 	size_t iov_offset;
 	size_t count;
@@ -84,6 +93,11 @@ static inline unsigned char iov_iter_rw(const struct iov_iter *i)
 	return i->type & (READ | WRITE);
 }

+static inline bool iov_iter_bvec_no_ref(const struct iov_iter *i)
+{
+	return (i->type & ITER_BVEC_FLAG_NO_REF) != 0;
+}
+
 /*
  * Total number of bytes covered by an iovec.
  *

From patchwork Thu Mar 14 20:44:34 2019
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10853579
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org
Cc: viro@ZenIV.linux.org.uk, Jens Axboe
Subject: [PATCH 7/8] block: add BIO_NO_PAGE_REF flag
Date: Thu, 14 Mar 2019 14:44:34 -0600
Message-Id: <20190314204435.7692-8-axboe@kernel.dk>
In-Reply-To: <20190314204435.7692-1-axboe@kernel.dk>
References: <20190314204435.7692-1-axboe@kernel.dk>

If bio_iov_iter_get_pages() is called on an iov_iter that is flagged
with NO_REF, then we don't need to add a page reference for the pages
that we add.

Add BIO_NO_PAGE_REF to track this in the bio, so IO completion knows
not to drop a reference to these pages.

Signed-off-by: Jens Axboe
---
 block/bio.c               | 43 ++++++++++++++++++++++-----------------
 fs/block_dev.c            | 12 ++++++-----
 fs/iomap.c                | 12 ++++++-----
 include/linux/blk_types.h |  1 +
 4 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 71a78d9fb8b7..b64cedc7f87c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -849,20 +849,14 @@ static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter)
 	size = bio_add_page(bio, bv->bv_page, len,
 				bv->bv_offset + iter->iov_offset);
 	if (size == len) {
-		struct page *page;
-		int i;
+		if (!bio_flagged(bio, BIO_NO_PAGE_REF)) {
+			struct page *page;
+			int i;
+
+			mp_bvec_for_each_page(page, bv, i)
+				get_page(page);
+		}

-		/*
-		 * For the normal O_DIRECT case, we could skip grabbing this
-		 * reference and then not have to put them again when IO
-		 * completes. But this breaks some in-kernel users, like
-		 * splicing to/from a loop device, where we release the pipe
-		 * pages unconditionally. If we can fix that case, we can
-		 * get rid of the get here and the need to call
-		 * bio_release_pages() at IO completion time.
- */ - mp_bvec_for_each_page(page, bv, i) - get_page(page); iov_iter_advance(iter, size); return 0; } @@ -925,10 +919,12 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) * This takes either an iterator pointing to user memory, or one pointing to * kernel pages (BVEC iterator). If we're adding user pages, we pin them and * map them into the kernel. On IO completion, the caller should put those - * pages. For now, when adding kernel pages, we still grab a reference to the - * page. This isn't strictly needed for the common case, but some call paths - * end up releasing pages from eg a pipe and we can't easily control these. - * See comment in __bio_iov_bvec_add_pages(). + * pages. If we're adding kernel pages, and the caller told us it's safe to + * do so, we just have to add the pages to the bio directly. We don't grab an + * extra reference to those pages (the user should already have that), and we + * don't put the page on IO completion. The caller needs to check if the bio is + * flagged BIO_NO_PAGE_REF on IO completion. If it isn't, then pages should be + * released. * * The function tries, but does not guarantee, to pin as many pages as * fit into the bio, or are requested in *iter, whatever is smaller. If @@ -940,6 +936,13 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) const bool is_bvec = iov_iter_is_bvec(iter); unsigned short orig_vcnt = bio->bi_vcnt; + /* + * If this is a BVEC iter, then the pages are kernel pages. Don't + * release them on IO completion, if the caller asked us to. 
+ */ + if (is_bvec && iov_iter_bvec_no_ref(iter)) + bio_set_flag(bio, BIO_NO_PAGE_REF); + do { int ret; @@ -1696,7 +1699,8 @@ static void bio_dirty_fn(struct work_struct *work) next = bio->bi_private; bio_set_pages_dirty(bio); - bio_release_pages(bio); + if (!bio_flagged(bio, BIO_NO_PAGE_REF)) + bio_release_pages(bio); bio_put(bio); } } @@ -1713,7 +1717,8 @@ void bio_check_pages_dirty(struct bio *bio) goto defer; } - bio_release_pages(bio); + if (!bio_flagged(bio, BIO_NO_PAGE_REF)) + bio_release_pages(bio); bio_put(bio); return; defer: diff --git a/fs/block_dev.c b/fs/block_dev.c index e9faa52bb489..78d3257435c0 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -336,12 +336,14 @@ static void blkdev_bio_end_io(struct bio *bio) if (should_dirty) { bio_check_pages_dirty(bio); } else { - struct bio_vec *bvec; - int i; - struct bvec_iter_all iter_all; + if (!bio_flagged(bio, BIO_NO_PAGE_REF)) { + struct bvec_iter_all iter_all; + struct bio_vec *bvec; + int i; - bio_for_each_segment_all(bvec, bio, i, iter_all) - put_page(bvec->bv_page); + bio_for_each_segment_all(bvec, bio, i, iter_all) + put_page(bvec->bv_page); + } bio_put(bio); } } diff --git a/fs/iomap.c b/fs/iomap.c index 97cb9d486a7d..abdd18e404f8 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -1589,12 +1589,14 @@ static void iomap_dio_bio_end_io(struct bio *bio) if (should_dirty) { bio_check_pages_dirty(bio); } else { - struct bio_vec *bvec; - int i; - struct bvec_iter_all iter_all; + if (!bio_flagged(bio, BIO_NO_PAGE_REF)) { + struct bvec_iter_all iter_all; + struct bio_vec *bvec; + int i; - bio_for_each_segment_all(bvec, bio, i, iter_all) - put_page(bvec->bv_page); + bio_for_each_segment_all(bvec, bio, i, iter_all) + put_page(bvec->bv_page); + } bio_put(bio); } } diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index d66bf5f32610..791fee35df88 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -215,6 +215,7 @@ struct bio { /* * bio flags */ +#define BIO_NO_PAGE_REF 0 /* 
don't put release vec pages */
 #define BIO_SEG_VALID	1	/* bi_phys_segments valid */
 #define BIO_CLONED	2	/* doesn't own data */
 #define BIO_BOUNCED	3	/* bio is a bounce bio */

From patchwork Thu Mar 14 20:44:35 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10853583
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
 by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5069A1874
 for ; Thu, 14 Mar 2019 20:44:58 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 41F1C2A650
 for ; Thu, 14 Mar 2019 20:44:58 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
 id 369152A654; Thu, 14 Mar 2019 20:44:58 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
 pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,DKIM_SIGNED,
 DKIM_VALID,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable
 version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E17152A650
 for ; Thu, 14 Mar 2019 20:44:57 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1727941AbfCNUo4 (ORCPT );
 Thu, 14 Mar 2019 16:44:56 -0400
Received: from mail-it1-f195.google.com ([209.85.166.195]:37518 "EHLO
 mail-it1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1727931AbfCNUo4 (ORCPT );
 Thu, 14 Mar 2019 16:44:56 -0400
Received: by mail-it1-f195.google.com with SMTP id z124so7346547itc.2
 for ; Thu, 14 Mar 2019 13:44:56 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=kernel-dk.20150623.gappssmtp.com; s=20150623;
 h=from:to:cc:subject:date:message-id:in-reply-to:references;
 bh=aGFJ5HGPoz/ktnKoEe1M9JMfYCh1Xl63PfrPUdXe2mU=;
 b=dpwVwuZfEY46gyqEya30pK4loHBgwJ0o32ppkgOurjxCt0x7yxUa2cqsjEp6bM1Wgc
 pwtsDuM9qi5q5+Uz1umiQB1BIH27t0EgOhEi9k9mxgK3BMLIjvM4lUtkROhx12B00WW8
 /Q8Hub2ikfHdg7QglxI9rH1zz/EHz009umbnKy0jx+0FDtys9SI1oy9Kbs4HD66g20Bz
 5VMDhsC/xEBwNZgTja/UJ/EiaSb2LbdNO/wKAOiFiAmZOausj39K9QJ0HW2/Wr61FRnb
 0HevPn3k2XnqykEyyTxCcIKguy8buPvmimMeVoxLLVg6qYUPyyziZN6MbjYHlk2IjPjI
 +80Q==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
 :references;
 bh=aGFJ5HGPoz/ktnKoEe1M9JMfYCh1Xl63PfrPUdXe2mU=;
 b=tkkKUIybnoZJw8vriYaRNILYKkDOLLIg1rJt/w7XoLgPDBwQFbqYPuKolO0sCyP5b+
 OmGeApYhm1y9JJMqvS5FlBzwKe0n2ypKPODjnIL9dthiltnEwRYiU3zF2bYdsmk8QZUW
 g9auHDPqc99/sbaHnnOUa1wd3mC67zVR6LlmYnJJPHe5LxHnZR9T8agBpIIR5mJXbyDH
 A2TS82QBwMF9H1ViN3yDvgceiW2nNEpHK50zUnn2Rpb9FTguCdtlO4cORKvFLKoZASe8
 0G9zfODW3Uiby4YV6iyOGuoqhrVQa8cOvO6N2n47M7Jqu4UqyGQPyBgsb2y3B0+IXMkK
 axDQ==
X-Gm-Message-State: APjAAAXwKD+NitwNHVulWLbEap6T1jWvEIVBOKjL2rLe/O2tDu8MbnTm
 zEPF8mO+FV+YeclveTFuQrEdXXkKgdFOIw==
X-Google-Smtp-Source: APXvYqzh7OViW2aFBiKTuBennqWW2dDjWk3E5qSf4Y4qA51RoR1yywPEuiY+bO7OMV06pEVMtOo5RQ==
X-Received: by 2002:a02:a903:: with SMTP id n3mr148847jam.3.1552596295243;
 Thu, 14 Mar 2019 13:44:55 -0700 (PDT)
Received: from x1.thefacebook.com ([216.160.245.98])
 by smtp.gmail.com with ESMTPSA id w19sm1820538ita.33.2019.03.14.13.44.52
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 14 Mar 2019 13:44:54 -0700 (PDT)
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org
Cc: viro@ZenIV.linux.org.uk, Jens Axboe
Subject: [PATCH 8/8] io_uring: add io_uring_event cache hit information
Date: Thu, 14 Mar 2019 14:44:35 -0600
Message-Id: <20190314204435.7692-9-axboe@kernel.dk>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190314204435.7692-1-axboe@kernel.dk>
References: <20190314204435.7692-1-axboe@kernel.dk>
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List:
 linux-fsdevel@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Add hint on whether a read was served out of the page cache, or if it
hit media. This is useful for buffered async IO, O_DIRECT reads would
never have this set (for obvious reasons).

If the read hit page cache, cqe->flags will have IOCQE_FLAG_CACHEHIT
set.

Signed-off-by: Jens Axboe
---
 fs/io_uring.c                 | 6 +++++-
 include/uapi/linux/io_uring.h | 5 +++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index b89f91ab6e32..3437f538b377 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -625,10 +625,14 @@ static void kiocb_end_write(struct kiocb *kiocb)
 static void io_complete_rw(struct kiocb *kiocb, long res, long res2)
 {
 	struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw);
+	unsigned ev_flags = 0;
 
 	kiocb_end_write(kiocb);
 
-	io_cqring_add_event(req->ctx, req->user_data, res, 0);
+	if (res > 0 && (req->flags & REQ_F_FORCE_NONBLOCK))
+		ev_flags = IOCQE_FLAG_CACHEHIT;
+
+	io_cqring_add_event(req->ctx, req->user_data, res, ev_flags);
 	io_put_req(req);
 }
 
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index e23408692118..24906e99fdc7 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -69,6 +69,11 @@ struct io_uring_cqe {
 	__u32	flags;
 };
 
+/*
+ * io_uring_event->flags
+ */
+#define IOCQE_FLAG_CACHEHIT	(1U << 0)	/* IO did not hit media */
+
 /*
  * Magic offsets for the application to mmap the data it needs
  */
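
For anyone consuming completions, here is a minimal sketch of how the new bit might be tested from userspace. It is an illustration under stated assumptions, not part of the patch: the flag value is copied from the uapi hunk in this patch, while struct demo_cqe, DEMO_REQ_F_FORCE_NONBLOCK (including its value) and all demo_* helpers are hypothetical names invented for this example. demo_ev_flags() only models the condition added to io_complete_rw(), namely that the bit is set when the result is positive and REQ_F_FORCE_NONBLOCK is set on the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value copied from the uapi hunk in this patch; defined locally because
 * an installed <linux/io_uring.h> may not provide it. */
#ifndef IOCQE_FLAG_CACHEHIT
#define IOCQE_FLAG_CACHEHIT	(1U << 0)	/* IO did not hit media */
#endif

/* Hypothetical stand-in for struct io_uring_cqe, holding only the three
 * fields this example reads. */
struct demo_cqe {
	uint64_t user_data;
	int32_t  res;
	uint32_t flags;
};

/* Hypothetical value, for illustration only; the real request flag is
 * private to fs/io_uring.c. */
#define DEMO_REQ_F_FORCE_NONBLOCK	1U

/* Models the condition added to io_complete_rw() in this patch. */
static inline uint32_t demo_ev_flags(long res, unsigned int req_flags)
{
	if (res > 0 && (req_flags & DEMO_REQ_F_FORCE_NONBLOCK))
		return IOCQE_FLAG_CACHEHIT;
	return 0;
}

/* True if this completion reports a successful read that did not hit media. */
static inline bool demo_cqe_cache_hit(const struct demo_cqe *cqe)
{
	assert(cqe != NULL);
	return cqe->res > 0 && (cqe->flags & IOCQE_FLAG_CACHEHIT) != 0;
}

/* Counts the cache hits in a batch of reaped completions. */
static inline size_t demo_count_cache_hits(const struct demo_cqe *cqes, size_t n)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		if (demo_cqe_cache_hit(&cqes[i]))
			hits++;
	return hits;
}
```

An application could run something like demo_count_cache_hits() over each batch of completions it reaps to estimate how often buffered reads are served from the page cache. Per the commit message, O_DIRECT reads would never report the bit.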