From patchwork Fri Dec 7 22:19:51 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10718835
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 01/26] fs: add an iopoll method to struct file_operations
Date: Fri, 7 Dec 2018 15:19:51 -0700
Message-Id: <20181207222016.29387-2-axboe@kernel.dk>
In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk>
References: <20181207222016.29387-1-axboe@kernel.dk>

From: Christoph Hellwig

This new method is used to explicitly poll for I/O completion for an
iocb. It must be called for any iocb submitted asynchronously (that is,
with a non-null ki_complete) which has the IOCB_HIPRI flag set.

The method is assisted by a new ki_cookie field in struct kiocb to
store the polling cookie.

TODO: we can probably union ki_cookie with the existing hint and I/O
priority fields to avoid struct kiocb growth.

Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 Documentation/filesystems/vfs.txt | 3 +++
 include/linux/fs.h                | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 5f71a252e2e0..d9dc5e4d82b9 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -857,6 +857,7 @@ struct file_operations {
 	ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
 	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
+	int (*iopoll)(struct kiocb *kiocb, bool spin);
 	int (*iterate) (struct file *, struct dir_context *);
 	int (*iterate_shared) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
@@ -902,6 +903,8 @@ otherwise noted.
 
   write_iter: possibly asynchronous write with iov_iter as source
 
+  iopoll: called when aio wants to poll for completions on HIPRI iocbs
+
   iterate: called when the VFS needs to read the directory contents
 
   iterate_shared: called when the VFS needs to read the directory contents
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a1ab233e6469..6a5f71f8ae06 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -310,6 +310,7 @@ struct kiocb {
 	int			ki_flags;
 	u16			ki_hint;
 	u16			ki_ioprio; /* See linux/ioprio.h */
+	unsigned int		ki_cookie; /* for ->iopoll */
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1781,6 +1782,7 @@ struct file_operations {
 	ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
 	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
+	int (*iopoll)(struct kiocb *kiocb, bool spin);
 	int (*iterate) (struct file *, struct dir_context *);
 	int (*iterate_shared) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
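
To make the new contract concrete, a caller that submitted an iocb with
IOCB_HIPRI is expected to reap its completion through the new hook,
roughly as in the sketch below (the wrapper is illustrative and not part
of this patch; only f_op->iopoll() and ki_cookie come from it):

	static bool reap_polled_completion(struct kiocb *kiocb, bool spin)
	{
		struct file *file = kiocb->ki_filp;

		/* Only meaningful for iocbs submitted with IOCB_HIPRI set. */
		if (!(kiocb->ki_flags & IOCB_HIPRI) || !file->f_op->iopoll)
			return false;

		/*
		 * A positive return means completions were found and reaped on
		 * the queue identified by kiocb->ki_cookie; 'spin' allows the
		 * lower layer to busy-wait for them.
		 */
		return file->f_op->iopoll(kiocb, spin) > 0;
	}
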
From patchwork Fri Dec 7 22:19:52 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10718839
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 02/26] block: add REQ_HIPRI_ASYNC
Date: Fri, 7 Dec 2018 15:19:52 -0700
Message-Id: <20181207222016.29387-3-axboe@kernel.dk>
In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk>
References: <20181207222016.29387-1-axboe@kernel.dk>

For the upcoming async polled IO, we can't sleep when allocating
requests. If we do, we introduce a deadlock: the submitter already has
async polled IO in flight, but can't wait for it to complete, since
polled requests must be actively found and reaped.

Signed-off-by: Jens Axboe
---
 include/linux/blk_types.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 46c005d601ac..921d734d6b5d 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -347,6 +347,7 @@ enum req_flag_bits {
 #define REQ_NOWAIT		(1ULL << __REQ_NOWAIT)
 #define REQ_NOUNMAP		(1ULL << __REQ_NOUNMAP)
 #define REQ_HIPRI		(1ULL << __REQ_HIPRI)
+#define REQ_HIPRI_ASYNC		(REQ_HIPRI | REQ_NOWAIT)
 
 #define REQ_DRV			(1ULL << __REQ_DRV)
 #define REQ_SWAP		(1ULL << __REQ_SWAP)
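
To make the intent concrete, a submission path is expected to use the new
flag roughly as below (an illustrative sketch that anticipates the block
device change later in this series; the helper name is an assumption):

	static blk_qc_t submit_polled_bio(struct kiocb *iocb, struct bio *bio,
					  bool is_sync)
	{
		/*
		 * Async polled IO must not sleep in request allocation, so it
		 * gets REQ_NOWAIT folded in via REQ_HIPRI_ASYNC. If allocation
		 * would block, the bio is ended with BLK_STS_AGAIN instead of
		 * blocking a submitter that must reap its own completions.
		 */
		if (iocb->ki_flags & IOCB_HIPRI)
			bio->bi_opf |= is_sync ? REQ_HIPRI : REQ_HIPRI_ASYNC;

		return submit_bio(bio);
	}
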
From patchwork Fri Dec 7 22:19:53 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10718843
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 03/26] block: wire up block device iopoll method
Date: Fri, 7 Dec 2018 15:19:53 -0700
Message-Id: <20181207222016.29387-4-axboe@kernel.dk>
In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk>
References: <20181207222016.29387-1-axboe@kernel.dk>

From: Christoph Hellwig

Just call blk_poll on the iocb cookie; we can derive the block device
from the inode trivially.

Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/block_dev.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index e1886cc7048f..6de8d35f6e41 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -281,6 +281,14 @@ struct blkdev_dio {
 
 static struct bio_set blkdev_dio_pool;
 
+static int blkdev_iopoll(struct kiocb *kiocb, bool wait)
+{
+	struct block_device *bdev = I_BDEV(kiocb->ki_filp->f_mapping->host);
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	return blk_poll(q, READ_ONCE(kiocb->ki_cookie), wait);
+}
+
 static void blkdev_bio_end_io(struct bio *bio)
 {
 	struct blkdev_dio *dio = bio->bi_private;
@@ -398,6 +406,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 				bio->bi_opf |= REQ_HIPRI;
 
 			qc = submit_bio(bio);
+			WRITE_ONCE(iocb->ki_cookie, qc);
 			break;
 		}
 
@@ -2070,6 +2079,7 @@ const struct file_operations def_blk_fops = {
 	.llseek		= block_llseek,
 	.read_iter	= blkdev_read_iter,
 	.write_iter	= blkdev_write_iter,
+	.iopoll		= blkdev_iopoll,
 	.mmap		= generic_file_mmap,
 	.fsync		= blkdev_fsync,
 	.unlocked_ioctl	= block_ioctl,
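
The reason the cookie is worth saving: submit_bio() returns a blk_qc_t
that encodes which hardware queue the request was placed on, and
blk_poll() needs that value to spin on the matching completion queue.
In sketch form (a condensed restatement of the two hunks above):

	qc = submit_bio(bio);
	WRITE_ONCE(iocb->ki_cookie, qc);	/* remember where the IO went */
	...
	/* later, from ->iopoll(): poll only the queue the cookie refers to */
	blk_poll(bdev_get_queue(bdev), READ_ONCE(iocb->ki_cookie), spin);
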
From patchwork Fri Dec 7 22:19:54 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10718847
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 04/26] block: use REQ_HIPRI_ASYNC for non-sync polled IO
Date: Fri, 7 Dec 2018 15:19:54 -0700
Message-Id: <20181207222016.29387-5-axboe@kernel.dk>
In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk>
References: <20181207222016.29387-1-axboe@kernel.dk>

Tell the block layer if it's a sync or async polled request, so it can
do the right thing.

Signed-off-by: Jens Axboe
---
 fs/block_dev.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 6de8d35f6e41..b8f574615792 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -402,8 +402,12 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 
 		nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES);
 		if (!nr_pages) {
-			if (iocb->ki_flags & IOCB_HIPRI)
-				bio->bi_opf |= REQ_HIPRI;
+			if (iocb->ki_flags & IOCB_HIPRI) {
+				if (!is_sync)
+					bio->bi_opf |= REQ_HIPRI_ASYNC;
+				else
+					bio->bi_opf |= REQ_HIPRI;
+			}
 
 			qc = submit_bio(bio);
 			WRITE_ONCE(iocb->ki_cookie, qc);

From patchwork Fri Dec 7 22:19:55 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10718851
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 05/26] iomap: wire up the iopoll method
Date: Fri, 7 Dec 2018 15:19:55 -0700
Message-Id: <20181207222016.29387-6-axboe@kernel.dk>
In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk>
References: <20181207222016.29387-1-axboe@kernel.dk>

From: Christoph Hellwig

Store the request queue the last bio was submitted to in the iocb
private data, in addition to the cookie, so that we find the right
block device. Also refactor the common direct I/O bio submission code
into a nice little helper.

Signed-off-by: Christoph Hellwig

Modified to use REQ_HIPRI_ASYNC for async polled IO.

Signed-off-by: Jens Axboe
---
 fs/gfs2/file.c        |  2 ++
 fs/iomap.c            | 47 +++++++++++++++++++++++++++++--------------
 fs/xfs/xfs_file.c     |  1 +
 include/linux/iomap.h |  1 +
 4 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 45a17b770d97..358157efc5b7 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1280,6 +1280,7 @@ const struct file_operations gfs2_file_fops = {
 	.llseek		= gfs2_llseek,
 	.read_iter	= gfs2_file_read_iter,
 	.write_iter	= gfs2_file_write_iter,
+	.iopoll		= iomap_dio_iopoll,
 	.unlocked_ioctl	= gfs2_ioctl,
 	.mmap		= gfs2_mmap,
 	.open		= gfs2_open,
@@ -1310,6 +1311,7 @@ const struct file_operations gfs2_file_fops_nolock = {
 	.llseek		= gfs2_llseek,
 	.read_iter	= gfs2_file_read_iter,
 	.write_iter	= gfs2_file_write_iter,
+	.iopoll		= iomap_dio_iopoll,
 	.unlocked_ioctl	= gfs2_ioctl,
 	.mmap		= gfs2_mmap,
 	.open		= gfs2_open,
diff --git a/fs/iomap.c b/fs/iomap.c
index d094e5688bd3..bd483fcb7b5a 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1441,6 +1441,32 @@ struct iomap_dio {
 	};
 };
 
+int iomap_dio_iopoll(struct kiocb *kiocb, bool spin)
+{
+	struct request_queue *q = READ_ONCE(kiocb->private);
+
+	if (!q)
+		return 0;
+	return blk_poll(q, READ_ONCE(kiocb->ki_cookie), spin);
+}
+EXPORT_SYMBOL_GPL(iomap_dio_iopoll);
+
+static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
+		struct bio *bio)
+{
+	atomic_inc(&dio->ref);
+
+	if (dio->iocb->ki_flags & IOCB_HIPRI) {
+		if (!dio->wait_for_completion)
+			bio->bi_opf |= REQ_HIPRI_ASYNC;
+		else
+			bio->bi_opf |= REQ_HIPRI;
+	}
+
+	dio->submit.last_queue = bdev_get_queue(iomap->bdev);
+	dio->submit.cookie = submit_bio(bio);
+}
+
 static ssize_t iomap_dio_complete(struct iomap_dio *dio)
 {
 	struct kiocb *iocb = dio->iocb;
@@ -1553,7 +1579,7 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 	}
 }
 
-static blk_qc_t
+static void
 iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
 		unsigned len)
 {
@@ -1567,15 +1593,10 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;
 
-	if (dio->iocb->ki_flags & IOCB_HIPRI)
-		flags |= REQ_HIPRI;
-
 	get_page(page);
 	__bio_add_page(bio, page, len, 0);
 	bio_set_op_attrs(bio, REQ_OP_WRITE, flags);
-
-	atomic_inc(&dio->ref);
-	return submit_bio(bio);
+	iomap_dio_submit_bio(dio, iomap, bio);
 }
 
 static loff_t
@@ -1678,9 +1699,6 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 			bio_set_pages_dirty(bio);
 		}
 
-		if (dio->iocb->ki_flags & IOCB_HIPRI)
-			bio->bi_opf |= REQ_HIPRI;
-
 		iov_iter_advance(dio->submit.iter, n);
 
 		dio->size += n;
@@ -1688,11 +1706,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		copied += n;
 
 		nr_pages = iov_iter_npages(&iter, BIO_MAX_PAGES);
-
-		atomic_inc(&dio->ref);
-
-		dio->submit.last_queue = bdev_get_queue(iomap->bdev);
-		dio->submit.cookie = submit_bio(bio);
+		iomap_dio_submit_bio(dio, iomap, bio);
 	} while (nr_pages);
 
 	/*
@@ -1912,6 +1926,9 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (dio->flags & IOMAP_DIO_WRITE_FUA)
 		dio->flags &= ~IOMAP_DIO_NEED_SYNC;
 
+	WRITE_ONCE(iocb->ki_cookie, dio->submit.cookie);
+	WRITE_ONCE(iocb->private, dio->submit.last_queue);
+
 	if (!atomic_dec_and_test(&dio->ref)) {
 		if (!dio->wait_for_completion)
 			return -EIOCBQUEUED;
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e47425071e65..60c2da41f0fc 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1203,6 +1203,7 @@ const struct file_operations xfs_file_operations = {
 	.write_iter	= xfs_file_write_iter,
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.iopoll		= iomap_dio_iopoll,
 	.unlocked_ioctl	= xfs_file_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= xfs_file_compat_ioctl,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 9a4258154b25..0fefb5455bda 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -162,6 +162,7 @@ typedef int (iomap_dio_end_io_t)(struct kiocb *iocb, ssize_t ret,
 		unsigned flags);
 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, iomap_dio_end_io_t end_io);
+int iomap_dio_iopoll(struct kiocb *kiocb, bool spin);
 
 #ifdef CONFIG_SWAP
 struct file;

From patchwork Fri Dec 7 22:19:56 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10718855
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 06/26] aio: use assigned completion handler
Date: Fri, 7 Dec 2018 15:19:56 -0700
Message-Id: <20181207222016.29387-7-axboe@kernel.dk>
In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk>
References: <20181207222016.29387-1-axboe@kernel.dk>

We know this is a read/write request, but in preparation for having
different kinds of those, ensure that we call the assigned handler
instead of assuming it's aio_complete_rw().

Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/aio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/aio.c b/fs/aio.c
index 05647d352bf3..cf0de61743e8 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1490,7 +1490,7 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret)
 		ret = -EINTR;
 		/*FALLTHRU*/
 	default:
-		aio_complete_rw(req, ret, 0);
+		req->ki_complete(req, ret, 0);
 	}
 }
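
For context, a sketch of why this matters (the polled handler name is
hypothetical; only req->ki_complete() comes from this patch): once polled
requests are introduced later in the series, an iocb may carry a
different completion handler, and aio_rw_done() must invoke whatever was
assigned rather than hard-coding aio_complete_rw():

	if (polled)					/* hypothetical setup */
		req->ki_complete = aio_complete_rw_poll;
	else
		req->ki_complete = aio_complete_rw;
	...
	req->ki_complete(req, ret, 0);			/* what this patch calls */
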
From patchwork Fri Dec 7 22:19:57 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10718859
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 07/26] aio: separate out ring reservation from req allocation
Date: Fri, 7 Dec 2018 15:19:57 -0700
Message-Id: <20181207222016.29387-8-axboe@kernel.dk>
In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk>
References: <20181207222016.29387-1-axboe@kernel.dk>

From: Christoph Hellwig

This is in preparation for certain types of IO not needing a ring
reservation.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/aio.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index cf0de61743e8..eaceb40e6cf5 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -901,7 +901,7 @@ static void put_reqs_available(struct kioctx *ctx, unsigned nr)
 	local_irq_restore(flags);
 }
 
-static bool get_reqs_available(struct kioctx *ctx)
+static bool __get_reqs_available(struct kioctx *ctx)
 {
 	struct kioctx_cpu *kcpu;
 	bool ret = false;
@@ -993,6 +993,14 @@ static void user_refill_reqs_available(struct kioctx *ctx)
 	spin_unlock_irq(&ctx->completion_lock);
 }
 
+static bool get_reqs_available(struct kioctx *ctx)
+{
+	if (__get_reqs_available(ctx))
+		return true;
+	user_refill_reqs_available(ctx);
+	return __get_reqs_available(ctx);
+}
+
 /* aio_get_req
  *	Allocate a slot for an aio request.
  * Returns NULL if no requests are free.
@@ -1001,24 +1009,15 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 {
 	struct aio_kiocb *req;
 
-	if (!get_reqs_available(ctx)) {
-		user_refill_reqs_available(ctx);
-		if (!get_reqs_available(ctx))
-			return NULL;
-	}
-
 	req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL|__GFP_ZERO);
 	if (unlikely(!req))
-		goto out_put;
+		return NULL;
 
 	percpu_ref_get(&ctx->reqs);
 	INIT_LIST_HEAD(&req->ki_list);
 	refcount_set(&req->ki_refcnt, 0);
 	req->ki_ctx = ctx;
 	return req;
-out_put:
-	put_reqs_available(ctx, 1);
-	return NULL;
 }
 
 static struct kioctx *lookup_ioctx(unsigned long ctx_id)
@@ -1805,9 +1804,13 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 		return -EINVAL;
 	}
 
+	if (!get_reqs_available(ctx))
+		return -EAGAIN;
+
+	ret = -EAGAIN;
 	req = aio_get_req(ctx);
 	if (unlikely(!req))
-		return -EAGAIN;
+		goto out_put_reqs_available;
 
 	if (iocb.aio_flags & IOCB_FLAG_RESFD) {
 		/*
@@ -1870,11 +1873,12 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 		goto out_put_req;
 	return 0;
 out_put_req:
-	put_reqs_available(ctx, 1);
 	percpu_ref_put(&ctx->reqs);
 	if (req->ki_eventfd)
 		eventfd_ctx_put(req->ki_eventfd);
 	kmem_cache_free(kiocb_cachep, req);
+out_put_reqs_available:
+	put_reqs_available(ctx, 1);
 	return ret;
 }

From patchwork Fri Dec 7 22:19:58 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10718863
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 08/26] aio: don't zero entire aio_kiocb aio_get_req()
Date: Fri, 7 Dec 2018 15:19:58 -0700
Message-Id: <20181207222016.29387-9-axboe@kernel.dk>
In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk>
References: <20181207222016.29387-1-axboe@kernel.dk>

It's 192 bytes, fairly substantial. Most items don't need to be cleared,
especially not upfront. Clear the ones we do need to clear, and leave
the other ones for setup when the iocb is prepared and submitted.

Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/aio.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index eaceb40e6cf5..522c04864d82 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1009,14 +1009,15 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 {
 	struct aio_kiocb *req;
 
-	req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL|__GFP_ZERO);
+	req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL);
 	if (unlikely(!req))
 		return NULL;
 
 	percpu_ref_get(&ctx->reqs);
+	req->ki_ctx = ctx;
 	INIT_LIST_HEAD(&req->ki_list);
 	refcount_set(&req->ki_refcnt, 0);
-	req->ki_ctx = ctx;
+	req->ki_eventfd = NULL;
 	return req;
 }
 
@@ -1730,6 +1731,10 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
 	if (unlikely(!req->file))
 		return -EBADF;
 
+	req->head = NULL;
+	req->woken = false;
+	req->cancelled = false;
+
 	apt.pt._qproc = aio_poll_queue_proc;
 	apt.pt._key = req->events;
 	apt.iocb = aiocb;

From patchwork Fri Dec 7 22:19:59 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10718867
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 09/26] aio: only use blk plugs for > 2 depth submissions
Date: Fri, 7 Dec 2018 15:19:59 -0700
Message-Id: <20181207222016.29387-10-axboe@kernel.dk>
In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk>
References: <20181207222016.29387-1-axboe@kernel.dk>

Plugging is meant to optimize submission of a string of IOs; if we don't
have more than 2 being submitted, don't bother setting up a plug.

Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/aio.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 522c04864d82..ed6c3914477a 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -69,6 +69,12 @@ struct aio_ring {
 	struct io_event		io_events[0];
 }; /* 128 bytes + ring size */
 
+/*
+ * Plugging is meant to work with larger batches of IOs. If we don't
+ * have more than the below, then don't bother setting up a plug.
+ */
+#define AIO_PLUG_THRESHOLD	2
+
 #define AIO_RING_PAGES	8
 
 struct kioctx_table {
@@ -1919,7 +1925,8 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
 	if (nr > ctx->nr_events)
 		nr = ctx->nr_events;
 
-	blk_start_plug(&plug);
+	if (nr > AIO_PLUG_THRESHOLD)
+		blk_start_plug(&plug);
 	for (i = 0; i < nr; i++) {
 		struct iocb __user *user_iocb;
 
@@ -1932,7 +1939,8 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
 		if (ret)
 			break;
 	}
-	blk_finish_plug(&plug);
+	if (nr > AIO_PLUG_THRESHOLD)
+		blk_finish_plug(&plug);
 
 	percpu_ref_put(&ctx->users);
 	return i ? i : ret;
@@ -1959,7 +1967,8 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
 	if (nr > ctx->nr_events)
 		nr = ctx->nr_events;
 
-	blk_start_plug(&plug);
+	if (nr > AIO_PLUG_THRESHOLD)
+		blk_start_plug(&plug);
 	for (i = 0; i < nr; i++) {
 		compat_uptr_t user_iocb;
 
@@ -1972,7 +1981,8 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
 		if (ret)
 			break;
 	}
-	blk_finish_plug(&plug);
+	if (nr > AIO_PLUG_THRESHOLD)
+		blk_finish_plug(&plug);
 
 	percpu_ref_put(&ctx->users);
 	return i ? i : ret;
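
The threshold matters to callers that batch their submissions. As a
userspace counterpart (an illustrative sketch using libaio against an
O_DIRECT file; the device path is an example, error handling is omitted,
build with "gcc example.c -laio"): submitting 8 iocbs in a single
io_submit() call lets the kernel cover the whole batch with one plug,
while a depth of 1 or 2 now skips plugging entirely.

	#define _GNU_SOURCE
	#include <libaio.h>
	#include <fcntl.h>
	#include <stdlib.h>

	#define DEPTH	8
	#define BS	4096

	int main(void)
	{
		io_context_t ctx = 0;
		struct iocb iocbs[DEPTH], *ptrs[DEPTH];
		struct io_event events[DEPTH];
		int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
		int i;

		io_setup(DEPTH, &ctx);
		for (i = 0; i < DEPTH; i++) {
			void *buf;

			posix_memalign(&buf, BS, BS);
			io_prep_pread(&iocbs[i], fd, buf, BS, (long long)i * BS);
			ptrs[i] = &iocbs[i];
		}

		/* one syscall, depth > AIO_PLUG_THRESHOLD: the batch is plugged */
		io_submit(ctx, DEPTH, ptrs);
		io_getevents(ctx, DEPTH, DEPTH, events, NULL);

		io_destroy(ctx);
		return 0;
	}
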
From patchwork Fri Dec 7 22:20:00 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10718871
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 10/26] aio: use iocb_put() instead of open coding it
Date: Fri, 7 Dec 2018 15:20:00 -0700
Message-Id: <20181207222016.29387-11-axboe@kernel.dk>
In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk>
References: <20181207222016.29387-1-axboe@kernel.dk>

Replace the percpu_ref_put() + kmem_cache_free() with a call to
iocb_put() instead.

Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/aio.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index ed6c3914477a..cf93b92bfb1e 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1884,10 +1884,9 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 		goto out_put_req;
 	return 0;
 out_put_req:
-	percpu_ref_put(&ctx->reqs);
 	if (req->ki_eventfd)
 		eventfd_ctx_put(req->ki_eventfd);
-	kmem_cache_free(kiocb_cachep, req);
+	iocb_put(req);
 out_put_reqs_available:
 	put_reqs_available(ctx, 1);
 	return ret;

From patchwork Fri Dec 7 22:20:01 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10718875
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 11/26] aio: split out iocb copy from io_submit_one()
Date: Fri, 7 Dec 2018 15:20:01 -0700
Message-Id: <20181207222016.29387-12-axboe@kernel.dk>
In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk>
References: <20181207222016.29387-1-axboe@kernel.dk>

In preparation for handing in iocbs in a different fashion as well, split the user copy out of io_submit_one(). Also make it clear that the iocb being passed in isn't modified, by marking it const throughout.
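A minimal sketch of the copy-in/worker split this patch introduces, in plain userspace C with hypothetical names (demo_iocb, submit_one) rather than the kernel functions; memcpy() stands in for copy_from_user():

#include <errno.h>
#include <string.h>

struct demo_iocb {
	unsigned long long aio_buf;
	unsigned long long aio_nbytes;
};

/* Worker: only ever sees a const snapshot, so it cannot modify the caller's copy. */
static int __submit_one(const struct demo_iocb *iocb)
{
	if (iocb->aio_nbytes == 0)
		return -EINVAL;
	/* ... issue the request described by *iocb ... */
	return 0;
}

/* Wrapper: the single place that copies the user-visible structure in. */
static int submit_one(const void *user_iocb)
{
	struct demo_iocb iocb;

	memcpy(&iocb, user_iocb, sizeof(iocb));	/* copy_from_user() in the kernel */
	return __submit_one(&iocb);
}

The point of the split is that a later caller can hand the worker an iocb that was never copied at all, which is what the user-mapped iocb support later in this series relies on.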
Signed-off-by: Jens Axboe --- fs/aio.c | 68 +++++++++++++++++++++++++++++++------------------------- 1 file changed, 38 insertions(+), 30 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index cf93b92bfb1e..06c8bcc72496 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1420,7 +1420,7 @@ static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) aio_complete(iocb, res, res2); } -static int aio_prep_rw(struct kiocb *req, struct iocb *iocb) +static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) { int ret; @@ -1461,7 +1461,7 @@ static int aio_prep_rw(struct kiocb *req, struct iocb *iocb) return ret; } -static int aio_setup_rw(int rw, struct iocb *iocb, struct iovec **iovec, +static int aio_setup_rw(int rw, const struct iocb *iocb, struct iovec **iovec, bool vectored, bool compat, struct iov_iter *iter) { void __user *buf = (void __user *)(uintptr_t)iocb->aio_buf; @@ -1500,8 +1500,8 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret) } } -static ssize_t aio_read(struct kiocb *req, struct iocb *iocb, bool vectored, - bool compat) +static ssize_t aio_read(struct kiocb *req, const struct iocb *iocb, + bool vectored, bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct iov_iter iter; @@ -1533,8 +1533,8 @@ static ssize_t aio_read(struct kiocb *req, struct iocb *iocb, bool vectored, return ret; } -static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored, - bool compat) +static ssize_t aio_write(struct kiocb *req, const struct iocb *iocb, + bool vectored, bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct iov_iter iter; @@ -1589,7 +1589,8 @@ static void aio_fsync_work(struct work_struct *work) aio_complete(container_of(req, struct aio_kiocb, fsync), ret, 0); } -static int aio_fsync(struct fsync_iocb *req, struct iocb *iocb, bool datasync) +static int aio_fsync(struct fsync_iocb *req, const struct iocb *iocb, + bool datasync) { if (unlikely(iocb->aio_buf || iocb->aio_offset || iocb->aio_nbytes || iocb->aio_rw_flags)) @@ -1717,7 +1718,7 @@ aio_poll_queue_proc(struct file *file, struct wait_queue_head *head, add_wait_queue(head, &pt->iocb->poll.wait); } -static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb) +static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) { struct kioctx *ctx = aiocb->ki_ctx; struct poll_iocb *req = &aiocb->poll; @@ -1789,27 +1790,23 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb) return 0; } -static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, - bool compat) +static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, + struct iocb __user *user_iocb, bool compat) { struct aio_kiocb *req; - struct iocb iocb; ssize_t ret; - if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb)))) - return -EFAULT; - /* enforce forwards compatibility on users */ - if (unlikely(iocb.aio_reserved2)) { + if (unlikely(iocb->aio_reserved2)) { pr_debug("EINVAL: reserve field set\n"); return -EINVAL; } /* prevent overflows */ if (unlikely( - (iocb.aio_buf != (unsigned long)iocb.aio_buf) || - (iocb.aio_nbytes != (size_t)iocb.aio_nbytes) || - ((ssize_t)iocb.aio_nbytes < 0) + (iocb->aio_buf != (unsigned long)iocb->aio_buf) || + (iocb->aio_nbytes != (size_t)iocb->aio_nbytes) || + ((ssize_t)iocb->aio_nbytes < 0) )) { pr_debug("EINVAL: overflow check\n"); return -EINVAL; @@ -1823,14 +1820,14 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, if (unlikely(!req)) goto 
out_put_reqs_available; - if (iocb.aio_flags & IOCB_FLAG_RESFD) { + if (iocb->aio_flags & IOCB_FLAG_RESFD) { /* * If the IOCB_FLAG_RESFD flag of aio_flags is set, get an * instance of the file* now. The file descriptor must be * an eventfd() fd, and will be signaled for each completed * event using the eventfd_signal() function. */ - req->ki_eventfd = eventfd_ctx_fdget((int) iocb.aio_resfd); + req->ki_eventfd = eventfd_ctx_fdget((int) iocb->aio_resfd); if (IS_ERR(req->ki_eventfd)) { ret = PTR_ERR(req->ki_eventfd); req->ki_eventfd = NULL; @@ -1845,32 +1842,32 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, } req->ki_user_iocb = user_iocb; - req->ki_user_data = iocb.aio_data; + req->ki_user_data = iocb->aio_data; - switch (iocb.aio_lio_opcode) { + switch (iocb->aio_lio_opcode) { case IOCB_CMD_PREAD: - ret = aio_read(&req->rw, &iocb, false, compat); + ret = aio_read(&req->rw, iocb, false, compat); break; case IOCB_CMD_PWRITE: - ret = aio_write(&req->rw, &iocb, false, compat); + ret = aio_write(&req->rw, iocb, false, compat); break; case IOCB_CMD_PREADV: - ret = aio_read(&req->rw, &iocb, true, compat); + ret = aio_read(&req->rw, iocb, true, compat); break; case IOCB_CMD_PWRITEV: - ret = aio_write(&req->rw, &iocb, true, compat); + ret = aio_write(&req->rw, iocb, true, compat); break; case IOCB_CMD_FSYNC: - ret = aio_fsync(&req->fsync, &iocb, false); + ret = aio_fsync(&req->fsync, iocb, false); break; case IOCB_CMD_FDSYNC: - ret = aio_fsync(&req->fsync, &iocb, true); + ret = aio_fsync(&req->fsync, iocb, true); break; case IOCB_CMD_POLL: - ret = aio_poll(req, &iocb); + ret = aio_poll(req, iocb); break; default: - pr_debug("invalid aio operation %d\n", iocb.aio_lio_opcode); + pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode); ret = -EINVAL; break; } @@ -1892,6 +1889,17 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, return ret; } +static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, + bool compat) +{ + struct iocb iocb; + + if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb)))) + return -EFAULT; + + return __io_submit_one(ctx, &iocb, user_iocb, compat); +} + /* sys_io_submit: * Queue the nr iocbs pointed to by iocbpp for processing. Returns * the number of iocbs queued. 
May return -EINVAL if the aio_context

From patchwork Fri Dec 7 22:20:02 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10718879
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 12/26] aio: abstract out io_event filler helper
Date: Fri, 7 Dec 2018 15:20:02 -0700
Message-Id: <20181207222016.29387-13-axboe@kernel.dk>
In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk>
References: <20181207222016.29387-1-axboe@kernel.dk>

Signed-off-by: Jens Axboe
---
 fs/aio.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c index 06c8bcc72496..173f1f79dc8f 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1063,6 +1063,15 @@ static inline void iocb_put(struct aio_kiocb *iocb) } } +static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb, + long res, long res2) +{ + ev->obj = (u64)(unsigned long)iocb->ki_user_iocb; + ev->data = iocb->ki_user_data; + ev->res = res; + ev->res2 = res2; +} + /* aio_complete * Called when the io request on the given iocb is complete. */ @@ -1090,10 +1099,7 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2) ev_page = kmap_atomic(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]); event = ev_page + pos % AIO_EVENTS_PER_PAGE; - event->obj = (u64)(unsigned long)iocb->ki_user_iocb; - event->data = iocb->ki_user_data; - event->res = res; - event->res2 = res2; + aio_fill_event(event, iocb, res, res2); kunmap_atomic(ev_page); flush_dcache_page(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
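A small illustrative sketch (hypothetical names, not the kernel code) of why pulling the event formatting into one helper is useful: a second completion path, like the polled path added later in this series, can then fill in an event without touching the ring-buffer code at all.

#include <stdint.h>

struct demo_event {
	uint64_t obj;
	uint64_t data;
	int64_t res;
	int64_t res2;
};

struct demo_req {
	uint64_t user_tag;		/* what the application passed in */
	uint64_t user_data;
	struct demo_event stashed;	/* used by the "polled" completion path */
};

static void fill_event(struct demo_event *ev, const struct demo_req *req,
		       int64_t res, int64_t res2)
{
	ev->obj = req->user_tag;
	ev->data = req->user_data;
	ev->res = res;
	ev->res2 = res2;
}

/* ring-style completion writes straight into the shared ring slot */
static void complete_ring(struct demo_event *ring_slot, struct demo_req *req,
			  int64_t res, int64_t res2)
{
	fill_event(ring_slot, req, res, res2);
}

/* a polled completion stashes the event in the request until it is reaped */
static void complete_polled(struct demo_req *req, int64_t res, int64_t res2)
{
	fill_event(&req->stashed, req, res, res2);
}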
From patchwork Fri Dec 7 22:20:03 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10718883
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 13/26] aio: add io_setup2() system call
Date: Fri, 7 Dec 2018 15:20:03 -0700
Message-Id: <20181207222016.29387-14-axboe@kernel.dk>
In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk>
References: <20181207222016.29387-1-axboe@kernel.dk>

This is just like io_setup(), except it adds a flags argument to let the caller control/define some of the io_context behavior. Outside of the flags, we add an iocb array and two user pointers for future use.

Signed-off-by: Jens Axboe
---
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 fs/aio.c | 69 ++++++++++++++++----------
 include/linux/syscalls.h | 3 ++
 include/uapi/asm-generic/unistd.h | 4 +-
 kernel/sys_ni.c | 1 +
 5 files changed, 52 insertions(+), 26 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index f0b1709a5ffb..67c357225fb0 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -343,6 +343,7 @@ 332 common statx __x64_sys_statx 333 common io_pgetevents __x64_sys_io_pgetevents 334 common rseq __x64_sys_rseq +335 common io_setup2 __x64_sys_io_setup2 # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/fs/aio.c b/fs/aio.c index 173f1f79dc8f..26631d6872d2 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -100,6 +100,8 @@ struct kioctx { unsigned long user_id; + unsigned int flags; + struct __percpu kioctx_cpu *cpu; /* @@ -686,10 +688,8 @@ static void aio_nr_sub(unsigned nr) spin_unlock(&aio_nr_lock); } -/* ioctx_alloc * Allocates and initializes an ioctx.
Returns an ERR_PTR if it failed. - */ -static struct kioctx *ioctx_alloc(unsigned nr_events) +static struct kioctx *io_setup_flags(unsigned long ctxid, + unsigned int nr_events, unsigned int flags) { struct mm_struct *mm = current->mm; struct kioctx *ctx; @@ -701,6 +701,12 @@ static struct kioctx *ioctx_alloc(unsigned nr_events) */ unsigned int max_reqs = nr_events; + if (unlikely(ctxid || nr_events == 0)) { + pr_debug("EINVAL: ctx %lu nr_events %u\n", + ctxid, nr_events); + return ERR_PTR(-EINVAL); + } + /* * We keep track of the number of available ringbuffer slots, to prevent * overflow (reqs_available), and we also use percpu counters for this. @@ -726,6 +732,7 @@ static struct kioctx *ioctx_alloc(unsigned nr_events) if (!ctx) return ERR_PTR(-ENOMEM); + ctx->flags = flags; ctx->max_reqs = max_reqs; spin_lock_init(&ctx->ctx_lock); @@ -1281,6 +1288,34 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr, return ret; } +SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, struct iocb __user *, + iocbs, void __user *, user1, void __user *, user2, + aio_context_t __user *, ctxp) +{ + struct kioctx *ioctx; + unsigned long ctx; + long ret; + + if (flags || user1 || user2) + return -EINVAL; + + ret = get_user(ctx, ctxp); + if (unlikely(ret)) + goto out; + + ioctx = io_setup_flags(ctx, nr_events, flags); + ret = PTR_ERR(ioctx); + if (IS_ERR(ioctx)) + goto out; + + ret = put_user(ioctx->user_id, ctxp); + if (ret) + kill_ioctx(current->mm, ioctx, NULL); + percpu_ref_put(&ioctx->users); +out: + return ret; +} + /* sys_io_setup: * Create an aio_context capable of receiving at least nr_events. * ctxp must not point to an aio_context that already exists, and @@ -1296,7 +1331,7 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr, */ SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctxp) { - struct kioctx *ioctx = NULL; + struct kioctx *ioctx; unsigned long ctx; long ret; @@ -1304,14 +1339,7 @@ SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctxp) if (unlikely(ret)) goto out; - ret = -EINVAL; - if (unlikely(ctx || nr_events == 0)) { - pr_debug("EINVAL: ctx %lu nr_events %u\n", - ctx, nr_events); - goto out; - } - - ioctx = ioctx_alloc(nr_events); + ioctx = io_setup_flags(ctx, nr_events, 0); ret = PTR_ERR(ioctx); if (!IS_ERR(ioctx)) { ret = put_user(ioctx->user_id, ctxp); @@ -1327,7 +1355,7 @@ SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctxp) #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE2(io_setup, unsigned, nr_events, u32 __user *, ctx32p) { - struct kioctx *ioctx = NULL; + struct kioctx *ioctx; unsigned long ctx; long ret; @@ -1335,23 +1363,14 @@ COMPAT_SYSCALL_DEFINE2(io_setup, unsigned, nr_events, u32 __user *, ctx32p) if (unlikely(ret)) goto out; - ret = -EINVAL; - if (unlikely(ctx || nr_events == 0)) { - pr_debug("EINVAL: ctx %lu nr_events %u\n", - ctx, nr_events); - goto out; - } - - ioctx = ioctx_alloc(nr_events); + ioctx = io_setup_flags(ctx, nr_events, 0); ret = PTR_ERR(ioctx); if (!IS_ERR(ioctx)) { - /* truncating is ok because it's a user address */ - ret = put_user((u32)ioctx->user_id, ctx32p); + ret = put_user(ioctx->user_id, ctx32p); if (ret) kill_ioctx(current->mm, ioctx, NULL); percpu_ref_put(&ioctx->users); } - out: return ret; } diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 2ac3d13a915b..a20a663d583f 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -287,6 +287,9 @@ static inline void addr_limit_user_check(void) */ #ifndef 
CONFIG_ARCH_HAS_SYSCALL_WRAPPER asmlinkage long sys_io_setup(unsigned nr_reqs, aio_context_t __user *ctx); +asmlinkage long sys_io_setup2(unsigned, unsigned, struct iocb __user *, + void __user *, void __user *, + aio_context_t __user *); asmlinkage long sys_io_destroy(aio_context_t ctx); asmlinkage long sys_io_submit(aio_context_t, long, struct iocb __user * __user *); diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 538546edbfbd..b4527ed373b0 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -738,9 +738,11 @@ __SYSCALL(__NR_statx, sys_statx) __SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents) #define __NR_rseq 293 __SYSCALL(__NR_rseq, sys_rseq) +#define __NR_io_setup2 294 +__SYSCALL(__NR_io_setup2, sys_io_setup2) #undef __NR_syscalls -#define __NR_syscalls 294 +#define __NR_syscalls 295 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index df556175be50..17c8b4393669 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -37,6 +37,7 @@ asmlinkage long sys_ni_syscall(void) */ COND_SYSCALL(io_setup); +COND_SYSCALL(io_setup2); COND_SYSCALL_COMPAT(io_setup); COND_SYSCALL(io_destroy); COND_SYSCALL(io_submit);

From patchwork Fri Dec 7 22:20:04 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10718887
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 14/26] aio: add support for having user mapped iocbs
Date: Fri, 7 Dec 2018 15:20:04 -0700
Message-Id: <20181207222016.29387-15-axboe@kernel.dk>
In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk>
References: <20181207222016.29387-1-axboe@kernel.dk>

For io_submit(), we have to first copy each pointer to an iocb, then copy the iocb itself. The latter is 64 bytes in size, and that's a lot of copying for a single IO. Add support for setting IOCTX_FLAG_USERIOCB through the new io_setup2() system call, which allows the iocbs to reside in userspace. If this flag is used, then io_submit() doesn't take pointers to iocbs anymore; it takes an index value into the array of iocbs instead. Similarly, for io_getevents(), the iocb ->obj will be the index, not the pointer to the iocb. See the change made to fio to support this feature; it's pretty trivial to adapt to. For applications, like fio, that previously embedded the iocb inside an application private structure, some sort of lookup table/structure is needed to find the private IO structure from the index at io_getevents() time.
http://git.kernel.dk/cgit/fio/commit/?id=3c3168e91329c83880c91e5abc28b9d6b940fd95 Signed-off-by: Jens Axboe --- fs/aio.c | 126 +++++++++++++++++++++++++++++++---- include/uapi/linux/aio_abi.h | 2 + 2 files changed, 116 insertions(+), 12 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 26631d6872d2..bb6f07ca6940 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -92,6 +92,11 @@ struct ctx_rq_wait { atomic_t count; }; +struct aio_mapped_range { + struct page **pages; + long nr_pages; +}; + struct kioctx { struct percpu_ref users; atomic_t dead; @@ -127,6 +132,8 @@ struct kioctx { struct page **ring_pages; long nr_pages; + struct aio_mapped_range iocb_range; + struct rcu_work free_rwork; /* see free_ioctx() */ /* @@ -222,6 +229,11 @@ static struct vfsmount *aio_mnt; static const struct file_operations aio_ring_fops; static const struct address_space_operations aio_ctx_aops; +static const unsigned int iocb_page_shift = + ilog2(PAGE_SIZE / sizeof(struct iocb)); + +static void aio_useriocb_unmap(struct kioctx *); + static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages) { struct file *file; @@ -578,6 +590,7 @@ static void free_ioctx(struct work_struct *work) free_rwork); pr_debug("freeing %p\n", ctx); + aio_useriocb_unmap(ctx); aio_free_ring(ctx); free_percpu(ctx->cpu); percpu_ref_exit(&ctx->reqs); @@ -1288,6 +1301,70 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr, return ret; } +static struct iocb *aio_iocb_from_index(struct kioctx *ctx, int index) +{ + struct iocb *iocb; + + iocb = page_address(ctx->iocb_range.pages[index >> iocb_page_shift]); + index &= ((1 << iocb_page_shift) - 1); + return iocb + index; +} + +static void aio_unmap_range(struct aio_mapped_range *range) +{ + int i; + + if (!range->nr_pages) + return; + + for (i = 0; i < range->nr_pages; i++) + put_page(range->pages[i]); + + kfree(range->pages); + range->pages = NULL; + range->nr_pages = 0; +} + +static int aio_map_range(struct aio_mapped_range *range, void __user *uaddr, + size_t size, int gup_flags) +{ + int nr_pages, ret; + + if ((unsigned long) uaddr & ~PAGE_MASK) + return -EINVAL; + + nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; + + range->pages = kzalloc(nr_pages * sizeof(struct page *), GFP_KERNEL); + if (!range->pages) + return -ENOMEM; + + down_write(¤t->mm->mmap_sem); + ret = get_user_pages((unsigned long) uaddr, nr_pages, gup_flags, + range->pages, NULL); + up_write(¤t->mm->mmap_sem); + + if (ret < nr_pages) { + kfree(range->pages); + return -ENOMEM; + } + + range->nr_pages = nr_pages; + return 0; +} + +static void aio_useriocb_unmap(struct kioctx *ctx) +{ + aio_unmap_range(&ctx->iocb_range); +} + +static int aio_useriocb_map(struct kioctx *ctx, struct iocb __user *iocbs) +{ + size_t size = sizeof(struct iocb) * ctx->max_reqs; + + return aio_map_range(&ctx->iocb_range, iocbs, size, 0); +} + SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, struct iocb __user *, iocbs, void __user *, user1, void __user *, user2, aio_context_t __user *, ctxp) @@ -1296,7 +1373,9 @@ SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, struct iocb __user *, unsigned long ctx; long ret; - if (flags || user1 || user2) + if (user1 || user2) + return -EINVAL; + if (flags & ~IOCTX_FLAG_USERIOCB) return -EINVAL; ret = get_user(ctx, ctxp); @@ -1308,9 +1387,17 @@ SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, struct iocb __user *, if (IS_ERR(ioctx)) goto out; + if (flags & IOCTX_FLAG_USERIOCB) { + ret = aio_useriocb_map(ioctx, iocbs); + if (ret) + goto err; + } + ret = put_user(ioctx->user_id, 
ctxp); - if (ret) + if (ret) { +err: kill_ioctx(current->mm, ioctx, NULL); + } percpu_ref_put(&ioctx->users); out: return ret; @@ -1860,10 +1947,13 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, } } - ret = put_user(KIOCB_KEY, &user_iocb->aio_key); - if (unlikely(ret)) { - pr_debug("EFAULT: aio_key\n"); - goto out_put_req; + /* Don't support cancel on user mapped iocbs */ + if (!(ctx->flags & IOCTX_FLAG_USERIOCB)) { + ret = put_user(KIOCB_KEY, &user_iocb->aio_key); + if (unlikely(ret)) { + pr_debug("EFAULT: aio_key\n"); + goto out_put_req; + } } req->ki_user_iocb = user_iocb; @@ -1917,12 +2007,22 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, bool compat) { - struct iocb iocb; + struct iocb iocb, *iocbp; - if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb)))) - return -EFAULT; + if (ctx->flags & IOCTX_FLAG_USERIOCB) { + unsigned long iocb_index = (unsigned long) user_iocb; - return __io_submit_one(ctx, &iocb, user_iocb, compat); + if (iocb_index >= ctx->max_reqs) + return -EINVAL; + + iocbp = aio_iocb_from_index(ctx, iocb_index); + } else { + if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb)))) + return -EFAULT; + iocbp = &iocb; + } + + return __io_submit_one(ctx, iocbp, user_iocb, compat); } /* sys_io_submit: @@ -2066,6 +2166,9 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb, if (unlikely(!ctx)) return -EINVAL; + if (ctx->flags & IOCTX_FLAG_USERIOCB) + goto err; + spin_lock_irq(&ctx->ctx_lock); kiocb = lookup_kiocb(ctx, iocb); if (kiocb) { @@ -2082,9 +2185,8 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb, */ ret = -EINPROGRESS; } - +err: percpu_ref_put(&ctx->users); - return ret; } diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h index 8387e0af0f76..814e6606c413 100644 --- a/include/uapi/linux/aio_abi.h +++ b/include/uapi/linux/aio_abi.h @@ -106,6 +106,8 @@ struct iocb { __u32 aio_resfd; }; /* 64 bytes */ +#define IOCTX_FLAG_USERIOCB (1 << 0) /* iocbs are user mapped */ + #undef IFBIG #undef IFLITTLE From patchwork Fri Dec 7 22:20:05 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10718895 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EE1EE15A6 for ; Fri, 7 Dec 2018 22:20:56 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DFA292F719 for ; Fri, 7 Dec 2018 22:20:56 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D40AE2F72F; Fri, 7 Dec 2018 22:20:56 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B78382F719 for ; Fri, 7 Dec 2018 22:20:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726178AbeLGWUy (ORCPT ); Fri, 7 Dec 2018 17:20:54 -0500 Received: from mail-pf1-f195.google.com ([209.85.210.195]:35753 "EHLO mail-pf1-f195.google.com" 
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 15/26] aio: support for IO polling
Date: Fri, 7 Dec 2018 15:20:05 -0700
Message-Id: <20181207222016.29387-16-axboe@kernel.dk>
In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk>
References: <20181207222016.29387-1-axboe@kernel.dk>

Add polled variants of PREAD/PREADV and PWRITE/PWRITEV. These act like their non-polled counterparts, except we expect to poll for completion of them. The polling happens at io_getevents() time, and works just like non-polled IO. To set up an io_context for polled IO, the application must call io_setup2() with IOCTX_FLAG_IOPOLL as one of the flags. It is illegal to mix and match polled and non-polled IO on an io_context. Polled IO doesn't support the user mapped completion ring. Events must be reaped through the io_getevents() system call. For non-irq driven poll devices, there's no way to support completion reaping from userspace by just looking at the ring; the application itself is the one that pulls completion entries.
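A hedged usage sketch of the polled flow described above, assuming the ABI exactly as proposed in this series: io_setup2() at syscall number 335 (the x86-64 number added earlier in the series), with IOCTX_FLAG_IOPOLL and IOCB_FLAG_HIPRI defined locally since none of this is mainline ABI, and a file opened with O_DIRECT on a device whose driver wires up ->iopoll:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

#define __NR_io_setup2		335		/* proposed in this series only */
#define IOCTX_FLAG_IOPOLL	(1 << 1)	/* not in mainline aio_abi.h */
#define IOCB_FLAG_HIPRI		(1 << 2)	/* not in mainline aio_abi.h */

int main(int argc, char **argv)
{
	struct iocb iocb, *iocbp = &iocb;
	struct io_event ev;
	aio_context_t ctx = 0;
	void *buf;
	int fd;

	if (argc < 2)
		return 1;
	fd = open(argv[1], O_RDONLY | O_DIRECT);
	if (fd < 0 || posix_memalign(&buf, 4096, 4096))
		return 1;

	/* a polled context has to be created as such up front */
	if (syscall(__NR_io_setup2, 32, IOCTX_FLAG_IOPOLL, NULL, NULL, NULL, &ctx) < 0) {
		perror("io_setup2");
		return 1;
	}

	memset(&iocb, 0, sizeof(iocb));
	iocb.aio_fildes = fd;
	iocb.aio_lio_opcode = IOCB_CMD_PREAD;
	iocb.aio_buf = (unsigned long) buf;
	iocb.aio_nbytes = 4096;
	iocb.aio_flags = IOCB_FLAG_HIPRI;	/* poll for this completion */

	if (syscall(__NR_io_submit, ctx, 1, &iocbp) != 1)
		return 1;

	/* with IOCTX_FLAG_IOPOLL this actively polls the device rather than sleeping */
	if (syscall(__NR_io_getevents, ctx, 1, 1, &ev, NULL) != 1)
		return 1;

	printf("read %lld bytes\n", (long long) ev.res);
	return 0;
}

Adding IOCTX_FLAG_USERIOCB from the previous patch to the same flags argument would only change how iocbs are handed to io_submit() (by index into the shared array instead of by pointer); the polling flow itself is unchanged.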
Signed-off-by: Jens Axboe --- fs/aio.c | 396 +++++++++++++++++++++++++++++++---- include/uapi/linux/aio_abi.h | 3 + 2 files changed, 363 insertions(+), 36 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index bb6f07ca6940..cc8763b395c1 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -153,6 +153,18 @@ struct kioctx { atomic_t reqs_available; } ____cacheline_aligned_in_smp; + /* iopoll submission state */ + struct { + spinlock_t poll_lock; + struct list_head poll_submitted; + } ____cacheline_aligned_in_smp; + + /* iopoll completion state */ + struct { + struct list_head poll_completing; + struct mutex getevents_lock; + } ____cacheline_aligned_in_smp; + struct { spinlock_t ctx_lock; struct list_head active_reqs; /* used for cancellation */ @@ -205,14 +217,27 @@ struct aio_kiocb { __u64 ki_user_data; /* user's data for completion */ struct list_head ki_list; /* the aio core uses this - * for cancellation */ + * for cancellation, or for + * polled IO */ + + unsigned long ki_flags; +#define IOCB_POLL_COMPLETED 0 /* polled IO has completed */ +#define IOCB_POLL_EAGAIN 1 /* polled submission got EAGAIN */ + refcount_t ki_refcnt; - /* - * If the aio_resfd field of the userspace iocb is not zero, - * this is the underlying eventfd context to deliver events to. - */ - struct eventfd_ctx *ki_eventfd; + union { + /* + * If the aio_resfd field of the userspace iocb is not zero, + * this is the underlying eventfd context to deliver events to. + */ + struct eventfd_ctx *ki_eventfd; + + /* + * For polled IO, stash completion info here + */ + struct io_event ki_ev; + }; }; /*------ sysctl variables----*/ @@ -233,6 +258,7 @@ static const unsigned int iocb_page_shift = ilog2(PAGE_SIZE / sizeof(struct iocb)); static void aio_useriocb_unmap(struct kioctx *); +static void aio_iopoll_reap_events(struct kioctx *); static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages) { @@ -471,11 +497,15 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) int i; struct file *file; - /* Compensate for the ring buffer's head/tail overlap entry */ - nr_events += 2; /* 1 is required, 2 for good luck */ - + /* + * Compensate for the ring buffer's head/tail overlap entry. 
+ * IO polling doesn't require any io event entries + */ size = sizeof(struct aio_ring); - size += sizeof(struct io_event) * nr_events; + if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) { + nr_events += 2; /* 1 is required, 2 for good luck */ + size += sizeof(struct io_event) * nr_events; + } nr_pages = PFN_UP(size); if (nr_pages < 0) @@ -758,6 +788,11 @@ static struct kioctx *io_setup_flags(unsigned long ctxid, INIT_LIST_HEAD(&ctx->active_reqs); + spin_lock_init(&ctx->poll_lock); + INIT_LIST_HEAD(&ctx->poll_submitted); + INIT_LIST_HEAD(&ctx->poll_completing); + mutex_init(&ctx->getevents_lock); + if (percpu_ref_init(&ctx->users, free_ioctx_users, 0, GFP_KERNEL)) goto err; @@ -829,11 +864,15 @@ static int kill_ioctx(struct mm_struct *mm, struct kioctx *ctx, { struct kioctx_table *table; + mutex_lock(&ctx->getevents_lock); spin_lock(&mm->ioctx_lock); if (atomic_xchg(&ctx->dead, 1)) { spin_unlock(&mm->ioctx_lock); + mutex_unlock(&ctx->getevents_lock); return -EINVAL; } + aio_iopoll_reap_events(ctx); + mutex_unlock(&ctx->getevents_lock); table = rcu_dereference_raw(mm->ioctx_table); WARN_ON(ctx != rcu_access_pointer(table->table[ctx->id])); @@ -1042,6 +1081,7 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx) percpu_ref_get(&ctx->reqs); req->ki_ctx = ctx; INIT_LIST_HEAD(&req->ki_list); + req->ki_flags = 0; refcount_set(&req->ki_refcnt, 0); req->ki_eventfd = NULL; return req; @@ -1083,6 +1123,15 @@ static inline void iocb_put(struct aio_kiocb *iocb) } } +static void iocb_put_many(struct kioctx *ctx, void **iocbs, int *nr) +{ + if (*nr) { + percpu_ref_put_many(&ctx->reqs, *nr); + kmem_cache_free_bulk(kiocb_cachep, *nr, iocbs); + *nr = 0; + } +} + static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb, long res, long res2) { @@ -1272,6 +1321,185 @@ static bool aio_read_events(struct kioctx *ctx, long min_nr, long nr, return ret < 0 || *i >= min_nr; } +#define AIO_IOPOLL_BATCH 8 + +/* + * Process completed iocb iopoll entries, copying the result to userspace. + */ +static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, + unsigned int *nr_events, long max) +{ + void *iocbs[AIO_IOPOLL_BATCH]; + struct aio_kiocb *iocb, *n; + int to_free = 0, ret = 0; + + /* Shouldn't happen... */ + if (*nr_events >= max) + return 0; + + list_for_each_entry_safe(iocb, n, &ctx->poll_completing, ki_list) { + if (*nr_events == max) + break; + if (!test_bit(IOCB_POLL_COMPLETED, &iocb->ki_flags)) + continue; + if (to_free == AIO_IOPOLL_BATCH) + iocb_put_many(ctx, iocbs, &to_free); + + list_del(&iocb->ki_list); + iocbs[to_free++] = iocb; + + fput(iocb->rw.ki_filp); + + if (evs && copy_to_user(evs + *nr_events, &iocb->ki_ev, + sizeof(iocb->ki_ev))) { + ret = -EFAULT; + break; + } + (*nr_events)++; + } + + if (to_free) + iocb_put_many(ctx, iocbs, &to_free); + + return ret; +} + +/* + * Poll for a mininum of 'min' events, and a maximum of 'max'. Note that if + * min == 0 we consider that a non-spinning poll check - we'll still enter + * the driver poll loop, but only as a non-spinning completion check. 
+ */ +static int aio_iopoll_getevents(struct kioctx *ctx, + struct io_event __user *event, + unsigned int *nr_events, long min, long max) +{ + struct aio_kiocb *iocb; + int to_poll, polled, ret; + + /* + * Check if we already have done events that satisfy what we need + */ + if (!list_empty(&ctx->poll_completing)) { + ret = aio_iopoll_reap(ctx, event, nr_events, max); + if (ret < 0) + return ret; + if ((min && *nr_events >= min) || *nr_events >= max) + return 0; + } + + /* + * Take in a new working set from the submitted list, if possible. + */ + if (!list_empty_careful(&ctx->poll_submitted)) { + spin_lock(&ctx->poll_lock); + list_splice_init(&ctx->poll_submitted, &ctx->poll_completing); + spin_unlock(&ctx->poll_lock); + } + + if (list_empty(&ctx->poll_completing)) + return 0; + + /* + * Check again now that we have a new batch. + */ + ret = aio_iopoll_reap(ctx, event, nr_events, max); + if (ret < 0) + return ret; + if ((min && *nr_events >= min) || *nr_events >= max) + return 0; + + /* + * Find up to 'max' worth of events to poll for, including the + * events we already successfully polled + */ + polled = to_poll = 0; + list_for_each_entry(iocb, &ctx->poll_completing, ki_list) { + /* + * Poll for needed events with spin == true, anything after + * that we just check if we have more, up to max. + */ + bool spin = !polled || *nr_events < min; + struct kiocb *kiocb = &iocb->rw; + + if (test_bit(IOCB_POLL_COMPLETED, &iocb->ki_flags)) + break; + if (++to_poll + *nr_events > max) + break; + + ret = kiocb->ki_filp->f_op->iopoll(kiocb, spin); + if (ret < 0) + return ret; + + polled += ret; + if (polled + *nr_events >= max) + break; + } + + ret = aio_iopoll_reap(ctx, event, nr_events, max); + if (ret < 0) + return ret; + if (*nr_events >= min) + return 0; + return to_poll; +} + +/* + * We can't just wait for polled events to come to us, we have to actively + * find and complete them. + */ +static void aio_iopoll_reap_events(struct kioctx *ctx) +{ + if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) + return; + + while (!list_empty_careful(&ctx->poll_submitted) || + !list_empty(&ctx->poll_completing)) { + unsigned int nr_events = 0; + + aio_iopoll_getevents(ctx, NULL, &nr_events, 1, UINT_MAX); + } +} + +static int __aio_iopoll_check(struct kioctx *ctx, struct io_event __user *event, + unsigned int *nr_events, long min_nr, long max_nr) +{ + int ret = 0; + + while (!*nr_events || !need_resched()) { + int tmin = 0; + + if (*nr_events < min_nr) + tmin = min_nr - *nr_events; + + ret = aio_iopoll_getevents(ctx, event, nr_events, tmin, max_nr); + if (ret <= 0) + break; + ret = 0; + } + + return ret; +} + +static int aio_iopoll_check(struct kioctx *ctx, long min_nr, long nr, + struct io_event __user *event) +{ + unsigned int nr_events = 0; + int ret; + + /* Only allow one thread polling at a time */ + if (!mutex_trylock(&ctx->getevents_lock)) + return -EBUSY; + if (unlikely(atomic_read(&ctx->dead))) { + ret = -EINVAL; + goto err; + } + + ret = __aio_iopoll_check(ctx, event, &nr_events, min_nr, nr); +err: + mutex_unlock(&ctx->getevents_lock); + return nr_events ? 
nr_events : ret; +} + static long read_events(struct kioctx *ctx, long min_nr, long nr, struct io_event __user *event, ktime_t until) @@ -1375,7 +1603,7 @@ SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, struct iocb __user *, if (user1 || user2) return -EINVAL; - if (flags & ~IOCTX_FLAG_USERIOCB) + if (flags & ~(IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL)) return -EINVAL; ret = get_user(ctx, ctxp); @@ -1509,13 +1737,8 @@ static void aio_remove_iocb(struct aio_kiocb *iocb) spin_unlock_irqrestore(&ctx->ctx_lock, flags); } -static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) +static void kiocb_end_write(struct kiocb *kiocb) { - struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw); - - if (!list_empty_careful(&iocb->ki_list)) - aio_remove_iocb(iocb); - if (kiocb->ki_flags & IOCB_WRITE) { struct inode *inode = file_inode(kiocb->ki_filp); @@ -1527,19 +1750,48 @@ static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE); file_end_write(kiocb->ki_filp); } +} + +static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) +{ + struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw); + + if (!list_empty_careful(&iocb->ki_list)) + aio_remove_iocb(iocb); + + kiocb_end_write(kiocb); fput(kiocb->ki_filp); aio_complete(iocb, res, res2); } -static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) +static void aio_complete_rw_poll(struct kiocb *kiocb, long res, long res2) { + struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw); + + kiocb_end_write(kiocb); + + /* + * Handle EAGAIN from resource limits with polled IO inline, don't + * pass the event back to userspace. + */ + if (unlikely(res == -EAGAIN)) + set_bit(IOCB_POLL_EAGAIN, &iocb->ki_flags); + else { + aio_fill_event(&iocb->ki_ev, iocb, res, res2); + set_bit(IOCB_POLL_COMPLETED, &iocb->ki_flags); + } +} + +static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb) +{ + struct kioctx *ctx = kiocb->ki_ctx; + struct kiocb *req = &kiocb->rw; int ret; req->ki_filp = fget(iocb->aio_fildes); if (unlikely(!req->ki_filp)) return -EBADF; - req->ki_complete = aio_complete_rw; req->ki_pos = iocb->aio_offset; req->ki_flags = iocb_flags(req->ki_filp); if (iocb->aio_flags & IOCB_FLAG_RESFD) @@ -1565,9 +1817,35 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) if (unlikely(ret)) goto out_fput; - req->ki_flags &= ~IOCB_HIPRI; /* no one is going to poll for this I/O */ - return 0; + if (iocb->aio_flags & IOCB_FLAG_HIPRI) { + /* shares space in the union, and is rather pointless.. */ + ret = -EINVAL; + if (iocb->aio_flags & IOCB_FLAG_RESFD) + goto out_fput; + + /* can't submit polled IO to a non-polled ctx */ + if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) + goto out_fput; + + ret = -EOPNOTSUPP; + if (!(req->ki_flags & IOCB_DIRECT) || + !req->ki_filp->f_op->iopoll) + goto out_fput; + + req->ki_flags |= IOCB_HIPRI; + req->ki_complete = aio_complete_rw_poll; + } else { + /* can't submit non-polled IO to a polled ctx */ + ret = -EINVAL; + if (ctx->flags & IOCTX_FLAG_IOPOLL) + goto out_fput; + /* no one is going to poll for this I/O */ + req->ki_flags &= ~IOCB_HIPRI; + req->ki_complete = aio_complete_rw; + } + + return 0; out_fput: fput(req->ki_filp); return ret; @@ -1612,15 +1890,40 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret) } } -static ssize_t aio_read(struct kiocb *req, const struct iocb *iocb, +/* + * After the iocb has been issued, it's safe to be found on the poll list. 
+ * Adding the kiocb to the list AFTER submission ensures that we don't + * find it from a io_getevents() thread before the issuer is done accessing + * the kiocb cookie. + */ +static void aio_iopoll_iocb_issued(struct aio_kiocb *kiocb) +{ + /* + * For fast devices, IO may have already completed. If it has, add + * it to the front so we find it first. We can't add to the poll_done + * list as that's unlocked from the completion side. + */ + const int front_add = test_bit(IOCB_POLL_COMPLETED, &kiocb->ki_flags); + struct kioctx *ctx = kiocb->ki_ctx; + + spin_lock(&ctx->poll_lock); + if (front_add) + list_add(&kiocb->ki_list, &ctx->poll_submitted); + else + list_add_tail(&kiocb->ki_list, &ctx->poll_submitted); + spin_unlock(&ctx->poll_lock); +} + +static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, bool vectored, bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; + struct kiocb *req = &kiocb->rw; struct iov_iter iter; struct file *file; ssize_t ret; - ret = aio_prep_rw(req, iocb); + ret = aio_prep_rw(kiocb, iocb); if (ret) return ret; file = req->ki_filp; @@ -1645,15 +1948,16 @@ static ssize_t aio_read(struct kiocb *req, const struct iocb *iocb, return ret; } -static ssize_t aio_write(struct kiocb *req, const struct iocb *iocb, +static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb, bool vectored, bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; + struct kiocb *req = &kiocb->rw; struct iov_iter iter; struct file *file; ssize_t ret; - ret = aio_prep_rw(req, iocb); + ret = aio_prep_rw(kiocb, iocb); if (ret) return ret; file = req->ki_filp; @@ -1924,7 +2228,8 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, return -EINVAL; } - if (!get_reqs_available(ctx)) + /* Poll IO doesn't need ring reservations */ + if (!(ctx->flags & IOCTX_FLAG_IOPOLL) && !get_reqs_available(ctx)) return -EAGAIN; ret = -EAGAIN; @@ -1947,8 +2252,8 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, } } - /* Don't support cancel on user mapped iocbs */ - if (!(ctx->flags & IOCTX_FLAG_USERIOCB)) { + /* Don't support cancel on user mapped iocbs or polled context */ + if (!(ctx->flags & (IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL))) { ret = put_user(KIOCB_KEY, &user_iocb->aio_key); if (unlikely(ret)) { pr_debug("EFAULT: aio_key\n"); @@ -1959,26 +2264,33 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, req->ki_user_iocb = user_iocb; req->ki_user_data = iocb->aio_data; + ret = -EINVAL; switch (iocb->aio_lio_opcode) { case IOCB_CMD_PREAD: - ret = aio_read(&req->rw, iocb, false, compat); + ret = aio_read(req, iocb, false, compat); break; case IOCB_CMD_PWRITE: - ret = aio_write(&req->rw, iocb, false, compat); + ret = aio_write(req, iocb, false, compat); break; case IOCB_CMD_PREADV: - ret = aio_read(&req->rw, iocb, true, compat); + ret = aio_read(req, iocb, true, compat); break; case IOCB_CMD_PWRITEV: - ret = aio_write(&req->rw, iocb, true, compat); + ret = aio_write(req, iocb, true, compat); break; case IOCB_CMD_FSYNC: + if (ctx->flags & IOCTX_FLAG_IOPOLL) + break; ret = aio_fsync(&req->fsync, iocb, false); break; case IOCB_CMD_FDSYNC: + if (ctx->flags & IOCTX_FLAG_IOPOLL) + break; ret = aio_fsync(&req->fsync, iocb, true); break; case IOCB_CMD_POLL: + if (ctx->flags & IOCTX_FLAG_IOPOLL) + break; ret = aio_poll(req, iocb); break; default: @@ -1994,13 +2306,21 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, */ if (ret) goto out_put_req; + if 
(ctx->flags & IOCTX_FLAG_IOPOLL) { + if (test_bit(IOCB_POLL_EAGAIN, &req->ki_flags)) { + ret = -EAGAIN; + goto out_put_req; + } + aio_iopoll_iocb_issued(req); + } return 0; out_put_req: if (req->ki_eventfd) eventfd_ctx_put(req->ki_eventfd); iocb_put(req); out_put_reqs_available: - put_reqs_available(ctx, 1); + if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) + put_reqs_available(ctx, 1); return ret; } @@ -2166,7 +2486,7 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb, if (unlikely(!ctx)) return -EINVAL; - if (ctx->flags & IOCTX_FLAG_USERIOCB) + if (ctx->flags & (IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL)) goto err; spin_lock_irq(&ctx->ctx_lock); @@ -2201,8 +2521,12 @@ static long do_io_getevents(aio_context_t ctx_id, long ret = -EINVAL; if (likely(ioctx)) { - if (likely(min_nr <= nr && min_nr >= 0)) - ret = read_events(ioctx, min_nr, nr, events, until); + if (likely(min_nr <= nr && min_nr >= 0)) { + if (ioctx->flags & IOCTX_FLAG_IOPOLL) + ret = aio_iopoll_check(ioctx, min_nr, nr, events); + else + ret = read_events(ioctx, min_nr, nr, events, until); + } percpu_ref_put(&ioctx->users); } diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h index 814e6606c413..ea0b9a19f4df 100644 --- a/include/uapi/linux/aio_abi.h +++ b/include/uapi/linux/aio_abi.h @@ -52,9 +52,11 @@ enum { * is valid. * IOCB_FLAG_IOPRIO - Set if the "aio_reqprio" member of the "struct iocb" * is valid. + * IOCB_FLAG_HIPRI - Use IO completion polling */ #define IOCB_FLAG_RESFD (1 << 0) #define IOCB_FLAG_IOPRIO (1 << 1) +#define IOCB_FLAG_HIPRI (1 << 2) /* read() from /dev/aio returns these structures. */ struct io_event { @@ -107,6 +109,7 @@ struct iocb { }; /* 64 bytes */ #define IOCTX_FLAG_USERIOCB (1 << 0) /* iocbs are user mapped */ +#define IOCTX_FLAG_IOPOLL (1 << 1) /* io_context is polled */ #undef IFBIG #undef IFLITTLE From patchwork Fri Dec 7 22:20:06 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10718893 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 938B51750 for ; Fri, 7 Dec 2018 22:20:56 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 858F72F71D for ; Fri, 7 Dec 2018 22:20:56 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7A1F72F72F; Fri, 7 Dec 2018 22:20:56 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D944A2F73E for ; Fri, 7 Dec 2018 22:20:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726165AbeLGWUz (ORCPT ); Fri, 7 Dec 2018 17:20:55 -0500 Received: from mail-pg1-f195.google.com ([209.85.215.195]:46781 "EHLO mail-pg1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726177AbeLGWUy (ORCPT ); Fri, 7 Dec 2018 17:20:54 -0500 Received: by mail-pg1-f195.google.com with SMTP id w7so2282733pgp.13 for ; Fri, 07 Dec 2018 14:20:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; 
s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=6XVh73s0Q0W4EH1e/N3KNk6BZ3fkpfZDYy4sV7jgKY4=; b=vVRMmWd2SL57CotPINjpU14ygQDlkY3ZeJG/ynIW5B3Une2bjM9yWyLXOa4kJXN1dF d1ZEd3BW3pKX6EShtcqpE2ujezzk/ca2zggZI0tJo+EZk3zaqAPwVhKR6Bbj0W/twahz vC0OEfEA1mBQYc1ULMUhnoUbqSV21KOBrd6w6OhEdDREWDG7SoWjwIFBUXOhEjlbR+je zyF3VPx+qWPD/iM2AHp7VaQgFFAFmcjd6784jK/IiVa5rBeZ5J/pPyhmiGnJGtjB3ACa VwzUiQFKxknjH9Yll48uWx2cDq3pRwMIOBbu+Z9gWvvKqiQqQDas3BJTaRHiszj0BE4x f/pA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=6XVh73s0Q0W4EH1e/N3KNk6BZ3fkpfZDYy4sV7jgKY4=; b=n9SqITjcrayRgGkDZEkHX7Nej0p2JB74Am0yPJyUAt+SeO9q7wzioSfJRpBVxHmJSB M1AaoVsbQV98XCy/Y7EThdkyVHLOmVtlPR8xGWmrEKaxkNhpjN1C43LKv/cYsOf5WocR USP+cDq4jqNCS8RKvxpfOWb9IJ3tHQupDusEqBPpykLnHW0P/6P2q9c651SSguJNkaLF OoyZ/iUxArqQEtuQSSh3oZesSLDbVuK8SJc+fiuDyoXA649Z0/o4EMKVYarextrcIuZQ tF+9C0j0E1h403ylqVzCZn3yJf8IzuQ8nB4vAtg2ZUXAdSXq9L543zVcgH4UBctIG/JO bSpQ== X-Gm-Message-State: AA+aEWYjiFWiPER1KK6HQO5QYVDnobZcCQG3EwIrQdaW2EH6LeZHLLxh 6umrPh78/UF+9l1LaW2XNLQL0SCiJ6A= X-Google-Smtp-Source: AFSGD/Xy95ZQT18H5BYY9VeDcclaLhGOXXMFuNVyHfEHDgwDCQZYAKwP3wHRn0rq9mp/+KKGfWiKNw== X-Received: by 2002:aa7:8497:: with SMTP id u23mr3942149pfn.220.1544221253127; Fri, 07 Dec 2018 14:20:53 -0800 (PST) Received: from x1.localdomain (66.29.188.166.static.utbb.net. [66.29.188.166]) by smtp.gmail.com with ESMTPSA id e9sm5282511pff.5.2018.12.07.14.20.51 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 07 Dec 2018 14:20:52 -0800 (PST) From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 16/26] aio: add submission side request cache Date: Fri, 7 Dec 2018 15:20:06 -0700 Message-Id: <20181207222016.29387-17-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk> References: <20181207222016.29387-1-axboe@kernel.dk> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP We have to add each submitted polled request to the io_context poll_submitted list, which means we have to grab the poll_lock. We already use the block plug to batch submissions if we're doing a batch of IO submissions, extend that to cover the poll requests internally as well. 
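As a minimal userspace sketch of the same batching idea (not the kernel code itself; the type and function names below -- shared_ctx, submit_state, queue_req(), flush_batch(), BATCH -- are invented for this illustration), the following program collects requests on a private per-submission list and splices them onto the shared, lock-protected list in one step, either when the batch fills up or when submission ends. The actual patch does the equivalent with ctx->poll_lock, list_splice_tail_init() and a block plug callback for the schedule case, as the diff below shows.

/*
 * Userspace sketch (pthreads), not kernel code: amortize the shared-list
 * lock by batching privately and splicing once per batch. All names here
 * are invented for the illustration.
 */
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

#define BATCH 8         /* mirrors the AIO_IOPOLL_BATCH-sized flushes */

struct req {
        int id;
        struct req *next;
};

struct shared_ctx {
        pthread_mutex_t poll_lock;      /* protects poll_list */
        struct req *poll_list;          /* shared "submitted" list */
};

struct submit_state {
        struct req *head, *tail;        /* private batch, no locking */
        unsigned int count;
};

/* One lock round trip moves the whole private batch to the shared list. */
static void flush_batch(struct shared_ctx *ctx, struct submit_state *s)
{
        if (!s->head)
                return;
        pthread_mutex_lock(&ctx->poll_lock);
        s->tail->next = ctx->poll_list; /* prepend; the real patch splices
                                           to the tail instead */
        ctx->poll_list = s->head;
        pthread_mutex_unlock(&ctx->poll_lock);
        s->head = s->tail = NULL;
        s->count = 0;
}

/* Queue one submitted request; only a full batch touches the lock. */
static void queue_req(struct shared_ctx *ctx, struct submit_state *s,
                      struct req *r)
{
        r->next = NULL;
        if (s->tail)
                s->tail->next = r;
        else
                s->head = r;
        s->tail = r;
        if (++s->count >= BATCH)
                flush_batch(ctx, s);
}

int main(void)
{
        struct shared_ctx ctx = {
                .poll_lock = PTHREAD_MUTEX_INITIALIZER,
                .poll_list = NULL,
        };
        struct submit_state state = { NULL, NULL, 0 };
        struct req reqs[20];
        int n = 0;

        for (int i = 0; i < 20; i++) {
                reqs[i].id = i;
                queue_req(&ctx, &state, &reqs[i]);
        }
        flush_batch(&ctx, &state);      /* end of submission: drain leftovers */

        for (struct req *r = ctx.poll_list; r; r = r->next)
                n++;
        printf("%d requests reached the shared list with 3 lock acquisitions\n", n);
        return 0;
}

With 20 submissions and a batch size of 8, the shared lock is taken three times instead of twenty, which is the saving this patch is after on ctx->poll_lock.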
Signed-off-by: Jens Axboe --- fs/aio.c | 136 +++++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 113 insertions(+), 23 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index cc8763b395c1..5e840396f6d8 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -240,6 +240,21 @@ struct aio_kiocb { }; }; +struct aio_submit_state { + struct kioctx *ctx; + + struct blk_plug plug; +#ifdef CONFIG_BLOCK + struct blk_plug_cb plug_cb; +#endif + + /* + * Polled iocbs that have been submitted, but not added to the ctx yet + */ + struct list_head req_list; + unsigned int req_count; +}; + /*------ sysctl variables----*/ static DEFINE_SPINLOCK(aio_nr_lock); unsigned long aio_nr; /* current system wide number of aio requests */ @@ -257,6 +272,15 @@ static const struct address_space_operations aio_ctx_aops; static const unsigned int iocb_page_shift = ilog2(PAGE_SIZE / sizeof(struct iocb)); +/* + * We rely on block level unplugs to flush pending requests, if we schedule + */ +#ifdef CONFIG_BLOCK +static const bool aio_use_state_req_list = true; +#else +static const bool aio_use_state_req_list = false; +#endif + static void aio_useriocb_unmap(struct kioctx *); static void aio_iopoll_reap_events(struct kioctx *); @@ -1890,13 +1914,28 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret) } } +/* + * Called either at the end of IO submission, or through a plug callback + * because we're going to schedule. Moves out local batch of requests to + * the ctx poll list, so they can be found for polling + reaping. + */ +static void aio_flush_state_reqs(struct kioctx *ctx, + struct aio_submit_state *state) +{ + spin_lock(&ctx->poll_lock); + list_splice_tail_init(&state->req_list, &ctx->poll_submitted); + spin_unlock(&ctx->poll_lock); + state->req_count = 0; +} + /* * After the iocb has been issued, it's safe to be found on the poll list. * Adding the kiocb to the list AFTER submission ensures that we don't * find it from a io_getevents() thread before the issuer is done accessing * the kiocb cookie. */ -static void aio_iopoll_iocb_issued(struct aio_kiocb *kiocb) +static void aio_iopoll_iocb_issued(struct aio_submit_state *state, + struct aio_kiocb *kiocb) { /* * For fast devices, IO may have already completed. 
If it has, add @@ -1906,12 +1945,21 @@ static void aio_iopoll_iocb_issued(struct aio_kiocb *kiocb) const int front_add = test_bit(IOCB_POLL_COMPLETED, &kiocb->ki_flags); struct kioctx *ctx = kiocb->ki_ctx; - spin_lock(&ctx->poll_lock); - if (front_add) - list_add(&kiocb->ki_list, &ctx->poll_submitted); - else - list_add_tail(&kiocb->ki_list, &ctx->poll_submitted); - spin_unlock(&ctx->poll_lock); + if (!state || !aio_use_state_req_list) { + spin_lock(&ctx->poll_lock); + if (front_add) + list_add(&kiocb->ki_list, &ctx->poll_submitted); + else + list_add_tail(&kiocb->ki_list, &ctx->poll_submitted); + spin_unlock(&ctx->poll_lock); + } else { + if (front_add) + list_add(&kiocb->ki_list, &state->req_list); + else + list_add_tail(&kiocb->ki_list, &state->req_list); + if (++state->req_count >= AIO_IOPOLL_BATCH) + aio_flush_state_reqs(ctx, state); + } } static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, @@ -2207,7 +2255,8 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) } static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, - struct iocb __user *user_iocb, bool compat) + struct iocb __user *user_iocb, + struct aio_submit_state *state, bool compat) { struct aio_kiocb *req; ssize_t ret; @@ -2311,7 +2360,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, ret = -EAGAIN; goto out_put_req; } - aio_iopoll_iocb_issued(req); + aio_iopoll_iocb_issued(state, req); } return 0; out_put_req: @@ -2325,7 +2374,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, } static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, - bool compat) + struct aio_submit_state *state, bool compat) { struct iocb iocb, *iocbp; @@ -2342,7 +2391,44 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, iocbp = &iocb; } - return __io_submit_one(ctx, iocbp, user_iocb, compat); + return __io_submit_one(ctx, iocbp, user_iocb, state, compat); +} + +#ifdef CONFIG_BLOCK +static void aio_state_unplug(struct blk_plug_cb *cb, bool from_schedule) +{ + struct aio_submit_state *state; + + state = container_of(cb, struct aio_submit_state, plug_cb); + if (!list_empty(&state->req_list)) + aio_flush_state_reqs(state->ctx, state); +} +#endif + +/* + * Batched submission is done, ensure local IO is flushed out. + */ +static void aio_submit_state_end(struct aio_submit_state *state) +{ + blk_finish_plug(&state->plug); + if (!list_empty(&state->req_list)) + aio_flush_state_reqs(state->ctx, state); +} + +/* + * Start submission side cache. 
+ */ +static void aio_submit_state_start(struct aio_submit_state *state, + struct kioctx *ctx) +{ + state->ctx = ctx; + INIT_LIST_HEAD(&state->req_list); + state->req_count = 0; +#ifdef CONFIG_BLOCK + state->plug_cb.callback = aio_state_unplug; + blk_start_plug(&state->plug); + list_add(&state->plug_cb.list, &state->plug.cb_list); +#endif } /* sys_io_submit: @@ -2360,10 +2446,10 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, struct iocb __user * __user *, iocbpp) { + struct aio_submit_state state, *statep = NULL; struct kioctx *ctx; long ret = 0; int i = 0; - struct blk_plug plug; if (unlikely(nr < 0)) return -EINVAL; @@ -2377,8 +2463,10 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, if (nr > ctx->nr_events) nr = ctx->nr_events; - if (nr > AIO_PLUG_THRESHOLD) - blk_start_plug(&plug); + if (nr > AIO_PLUG_THRESHOLD) { + aio_submit_state_start(&state, ctx); + statep = &state; + } for (i = 0; i < nr; i++) { struct iocb __user *user_iocb; @@ -2387,12 +2475,12 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, break; } - ret = io_submit_one(ctx, user_iocb, false); + ret = io_submit_one(ctx, user_iocb, statep, false); if (ret) break; } - if (nr > AIO_PLUG_THRESHOLD) - blk_finish_plug(&plug); + if (statep) + aio_submit_state_end(statep); percpu_ref_put(&ctx->users); return i ? i : ret; @@ -2402,10 +2490,10 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id, int, nr, compat_uptr_t __user *, iocbpp) { + struct aio_submit_state state, *statep = NULL; struct kioctx *ctx; long ret = 0; int i = 0; - struct blk_plug plug; if (unlikely(nr < 0)) return -EINVAL; @@ -2419,8 +2507,10 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id, if (nr > ctx->nr_events) nr = ctx->nr_events; - if (nr > AIO_PLUG_THRESHOLD) - blk_start_plug(&plug); + if (nr > AIO_PLUG_THRESHOLD) { + aio_submit_state_start(&state, ctx); + statep = &state; + } for (i = 0; i < nr; i++) { compat_uptr_t user_iocb; @@ -2429,12 +2519,12 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id, break; } - ret = io_submit_one(ctx, compat_ptr(user_iocb), true); + ret = io_submit_one(ctx, compat_ptr(user_iocb), statep, true); if (ret) break; } - if (nr > AIO_PLUG_THRESHOLD) - blk_finish_plug(&plug); + if (statep) + aio_submit_state_end(statep); percpu_ref_put(&ctx->users); return i ? 
i : ret; From patchwork Fri Dec 7 22:20:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10718899 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4781C18B8 for ; Fri, 7 Dec 2018 22:20:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 396262F719 for ; Fri, 7 Dec 2018 22:20:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2DABF2F71D; Fri, 7 Dec 2018 22:20:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BD59E2F72F for ; Fri, 7 Dec 2018 22:20:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726177AbeLGWU4 (ORCPT ); Fri, 7 Dec 2018 17:20:56 -0500 Received: from mail-pf1-f193.google.com ([209.85.210.193]:46372 "EHLO mail-pf1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726180AbeLGWU4 (ORCPT ); Fri, 7 Dec 2018 17:20:56 -0500 Received: by mail-pf1-f193.google.com with SMTP id c73so2552830pfe.13 for ; Fri, 07 Dec 2018 14:20:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=14kCfLiACV93AenWhL1xKvUDPabDTnEKBX7DXRr/kpI=; b=QniSx9Juudp/AWQnI6Dek3QXL18XRZWqIwDObYPrY/SYNgHxpExBtqmp4O2t/Pcj61 SqxPvtQ7Ur/2UULiAHwjJKzN1/9gbEO3VgsWsp6OEgFuB/WiiQFLAdR96dzbPkfBVquT 2qQnkoWrQXHRQvk6r/uOW8ljhoU/i471c92fxT3QQcO3CI+U8pXuB7hlEWObLjQEWo0+ 8dxc9gnrS2tKH+c/mWYn9MV1VEvFwqdRfpS2ATzUqGeYKDxaT1T3ssAKyX/B6JYXNjGw 0YkKHulaXiYKyIbphJ1i0UENuTr5WgX4mX2+xT/M9fSgu45o2BJg3QG3V+Ebhljf71CC n13Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=14kCfLiACV93AenWhL1xKvUDPabDTnEKBX7DXRr/kpI=; b=Uz06eIuEeoygcQQUCIwswV/p5DjZlEP8c3CyiTzbFYgtq3s9vPsH6RVMLhUmRf/8Bz IJUXmP398oovBsOG9Sf5kMueOt/KlVRbbrBXC4Sqj3Vj5w8/sxVRPu9DoOO26n/aMFVf 2EcNsEDsiNlY+uzzudB6e1hDCfVXBdXZPI0bnBNuqAaN6rq3HxWZRP7JjB0pkCweV9Wo dFG3r7WiYYo+McwjKH9B6PzGejA8pY6sB6m/mJIXn0uW/nb1LmvZayKL92vg/kWC+YGI hV03hbc/CPAjTLzh1j1gE5mY8Ep5kHV4KtZrT0b8l6FBHUNoDAB2a7H89Yt9GH6jgikw Ke+w== X-Gm-Message-State: AA+aEWaEcDwUs9zADL7XRJXCxTb4I/QagPEPGTe+DsoXIQA6vxfDCkel TaDHcXnE0a6XY8AhZP7v8xlV4dKos1c= X-Google-Smtp-Source: AFSGD/UEkx4Qcxfc4BhWKR4uYfM80KgpvBecNZtZ9VFlrwNzhEz34gA3n0x/Ntx4DwkMGYyg+CTyyg== X-Received: by 2002:a63:1f1c:: with SMTP id f28mr3500068pgf.193.1544221255166; Fri, 07 Dec 2018 14:20:55 -0800 (PST) Received: from x1.localdomain (66.29.188.166.static.utbb.net. 
[66.29.188.166]) by smtp.gmail.com with ESMTPSA id e9sm5282511pff.5.2018.12.07.14.20.53 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 07 Dec 2018 14:20:54 -0800 (PST) From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 17/26] fs: add fget_many() and fput_many() Date: Fri, 7 Dec 2018 15:20:07 -0700 Message-Id: <20181207222016.29387-18-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk> References: <20181207222016.29387-1-axboe@kernel.dk> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Some use cases repeatedly get and put references to the same file, but the only exposed interface does this one reference at a time. As each of these entails an atomic inc or dec on a shared structure, that cost can add up. Add fget_many(), which works just like fget(), except it takes an argument for how many references to get on the file. Ditto fput_many(), which can drop an arbitrary number of references to a file. Signed-off-by: Jens Axboe --- fs/file.c | 15 ++++++++++----- fs/file_table.c | 10 ++++++++-- include/linux/file.h | 2 ++ include/linux/fs.h | 3 ++- 4 files changed, 22 insertions(+), 8 deletions(-) diff --git a/fs/file.c b/fs/file.c index 7ffd6e9d103d..ad9870edfd51 100644 --- a/fs/file.c +++ b/fs/file.c @@ -676,7 +676,7 @@ void do_close_on_exec(struct files_struct *files) spin_unlock(&files->file_lock); } -static struct file *__fget(unsigned int fd, fmode_t mask) +static struct file *__fget(unsigned int fd, fmode_t mask, unsigned int refs) { struct files_struct *files = current->files; struct file *file; @@ -691,7 +691,7 @@ static struct file *__fget(unsigned int fd, fmode_t mask) */ if (file->f_mode & mask) file = NULL; - else if (!get_file_rcu(file)) + else if (!get_file_rcu_many(file, refs)) goto loop; } rcu_read_unlock(); @@ -699,15 +699,20 @@ static struct file *__fget(unsigned int fd, fmode_t mask) return file; } +struct file *fget_many(unsigned int fd, unsigned int refs) +{ + return __fget(fd, FMODE_PATH, refs); +} + struct file *fget(unsigned int fd) { - return __fget(fd, FMODE_PATH); + return fget_many(fd, 1); } EXPORT_SYMBOL(fget); struct file *fget_raw(unsigned int fd) { - return __fget(fd, 0); + return __fget(fd, 0, 1); } EXPORT_SYMBOL(fget_raw); @@ -738,7 +743,7 @@ static unsigned long __fget_light(unsigned int fd, fmode_t mask) return 0; return (unsigned long)file; } else { - file = __fget(fd, mask); + file = __fget(fd, mask, 1); if (!file) return 0; return FDPUT_FPUT | (unsigned long)file; diff --git a/fs/file_table.c b/fs/file_table.c index e49af4caf15d..6a3964df33e4 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -326,9 +326,9 @@ void flush_delayed_fput(void) static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput); -void fput(struct file *file) +void fput_many(struct file *file, unsigned int refs) { - if (atomic_long_dec_and_test(&file->f_count)) { + if (atomic_long_sub_and_test(refs, &file->f_count)) { struct task_struct *task = current; if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) { @@ -347,6 +347,12 @@ void fput(struct file *file) } } +void fput(struct file *file) +{ + fput_many(file, 1); +} + + /* * synchronous analog of fput(); for kernel threads that might be needed * in some umount() (and thus can't use flush_delayed_fput() without diff --git
a/include/linux/file.h b/include/linux/file.h index 6b2fb032416c..3fcddff56bc4 100644 --- a/include/linux/file.h +++ b/include/linux/file.h @@ -13,6 +13,7 @@ struct file; extern void fput(struct file *); +extern void fput_many(struct file *, unsigned int); struct file_operations; struct vfsmount; @@ -44,6 +45,7 @@ static inline void fdput(struct fd fd) } extern struct file *fget(unsigned int fd); +extern struct file *fget_many(unsigned int fd, unsigned int refs); extern struct file *fget_raw(unsigned int fd); extern unsigned long __fdget(unsigned int fd); extern unsigned long __fdget_raw(unsigned int fd); diff --git a/include/linux/fs.h b/include/linux/fs.h index 6a5f71f8ae06..dc54a65c401a 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -952,7 +952,8 @@ static inline struct file *get_file(struct file *f) atomic_long_inc(&f->f_count); return f; } -#define get_file_rcu(x) atomic_long_inc_not_zero(&(x)->f_count) +#define get_file_rcu_many(x, cnt) atomic_long_add_unless(&(x)->f_count, (cnt), 0) +#define get_file_rcu(x) get_file_rcu_many((x), 1) #define fput_atomic(x) atomic_long_add_unless(&(x)->f_count, -1, 1) #define file_count(x) atomic_long_read(&(x)->f_count) From patchwork Fri Dec 7 22:20:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10718903 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 827611750 for ; Fri, 7 Dec 2018 22:21:00 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7438D2F719 for ; Fri, 7 Dec 2018 22:21:00 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6881B2F72F; Fri, 7 Dec 2018 22:21:00 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D09872F719 for ; Fri, 7 Dec 2018 22:20:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726180AbeLGWU7 (ORCPT ); Fri, 7 Dec 2018 17:20:59 -0500 Received: from mail-pf1-f194.google.com ([209.85.210.194]:39396 "EHLO mail-pf1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726183AbeLGWU6 (ORCPT ); Fri, 7 Dec 2018 17:20:58 -0500 Received: by mail-pf1-f194.google.com with SMTP id c72so2569548pfc.6 for ; Fri, 07 Dec 2018 14:20:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=gLcGjah9R8bCJj71DaRq8i6noeZAvkvjJ/zZE6Isvv8=; b=IFFtMdwcun4EZXlFAsoExdZUQPPwSJEEc1H4qTtcaLGqmthQCnS0ROErY4J8s3bDYG NPEgDaxm+jzbIRlAHI36b3BnqhZM0Eo1z16zpXxIvXyEM4OraIiWWQqIl4R7Rduf+cDs B253oSV1+Z3KFIreY5YXdRM0KkQ7uYvBC9gL1X4ceq4d17WZP0g2dPPOnoJdlCY5K2Zt YM64D1NLK3mwLqF1uFbv+8HpAp/Zl0s9K+xGRS/OLCIA4J2nJQDP3+wSnI95AeT8wT02 B7KTaMM0nO/KJzvd8vms6PGOv4BIeT+r3SJ5bKzbbF1dH3ZyK5YoSbPQOrZChJ9QMmHL SveQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; 
bh=gLcGjah9R8bCJj71DaRq8i6noeZAvkvjJ/zZE6Isvv8=; b=Wq4IPsk4bXNgsl9uBfReB3np/1P21CNaT0LoDYPr2eYVMEtMZa06erqc2CiqqRrr3j 2QYDtEtbqkkJr4UjMaOezgctUuIGINyl2IdvA7flxgxk+O78M+60RHEvZXhhLw/c7auw TT8v35OYn9GwNqY7O9g+SG7UWXnBAVZMhT90nn8O2kTHXH29AQ7EL5bRzO3vqh8CI8L2 gtL+uMqp7Hh5y742/fby/lBLtwY6Z4i9x/IkbCTEiqCJ7CjURrvou9DhtMqQSCo+2YHG ruVdR/RDHQSiPXXuxPQL2OnjRVeO09opEsMEG9iEF5GTQJL5FRxShy9b7qSD/x2nis81 ZtSw== X-Gm-Message-State: AA+aEWa5J9YDv7UE4rChbjUdnVx/f/huq4WvEX4YO2xGJd+ghb1wWA6w 0rMb0Fog6028gESrkkb1I3dF6d8SOr8= X-Google-Smtp-Source: AFSGD/U1ww0Xy3ERiO+rPtEeJDLikZ6eMNv1PQQ8XtI2RoAmiD7UIuFFwWmsdUG510Vd9pfT+ak77w== X-Received: by 2002:a63:1f4e:: with SMTP id q14mr3374273pgm.88.1544221257045; Fri, 07 Dec 2018 14:20:57 -0800 (PST) Received: from x1.localdomain (66.29.188.166.static.utbb.net. [66.29.188.166]) by smtp.gmail.com with ESMTPSA id e9sm5282511pff.5.2018.12.07.14.20.55 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 07 Dec 2018 14:20:56 -0800 (PST) From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 18/26] aio: use fget/fput_many() for file references Date: Fri, 7 Dec 2018 15:20:08 -0700 Message-Id: <20181207222016.29387-19-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk> References: <20181207222016.29387-1-axboe@kernel.dk> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On the submission side, add file reference batching to the aio_submit_state. We get as many references as the number of iocbs we are submitting, and drop unused ones if we end up switching files. The assumption here is that we're usually only dealing with one fd, and if there are multiple, hopefuly they are at least somewhat ordered. Could trivially be extended to cover multiple fds, if needed. On the completion side we do the same thing, except this is trivially done just locally in aio_iopoll_reap(). Signed-off-by: Jens Axboe --- fs/aio.c | 106 +++++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 91 insertions(+), 15 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 5e840396f6d8..3c07cc9cb11a 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -253,6 +253,15 @@ struct aio_submit_state { */ struct list_head req_list; unsigned int req_count; + + /* + * File reference cache + */ + struct file *file; + unsigned int fd; + unsigned int has_refs; + unsigned int used_refs; + unsigned int ios_left; }; /*------ sysctl variables----*/ @@ -1355,7 +1364,8 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, { void *iocbs[AIO_IOPOLL_BATCH]; struct aio_kiocb *iocb, *n; - int to_free = 0, ret = 0; + int file_count, to_free = 0, ret = 0; + struct file *file = NULL; /* Shouldn't happen... */ if (*nr_events >= max) @@ -1372,7 +1382,20 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, list_del(&iocb->ki_list); iocbs[to_free++] = iocb; - fput(iocb->rw.ki_filp); + /* + * Batched puts of the same file, to avoid dirtying the + * file usage count multiple times, if avoidable. 
+ */ + if (!file) { + file = iocb->rw.ki_filp; + file_count = 1; + } else if (file == iocb->rw.ki_filp) { + file_count++; + } else { + fput_many(file, file_count); + file = iocb->rw.ki_filp; + file_count = 1; + } if (evs && copy_to_user(evs + *nr_events, &iocb->ki_ev, sizeof(iocb->ki_ev))) { @@ -1382,6 +1405,9 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, (*nr_events)++; } + if (file) + fput_many(file, file_count); + if (to_free) iocb_put_many(ctx, iocbs, &to_free); @@ -1807,13 +1833,58 @@ static void aio_complete_rw_poll(struct kiocb *kiocb, long res, long res2) } } -static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb) +static void aio_file_put(struct aio_submit_state *state) +{ + if (state->file) { + int diff = state->has_refs - state->used_refs; + + if (diff) + fput_many(state->file, diff); + state->file = NULL; + } +} + +/* + * Get as many references to a file as we have IOs left in this submission, + * assuming most submissions are for one file, or at least that each file + * has more than one submission. + */ +static struct file *aio_file_get(struct aio_submit_state *state, int fd) +{ + if (!state) + return fget(fd); + + if (!state->file) { +get_file: + state->file = fget_many(fd, state->ios_left); + if (!state->file) + return NULL; + + state->fd = fd; + state->has_refs = state->ios_left; + state->used_refs = 1; + state->ios_left--; + return state->file; + } + + if (state->fd == fd) { + state->used_refs++; + state->ios_left--; + return state->file; + } + + aio_file_put(state); + goto get_file; +} + +static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb, + struct aio_submit_state *state) { struct kioctx *ctx = kiocb->ki_ctx; struct kiocb *req = &kiocb->rw; int ret; - req->ki_filp = fget(iocb->aio_fildes); + req->ki_filp = aio_file_get(state, iocb->aio_fildes); if (unlikely(!req->ki_filp)) return -EBADF; req->ki_pos = iocb->aio_offset; @@ -1963,7 +2034,8 @@ static void aio_iopoll_iocb_issued(struct aio_submit_state *state, } static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, - bool vectored, bool compat) + struct aio_submit_state *state, bool vectored, + bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct kiocb *req = &kiocb->rw; @@ -1971,7 +2043,7 @@ static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, struct file *file; ssize_t ret; - ret = aio_prep_rw(kiocb, iocb); + ret = aio_prep_rw(kiocb, iocb, state); if (ret) return ret; file = req->ki_filp; @@ -1997,7 +2069,8 @@ static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, } static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb, - bool vectored, bool compat) + struct aio_submit_state *state, bool vectored, + bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct kiocb *req = &kiocb->rw; @@ -2005,7 +2078,7 @@ static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb, struct file *file; ssize_t ret; - ret = aio_prep_rw(kiocb, iocb); + ret = aio_prep_rw(kiocb, iocb, state); if (ret) return ret; file = req->ki_filp; @@ -2316,16 +2389,16 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, ret = -EINVAL; switch (iocb->aio_lio_opcode) { case IOCB_CMD_PREAD: - ret = aio_read(req, iocb, false, compat); + ret = aio_read(req, iocb, state, false, compat); break; case IOCB_CMD_PWRITE: - ret = aio_write(req, iocb, false, compat); + ret = aio_write(req, iocb, state, false, compat); break; case 
IOCB_CMD_PREADV: - ret = aio_read(req, iocb, true, compat); + ret = aio_read(req, iocb, state, true, compat); break; case IOCB_CMD_PWRITEV: - ret = aio_write(req, iocb, true, compat); + ret = aio_write(req, iocb, state, true, compat); break; case IOCB_CMD_FSYNC: if (ctx->flags & IOCTX_FLAG_IOPOLL) @@ -2413,17 +2486,20 @@ static void aio_submit_state_end(struct aio_submit_state *state) blk_finish_plug(&state->plug); if (!list_empty(&state->req_list)) aio_flush_state_reqs(state->ctx, state); + aio_file_put(state); } /* * Start submission side cache. */ static void aio_submit_state_start(struct aio_submit_state *state, - struct kioctx *ctx) + struct kioctx *ctx, int max_ios) { state->ctx = ctx; INIT_LIST_HEAD(&state->req_list); state->req_count = 0; + state->file = NULL; + state->ios_left = max_ios; #ifdef CONFIG_BLOCK state->plug_cb.callback = aio_state_unplug; blk_start_plug(&state->plug); @@ -2464,7 +2540,7 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, nr = ctx->nr_events; if (nr > AIO_PLUG_THRESHOLD) { - aio_submit_state_start(&state, ctx); + aio_submit_state_start(&state, ctx, nr); statep = &state; } for (i = 0; i < nr; i++) { @@ -2508,7 +2584,7 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id, nr = ctx->nr_events; if (nr > AIO_PLUG_THRESHOLD) { - aio_submit_state_start(&state, ctx); + aio_submit_state_start(&state, ctx, nr); statep = &state; } for (i = 0; i < nr; i++) { From patchwork Fri Dec 7 22:20:09 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10718907 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8C1281750 for ; Fri, 7 Dec 2018 22:21:01 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7F4F52F754 for ; Fri, 7 Dec 2018 22:21:01 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 732A32F72F; Fri, 7 Dec 2018 22:21:01 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2BF1F2F72F for ; Fri, 7 Dec 2018 22:21:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726111AbeLGWVA (ORCPT ); Fri, 7 Dec 2018 17:21:00 -0500 Received: from mail-pf1-f194.google.com ([209.85.210.194]:38046 "EHLO mail-pf1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726186AbeLGWU7 (ORCPT ); Fri, 7 Dec 2018 17:20:59 -0500 Received: by mail-pf1-f194.google.com with SMTP id q1so2573918pfi.5 for ; Fri, 07 Dec 2018 14:20:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=NdzcXBictQlYbUlStHxqWqhzWxN3PnIKj/R9vsgka+c=; b=PsO5M1NZ/rF8/n9dzwaRjhm3pJMN1CMINpKOUo4iOb0uO7zbw/torLf5p7AUR+4v2F UcVUsWaeGSoqj9bI6GyYSQdreZMFHYehZb2lwtBQgSS00fpMrcq69bmWE9t91ESkkU4+ gmLlECDbyJaJU4u3Yi4OdZPQqlLRTfJ1KVMDSv01kQgqfHVI1Lii2NcBJrCN27A4p5BI iOzCBc9zdbNjxbP6/HYkxAnW3DNrXdo/uHqwvEHt+h1kUi+UK1KQ0Xw8L2lRrV/atc3p 
zifr/4YRWiEeVsZQokSgINtkttT61ykJzb6xXw9/VBsmyNLBc6fzeCiRVT6hLZZGmAAu rCjg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=NdzcXBictQlYbUlStHxqWqhzWxN3PnIKj/R9vsgka+c=; b=TC7uy8X3omyMnXKSxHXmdu9TMvipZVGDtiiUcVLFI4ec3tjsjTxNxi5XCaS5by8Rp4 dT0JgRYwdU/OaSUTqycZzuYCc8TZHKpsCdbWyIQkJCgsDoDxawDmTBaRiVNFq2NGv/92 eqrTDuTtqpPSR51VXK+5hz7IYGBGp8aCQYhl06CSd3gtH7V3bb9SnxNgJzTsrP0EceX2 5PuNtLhsA9JERI92nB6EdSAeCIp/ulyoXZP4hb345SCTnYyqkui5KKXp4TSy9RCF65QS mi9NaYkVj5WgWp2PMvUT5lP2kUP6v/ZB40LvXTnje978DEbap/jtfKTpUmLMbsztEr6C jiYw== X-Gm-Message-State: AA+aEWaoCpy11q0tMzK/JF15B6UFwo6MXXJK393DyQkvFb9mBzPDcKju Nbpq2GxA7/tPO4M9xigQUuP0hLbGoUE= X-Google-Smtp-Source: AFSGD/X2tb6Tl7VAekiPGQmP2AYLfPJTu9UbZZxILVGaHZ/LYeMfl2yJY406ym01aM7K+RLk9X8hJw== X-Received: by 2002:a63:fd53:: with SMTP id m19mr3563577pgj.340.1544221258789; Fri, 07 Dec 2018 14:20:58 -0800 (PST) Received: from x1.localdomain (66.29.188.166.static.utbb.net. [66.29.188.166]) by smtp.gmail.com with ESMTPSA id e9sm5282511pff.5.2018.12.07.14.20.57 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 07 Dec 2018 14:20:57 -0800 (PST) From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 19/26] aio: split iocb init from allocation Date: Fri, 7 Dec 2018 15:20:09 -0700 Message-Id: <20181207222016.29387-20-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk> References: <20181207222016.29387-1-axboe@kernel.dk> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In preparation from having pre-allocated requests, that we then just need to initialize before use. Signed-off-by: Jens Axboe --- fs/aio.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 3c07cc9cb11a..51c7159f09bf 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1099,6 +1099,16 @@ static bool get_reqs_available(struct kioctx *ctx) return __get_reqs_available(ctx); } +static void aio_iocb_init(struct kioctx *ctx, struct aio_kiocb *req) +{ + percpu_ref_get(&ctx->reqs); + req->ki_ctx = ctx; + INIT_LIST_HEAD(&req->ki_list); + req->ki_flags = 0; + refcount_set(&req->ki_refcnt, 0); + req->ki_eventfd = NULL; +} + /* aio_get_req * Allocate a slot for an aio request. * Returns NULL if no requests are free. 
@@ -1111,12 +1121,7 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx) if (unlikely(!req)) return NULL; - percpu_ref_get(&ctx->reqs); - req->ki_ctx = ctx; - INIT_LIST_HEAD(&req->ki_list); - req->ki_flags = 0; - refcount_set(&req->ki_refcnt, 0); - req->ki_eventfd = NULL; + aio_iocb_init(ctx, req); return req; } From patchwork Fri Dec 7 22:20:10 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10718909 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1FB5C15A6 for ; Fri, 7 Dec 2018 22:21:03 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 111EF2F748 for ; Fri, 7 Dec 2018 22:21:03 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 053282F73E; Fri, 7 Dec 2018 22:21:03 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 949492F77E for ; Fri, 7 Dec 2018 22:21:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726193AbeLGWVC (ORCPT ); Fri, 7 Dec 2018 17:21:02 -0500 Received: from mail-pg1-f193.google.com ([209.85.215.193]:36535 "EHLO mail-pg1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726183AbeLGWVB (ORCPT ); Fri, 7 Dec 2018 17:21:01 -0500 Received: by mail-pg1-f193.google.com with SMTP id n2so2304775pgm.3 for ; Fri, 07 Dec 2018 14:21:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=UcPqE80Ltl1zbvUwQf6KT+BSP/CUpMTJDTXygzw2sHo=; b=pT+EpGt+hVszXHU33vPKeNTnGsaNOM4uueWUYszxyxI8xegLRK6vcfCThfF5zUewA/ CrI2U/VrGp2CUvjHp7cE+f55Mnsqg64Hy9yFUyfzIZFtHkwT//HeQP55MSlsGPPGsPrm LdvBOGkXIan67PXaQKIjFt3qBvulvEpc2Ku/5O9ClTAnG6i2M3USl9W4wGlnPzJDXwix jFskszHkvQYNwOWrU7yO+Kfu5o+4N/SShf6MUTr0DuAbswwyhMMsFaZOCnaOe9w/hdgZ zJ0ZyhEiLNC5bG7pfFO47PwNnR7G6Z/MxmgulrVEhjbhtEVxF77mT8r4n3IfKOSSz0BK wGCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=UcPqE80Ltl1zbvUwQf6KT+BSP/CUpMTJDTXygzw2sHo=; b=mJheQjGUNu/9TE/7CLWTJfX/Sh7s2r8JyHzgIwA9I7wFUVinxci7CrI89O8RkLeUd6 FSdofqDVM9idzeOKrJfINLnJ8v22kxlnzoo8w8WxQQWpiI2OU5xbLsNNeCoSqcopACxn +euHtnq7Hm73P0ng2bd6XPg6wJ67bzQjh/V/uTYbB4+9IzHS5H57uSkMzdUurQnPB1Vf jRN9iHXW1Xm5Jc/oGqPOujp2uh6/YGYTOs0UeY8eaVBusiDPZGXmHDBQM3W72NXlnUtf 570djOFsBMQ/iETKv0f0G4eE+NlL9qJ0UEby4gQKkoTUXeSvOObv6ZR8/EeY6A1YGoux x5nQ== X-Gm-Message-State: AA+aEWZHwDY8AdX+aT9gxjSZViagkBIbhanASOkFRei3eJIzIwZaYoSw Xjq7kwht6QW9xkpw5GMmwmoVaW6nATo= X-Google-Smtp-Source: AFSGD/UJYUBi0NbSbjuROEiJNNEr36E05T9Z1nf07ympULZ22V7AzBER1t/o4VK/YMBmfpcemjTKnA== X-Received: by 2002:a62:b9a:: with SMTP id 26mr4017377pfl.196.1544221260567; Fri, 07 Dec 2018 14:21:00 -0800 (PST) Received: from x1.localdomain (66.29.188.166.static.utbb.net. 
[66.29.188.166]) by smtp.gmail.com with ESMTPSA id e9sm5282511pff.5.2018.12.07.14.20.58 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 07 Dec 2018 14:20:59 -0800 (PST) From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 20/26] aio: batch aio_kiocb allocation Date: Fri, 7 Dec 2018 15:20:10 -0700 Message-Id: <20181207222016.29387-21-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk> References: <20181207222016.29387-1-axboe@kernel.dk> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Similarly to how we use the state->ios_left to know how many references to get to a file, we can use it to allocate the aio_kiocb's we need in bulk. Signed-off-by: Jens Axboe --- fs/aio.c | 47 +++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 39 insertions(+), 8 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 51c7159f09bf..52a0b2291f71 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -240,6 +240,8 @@ struct aio_kiocb { }; }; +#define AIO_IOPOLL_BATCH 8 + struct aio_submit_state { struct kioctx *ctx; @@ -254,6 +256,13 @@ struct aio_submit_state { struct list_head req_list; unsigned int req_count; + /* + * aio_kiocb alloc cache + */ + void *iocbs[AIO_IOPOLL_BATCH]; + unsigned int free_iocbs; + unsigned int cur_iocb; + /* * File reference cache */ @@ -1113,15 +1122,35 @@ static void aio_iocb_init(struct kioctx *ctx, struct aio_kiocb *req) * Allocate a slot for an aio request. * Returns NULL if no requests are free. */ -static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx) +static struct aio_kiocb *aio_get_req(struct kioctx *ctx, + struct aio_submit_state *state) { struct aio_kiocb *req; - req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL); - if (unlikely(!req)) - return NULL; + if (!state) + req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL); + else if (!state->free_iocbs) { + size_t size; + + size = min_t(size_t, state->ios_left, ARRAY_SIZE(state->iocbs)); + size = kmem_cache_alloc_bulk(kiocb_cachep, GFP_KERNEL, size, + state->iocbs); + if (size < 0) + return ERR_PTR(size); + else if (!size) + return ERR_PTR(-ENOMEM); + state->free_iocbs = size - 1; + state->cur_iocb = 1; + req = state->iocbs[0]; + } else { + req = state->iocbs[state->cur_iocb]; + state->free_iocbs--; + state->cur_iocb++; + } + + if (req) + aio_iocb_init(ctx, req); - aio_iocb_init(ctx, req); return req; } @@ -1359,8 +1388,6 @@ static bool aio_read_events(struct kioctx *ctx, long min_nr, long nr, return ret < 0 || *i >= min_nr; } -#define AIO_IOPOLL_BATCH 8 - /* * Process completed iocb iopoll entries, copying the result to userspace. 
*/ @@ -2360,7 +2387,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, return -EAGAIN; ret = -EAGAIN; - req = aio_get_req(ctx); + req = aio_get_req(ctx, state); if (unlikely(!req)) goto out_put_reqs_available; @@ -2492,6 +2519,9 @@ static void aio_submit_state_end(struct aio_submit_state *state) if (!list_empty(&state->req_list)) aio_flush_state_reqs(state->ctx, state); aio_file_put(state); + if (state->free_iocbs) + kmem_cache_free_bulk(kiocb_cachep, state->free_iocbs, + &state->iocbs[state->cur_iocb]); } /* @@ -2503,6 +2533,7 @@ static void aio_submit_state_start(struct aio_submit_state *state, state->ctx = ctx; INIT_LIST_HEAD(&state->req_list); state->req_count = 0; + state->free_iocbs = 0; state->file = NULL; state->ios_left = max_ios; #ifdef CONFIG_BLOCK From patchwork Fri Dec 7 22:20:11 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10718915 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CFD7E1750 for ; Fri, 7 Dec 2018 22:21:05 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C1E262F719 for ; Fri, 7 Dec 2018 22:21:05 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B5D872F72F; Fri, 7 Dec 2018 22:21:05 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 646C52F719 for ; Fri, 7 Dec 2018 22:21:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726183AbeLGWVE (ORCPT ); Fri, 7 Dec 2018 17:21:04 -0500 Received: from mail-pg1-f193.google.com ([209.85.215.193]:43779 "EHLO mail-pg1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726186AbeLGWVE (ORCPT ); Fri, 7 Dec 2018 17:21:04 -0500 Received: by mail-pg1-f193.google.com with SMTP id v28so2293121pgk.10 for ; Fri, 07 Dec 2018 14:21:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=qHGUyfMFPUHePmcCM/30R2Nq4Nobfnj5bVLJuuayYxs=; b=HLYj8QAKuvg6vFqQeQEjbn1uUwvS+3UwseMPUNQE3zRB2IuTFM27kHoHS83wKDo3Hf CtlmScrJoZfZDq0u/sFgDqnUYEdWxG0Bjt7glgz/OGOMPEXBZ4P7EtilTSThq9j+F+fU 4jmf/GTkspYhOGGZSKg7h97vVOEhqiefG4tUbF8mfmzrTWERJHjsqbtLXquonwgLR0UL oVqgqTmeh1hO5n2HTH+jtGeX0FH6vlgv9BJf26X90f8uTx7URf0REFk41kEU/QoPYC1j hR5tD/0PE9/8sy1dYZDyjNq1SIm6k+EEi5VIp1+Ey+zMTNuB3DXrDL5mi5nVyahkTzB1 wTFQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=qHGUyfMFPUHePmcCM/30R2Nq4Nobfnj5bVLJuuayYxs=; b=cModdYGQ2oCPpRZnI5dOe47G17OZQK8JzqHht/n0rVeu+K3qwHBUYPxXppV0RBYcC8 geS9daQ0X9e30meLIVunfHs2Iq3Ay/QfR7Z2q69fY1demcxXIY02Jt7c/XvdhCLrWRRj ZcaZmta97hJ8ESY0OdzuvN6GDEIBc49HYFP9XO57yx3vAFZNf4T0kSIgYUrszhNXmT5l hmD2slesILfefkwfTzrug6VsXu9nbxJ9KbMkmSuSUcKmo1k/UJtvUSUeTV2VWhWriv5Q Ofl4Leplo/uv7syXwYhPg26zJ3dPk71vQnoMO199uFyRWUi8P7Tgtc9fwYa/QKD/e5iN adOw== X-Gm-Message-State: 
AA+aEWZF6bOigVmKJtCQMcayacvT8poFyygOylIdQvnhKKby+VRMirUr 97blD9kF5Br/WTQiLXFuQ39e4z1zdHQ= X-Google-Smtp-Source: AFSGD/UdNL9M+7tBMv1YOKzwTcKOdwgKr9dJ/Bk8jdy8STrmHXnTOVgxqL9MMwEBbqM/Usr7jgdGTQ== X-Received: by 2002:a63:cf08:: with SMTP id j8mr3552907pgg.113.1544221262694; Fri, 07 Dec 2018 14:21:02 -0800 (PST) Received: from x1.localdomain (66.29.188.166.static.utbb.net. [66.29.188.166]) by smtp.gmail.com with ESMTPSA id e9sm5282511pff.5.2018.12.07.14.21.00 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 07 Dec 2018 14:21:01 -0800 (PST) From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 21/26] block: add BIO_HOLD_PAGES flag Date: Fri, 7 Dec 2018 15:20:11 -0700 Message-Id: <20181207222016.29387-22-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk> References: <20181207222016.29387-1-axboe@kernel.dk> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP For user mapped IO, we do get_user_pages() upfront, and then do a put_page() on each page at end_io time to release the page reference. In preparation for having permanently mapped pages, add a BIO_HOLD_PAGES flag that tells us not to release the pages, the caller will do that. Signed-off-by: Jens Axboe --- block/bio.c | 6 ++++-- include/linux/blk_types.h | 1 + 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/block/bio.c b/block/bio.c index 06760543ec81..3e45e5650265 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1636,7 +1636,8 @@ static void bio_dirty_fn(struct work_struct *work) next = bio->bi_private; bio_set_pages_dirty(bio); - bio_release_pages(bio); + if (!bio_flagged(bio, BIO_HOLD_PAGES)) + bio_release_pages(bio); bio_put(bio); } } @@ -1652,7 +1653,8 @@ void bio_check_pages_dirty(struct bio *bio) goto defer; } - bio_release_pages(bio); + if (!bio_flagged(bio, BIO_HOLD_PAGES)) + bio_release_pages(bio); bio_put(bio); return; defer: diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 921d734d6b5d..356a4c89b0d9 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -228,6 +228,7 @@ struct bio { #define BIO_TRACE_COMPLETION 10 /* bio_endio() should trace the final completion * of this bio. 
*/ #define BIO_QUEUE_ENTERED 11 /* can use blk_queue_enter_live() */ +#define BIO_HOLD_PAGES 12 /* don't put O_DIRECT pages */ /* See BVEC_POOL_OFFSET below before adding new flags */ From patchwork Fri Dec 7 22:20:12 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10718935 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4932F18B8 for ; Fri, 7 Dec 2018 22:21:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3B1212F719 for ; Fri, 7 Dec 2018 22:21:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2F2B72F72F; Fri, 7 Dec 2018 22:21:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.7 required=2.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D1CAC2F719 for ; Fri, 7 Dec 2018 22:21:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726200AbeLGWVG (ORCPT ); Fri, 7 Dec 2018 17:21:06 -0500 Received: from mail-pf1-f196.google.com ([209.85.210.196]:41486 "EHLO mail-pf1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726197AbeLGWVF (ORCPT ); Fri, 7 Dec 2018 17:21:05 -0500 Received: by mail-pf1-f196.google.com with SMTP id b7so2569070pfi.8 for ; Fri, 07 Dec 2018 14:21:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=l+G2ayniAzRV+upTz48Z9LC9V1kcFWjPEF7AeVywtTI=; b=j7ai4T47xsc2iEqx8lmS22reuF8fx7v0qZWySNmTPh4FsAhHDy1oP75ErMu7FC+IXg xMfArmp7hiGZMllbAkEk9hY7AHaomkeJX0X+XANWMg6dDWo8Ztv6J3pbtI8GS0xT9Ow6 XrEuaIhKJnJjqtRmACtn5aZuHdVS3XYXaYurr43uGXqs6ZfpS5Ri6VBB3y1XamPWgoJO PqC/xbR68UKCIXQ5RU+Js2OgJiQs5URB/64YCZ56VCef8VROnjwcF6rMRQnGF7Fe1vSf H1emvK8HZZaHQKg3WgGSLzapFSMd7W5WRcRxD2+2bex4wRcLP791P5rqHROJaCVNDtU6 N0cw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=l+G2ayniAzRV+upTz48Z9LC9V1kcFWjPEF7AeVywtTI=; b=pIjNLkK+ppzyHUiUuPk3GVulrmzcAmwpqTTIh6gj3N8c1bp5rUwLiQscirumXJ1uYe zJSqW0VP9oNSqmROTJvk9JkniZw26+PvDccol+IknS1MuuoFrnfEWz0qMELrNAzz8Y3W uUEcAAvvvhIBvm34mljSvGdlIq5c5/R7fQBUQggrpGqRVKgBUve9qGQV+ThL2EduAb1A 0KSSA2W9+OcjrsCc0ovzgKWySiOQLdR/kTTkewJSEk5XmW5J5hOERcg6B3E7Dsu7LsHt qwFJvIKrpV3TOMqGsJFveiG3Pzg9oI345A+6eLGlSPiQnEtqa8dQfDf/KCk9ZHd4Nse0 nY+w== X-Gm-Message-State: AA+aEWb5uK4kpN+MwLNSd9K7UjM8qozK4lQb75BjsQQQUOWVrlTNNuDW zkVUAme//drLbb7OiUJTtDukExeTSUw= X-Google-Smtp-Source: AFSGD/Wl6HGR/AtY5Hu5MJMTrwz9JfIZEW/HFZ4UcroV4d8aMJ2awWp5MRkgYO8aHs3ydznOTLoLAA== X-Received: by 2002:a62:61c3:: with SMTP id v186mr3998859pfb.55.1544221264625; Fri, 07 Dec 2018 14:21:04 -0800 (PST) Received: from x1.localdomain (66.29.188.166.static.utbb.net. 
[66.29.188.166]) by smtp.gmail.com with ESMTPSA id e9sm5282511pff.5.2018.12.07.14.21.02 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 07 Dec 2018 14:21:03 -0800 (PST) From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 22/26] block: implement bio helper to add iter bvec pages to bio Date: Fri, 7 Dec 2018 15:20:12 -0700 Message-Id: <20181207222016.29387-23-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk> References: <20181207222016.29387-1-axboe@kernel.dk> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP For an ITER_BVEC, we can just iterate the iov and add the pages to the bio directly. Signed-off-by: Jens Axboe --- block/bio.c | 27 +++++++++++++++++++++++++++ include/linux/bio.h | 1 + 2 files changed, 28 insertions(+) diff --git a/block/bio.c b/block/bio.c index 3e45e5650265..8158edeb750e 100644 --- a/block/bio.c +++ b/block/bio.c @@ -904,6 +904,33 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) } EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages); +/** + * bio_iov_bvec_add_pages - add pages from an ITER_BVEC to a bio + * @bio: bio to add pages to + * @iter: iov iterator describing the region to be added + * + * Iterate pages in the @iter and add them to the bio. We flag the + * @bio with BIO_HOLD_PAGES, telling IO completion not to free them. + */ +int bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter) +{ + unsigned short orig_vcnt = bio->bi_vcnt; + const struct bio_vec *bv; + + do { + size_t size; + + bv = iter->bvec + iter->iov_offset; + size = bio_add_page(bio, bv->bv_page, bv->bv_len, bv->bv_offset); + if (size != bv->bv_len) + break; + iov_iter_advance(iter, size); + } while (iov_iter_count(iter) && !bio_full(bio)); + + bio_set_flag(bio, BIO_HOLD_PAGES); + return bio->bi_vcnt > orig_vcnt ? 
0 : -EINVAL; +} + static void submit_bio_wait_endio(struct bio *bio) { complete(bio->bi_private); diff --git a/include/linux/bio.h b/include/linux/bio.h index 7380b094dcca..ca25ea890192 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -434,6 +434,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page, void __bio_add_page(struct bio *bio, struct page *page, unsigned int len, unsigned int off); int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter); +int bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter); struct rq_map_data; extern struct bio *bio_map_user_iov(struct request_queue *, struct iov_iter *, gfp_t); From patchwork Fri Dec 7 22:20:13 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10718921 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 055FF15A6 for ; Fri, 7 Dec 2018 22:21:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EBC402F71D for ; Fri, 7 Dec 2018 22:21:08 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DFE5A2F734; Fri, 7 Dec 2018 22:21:08 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8CDC92F72F for ; Fri, 7 Dec 2018 22:21:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726186AbeLGWVH (ORCPT ); Fri, 7 Dec 2018 17:21:07 -0500 Received: from mail-pg1-f195.google.com ([209.85.215.195]:36545 "EHLO mail-pg1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726079AbeLGWVH (ORCPT ); Fri, 7 Dec 2018 17:21:07 -0500 Received: by mail-pg1-f195.google.com with SMTP id n2so2304856pgm.3 for ; Fri, 07 Dec 2018 14:21:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=zZOhdIxDv845Z5UGbqrPoX9bgWw6pO6TACmobM92btU=; b=Gt+cLwFNRr/dysEZCzTM6TPJiwZNvTwtB4Xh018YTEXkxA+2F37y7wHOwmDUtZX5gP gwiRDSwHdBtePJzEw1g+k8yQvh3RyCCe2NeM84P8EsuAz3ZIyYNXN3sB8HGGJ8fI8cAT R0/8J6ZPIVOJG5O49OA7mhFHnhr98Csic4Z0oVg6vtQftvflCZfxBysP0yot44n918tW aZyUVKqI2gVygxqNK+7AHh+AZ5eQj3Vbpcd8p/ohcCbB81fml2oq0eEBN5CU9Wx5ldOa 3Shdcf9iTgvhd269kQH004joXIGhejhRWESViADsX06dpGvg6bij2O4QUdN96ZVRXbi7 7xMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=zZOhdIxDv845Z5UGbqrPoX9bgWw6pO6TACmobM92btU=; b=HTJqFVVPcCozojPkVF+nxwcEsjZidENu8G0mvQIa2tApUoa7xcNzZNczUTKpQS7dfY paP/TBwJooLuxpNPEGnosCmwkZRNOxwW2YKzVuD+BnYyX12jsKyAAQhFXL3DvTTm13NP Zk32rS8oX/QOIfyqgixi3DoX0BbTzhGiX7CV/jFAXETnRB3Kf9kkbMGF7V7IEda4DvX8 zbRSEHH3I3UDwl6iCHgUdoG89xJkqgH+Yx7tLMjSNKhWitDmqJ/qZ+QcELjQN2s3K0Uu a0VKqiCzmC2mQinIxBS8i5opr0/H1egfrqLIpxTJ7Nnk+cANvkvNScFbKEHGdGq9eOuO COSA== X-Gm-Message-State: AA+aEWYT2VPUXQcT4A8OsF7wSxB1AQWxqcRb1pzTZoCL6dyJPihQkW3X gIM7HUQsb4WVZhIl0cvLXoC+8mREgOk= X-Google-Smtp-Source: 
AFSGD/XQyaNZ+KKTfKQw11gP0/tK020TnPXRjvGY3CuEn3DK8mPYCRZTusMLG6nOHrj372wK6sU0yA== X-Received: by 2002:a62:1a44:: with SMTP id a65mr4074933pfa.30.1544221266362; Fri, 07 Dec 2018 14:21:06 -0800 (PST) Received: from x1.localdomain (66.29.188.166.static.utbb.net. [66.29.188.166]) by smtp.gmail.com with ESMTPSA id e9sm5282511pff.5.2018.12.07.14.21.04 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 07 Dec 2018 14:21:05 -0800 (PST) From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 23/26] fs: add support for mapping an ITER_BVEC for O_DIRECT Date: Fri, 7 Dec 2018 15:20:13 -0700 Message-Id: <20181207222016.29387-24-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk> References: <20181207222016.29387-1-axboe@kernel.dk> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This adds support for sync/async O_DIRECT to make a bvec type iter for bdev access, as well as iomap. Signed-off-by: Jens Axboe --- fs/block_dev.c | 16 ++++++++++++---- fs/iomap.c | 10 +++++++--- 2 files changed, 19 insertions(+), 7 deletions(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index b8f574615792..236c6abe649d 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -219,7 +219,10 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter, bio.bi_end_io = blkdev_bio_end_io_simple; bio.bi_ioprio = iocb->ki_ioprio; - ret = bio_iov_iter_get_pages(&bio, iter); + if (iov_iter_is_bvec(iter)) + ret = bio_iov_bvec_add_pages(&bio, iter); + else + ret = bio_iov_iter_get_pages(&bio, iter); if (unlikely(ret)) goto out; ret = bio.bi_iter.bi_size; @@ -326,8 +329,9 @@ static void blkdev_bio_end_io(struct bio *bio) struct bio_vec *bvec; int i; - bio_for_each_segment_all(bvec, bio, i) - put_page(bvec->bv_page); + if (!bio_flagged(bio, BIO_HOLD_PAGES)) + bio_for_each_segment_all(bvec, bio, i) + put_page(bvec->bv_page); bio_put(bio); } } @@ -381,7 +385,11 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages) bio->bi_end_io = blkdev_bio_end_io; bio->bi_ioprio = iocb->ki_ioprio; - ret = bio_iov_iter_get_pages(bio, iter); + if (iov_iter_is_bvec(iter)) + ret = bio_iov_bvec_add_pages(bio, iter); + else + ret = bio_iov_iter_get_pages(bio, iter); + if (unlikely(ret)) { bio->bi_status = BLK_STS_IOERR; bio_endio(bio); diff --git a/fs/iomap.c b/fs/iomap.c index bd483fcb7b5a..e36ad0be03f5 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -1573,8 +1573,9 @@ static void iomap_dio_bio_end_io(struct bio *bio) struct bio_vec *bvec; int i; - bio_for_each_segment_all(bvec, bio, i) - put_page(bvec->bv_page); + if (!bio_flagged(bio, BIO_HOLD_PAGES)) + bio_for_each_segment_all(bvec, bio, i) + put_page(bvec->bv_page); bio_put(bio); } } @@ -1673,7 +1674,10 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length, bio->bi_private = dio; bio->bi_end_io = iomap_dio_bio_end_io; - ret = bio_iov_iter_get_pages(bio, &iter); + if (iov_iter_is_bvec(&iter)) + ret = bio_iov_bvec_add_pages(bio, &iter); + else + ret = bio_iov_iter_get_pages(bio, &iter); if (unlikely(ret)) { /* * We have to stop part way through an IO. 
We must fall From patchwork Fri Dec 7 22:20:14 2018 X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10718925 Received: from x1.localdomain (66.29.188.166.static.utbb.net.
[66.29.188.166]) by smtp.gmail.com with ESMTPSA id e9sm5282511pff.5.2018.12.07.14.21.06 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 07 Dec 2018 14:21:07 -0800 (PST) From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 24/26] aio: add support for pre-mapped user IO buffers Date: Fri, 7 Dec 2018 15:20:14 -0700 Message-Id: <20181207222016.29387-25-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk> References: <20181207222016.29387-1-axboe@kernel.dk> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP If we have fixed user buffers, we can map them into the kernel when we setup the io_context. That avoids the need to do get_user_pages() for each and every IO. To utilize this feature, the application must set both IOCTX_FLAG_USERIOCB, to provide iocb's in userspace, and then IOCTX_FLAG_FIXEDBUFS. The latter tells aio that the iocbs that are mapped already contain valid destination and sizes. These buffers can then be mapped into the kernel for the life time of the io_context, as opposed to just the duration of the each single IO. Only works with non-vectored read/write commands for now, not with PREADV/PWRITEV. A limit of 4M is imposed as the largest buffer we currently support. There's nothing preventing us from going larger, but we need some cap, and 4M seemed like it would definitely be big enough. RLIMIT_MEMLOCK is used to cap the total amount of memory pinned. See the fio change for how to utilize this feature: http://git.kernel.dk/cgit/fio/commit/?id=2041bd343da1c1e955253f62374588718c64f0f3 Signed-off-by: Jens Axboe --- fs/aio.c | 193 ++++++++++++++++++++++++++++++++--- include/uapi/linux/aio_abi.h | 1 + 2 files changed, 177 insertions(+), 17 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 52a0b2291f71..fd323b3ba499 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -42,6 +42,7 @@ #include #include #include +#include #include #include @@ -97,6 +98,11 @@ struct aio_mapped_range { long nr_pages; }; +struct aio_mapped_ubuf { + struct bio_vec *bvec; + unsigned int nr_bvecs; +}; + struct kioctx { struct percpu_ref users; atomic_t dead; @@ -132,6 +138,8 @@ struct kioctx { struct page **ring_pages; long nr_pages; + struct aio_mapped_ubuf *user_bufs; + struct aio_mapped_range iocb_range; struct rcu_work free_rwork; /* see free_ioctx() */ @@ -301,6 +309,7 @@ static const bool aio_use_state_req_list = false; static void aio_useriocb_unmap(struct kioctx *); static void aio_iopoll_reap_events(struct kioctx *); +static void aio_iocb_buffer_unmap(struct kioctx *); static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages) { @@ -662,6 +671,7 @@ static void free_ioctx(struct work_struct *work) free_rwork); pr_debug("freeing %p\n", ctx); + aio_iocb_buffer_unmap(ctx); aio_useriocb_unmap(ctx); aio_free_ring(ctx); free_percpu(ctx->cpu); @@ -1675,6 +1685,122 @@ static int aio_useriocb_map(struct kioctx *ctx, struct iocb __user *iocbs) return aio_map_range(&ctx->iocb_range, iocbs, size, 0); } +static void aio_iocb_buffer_unmap(struct kioctx *ctx) +{ + int i, j; + + if (!ctx->user_bufs) + return; + + for (i = 0; i < ctx->max_reqs; i++) { + struct aio_mapped_ubuf *amu = &ctx->user_bufs[i]; + + for (j = 0; j < amu->nr_bvecs; j++) + put_page(amu->bvec[j].bv_page); + + kfree(amu->bvec); + amu->nr_bvecs = 0; + } + 
+ kfree(ctx->user_bufs); + ctx->user_bufs = NULL; +} + +static int aio_iocb_buffer_map(struct kioctx *ctx) +{ + unsigned long total_pages, page_limit; + struct page **pages = NULL; + int i, j, got_pages = 0; + struct iocb *iocb; + int ret = -EINVAL; + + ctx->user_bufs = kzalloc(ctx->max_reqs * sizeof(struct aio_mapped_ubuf), + GFP_KERNEL); + if (!ctx->user_bufs) + return -ENOMEM; + + /* Don't allow more pages than we can safely lock */ + total_pages = 0; + page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; + + for (i = 0; i < ctx->max_reqs; i++) { + struct aio_mapped_ubuf *amu = &ctx->user_bufs[i]; + unsigned long off, start, end, ubuf; + int pret, nr_pages; + size_t size; + + iocb = aio_iocb_from_index(ctx, i); + + /* + * Don't impose further limits on the size and buffer + * constraints here, we'll -EINVAL later when IO is + * submitted if they are wrong. + */ + ret = -EFAULT; + if (!iocb->aio_buf) + goto err; + + /* arbitrary limit, but we need something */ + if (iocb->aio_nbytes > SZ_4M) + goto err; + + ubuf = iocb->aio_buf; + end = (ubuf + iocb->aio_nbytes + PAGE_SIZE - 1) >> PAGE_SHIFT; + start = ubuf >> PAGE_SHIFT; + nr_pages = end - start; + + ret = -ENOMEM; + if (total_pages + nr_pages > page_limit) + goto err; + + if (!pages || nr_pages > got_pages) { + kfree(pages); + pages = kmalloc(nr_pages * sizeof(struct page *), + GFP_KERNEL); + if (!pages) + goto err; + got_pages = nr_pages; + } + + amu->bvec = kmalloc(nr_pages * sizeof(struct bio_vec), + GFP_KERNEL); + if (!amu->bvec) + goto err; + + down_write(¤t->mm->mmap_sem); + pret = get_user_pages((unsigned long) iocb->aio_buf, nr_pages, + 1, pages, NULL); + up_write(¤t->mm->mmap_sem); + + if (pret < nr_pages) { + if (pret < 0) + ret = pret; + goto err; + } + + off = ubuf & ~PAGE_MASK; + size = iocb->aio_nbytes; + for (j = 0; j < nr_pages; j++) { + size_t vec_len; + + vec_len = min_t(size_t, size, PAGE_SIZE - off); + amu->bvec[j].bv_page = pages[j]; + amu->bvec[j].bv_len = vec_len; + amu->bvec[j].bv_offset = off; + off = 0; + size -= vec_len; + } + amu->nr_bvecs = nr_pages; + total_pages += nr_pages; + } + kfree(pages); + return 0; +err: + kfree(pages); + aio_iocb_buffer_unmap(ctx); + return ret; +} + SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, struct iocb __user *, iocbs, void __user *, user1, void __user *, user2, aio_context_t __user *, ctxp) @@ -1685,7 +1811,8 @@ SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, struct iocb __user *, if (user1 || user2) return -EINVAL; - if (flags & ~(IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL)) + if (flags & ~(IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL | + IOCTX_FLAG_FIXEDBUFS)) return -EINVAL; ret = get_user(ctx, ctxp); @@ -1701,6 +1828,15 @@ SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, struct iocb __user *, ret = aio_useriocb_map(ioctx, iocbs); if (ret) goto err; + if (flags & IOCTX_FLAG_FIXEDBUFS) { + ret = aio_iocb_buffer_map(ioctx); + if (ret) + goto err; + } + } else if (flags & IOCTX_FLAG_FIXEDBUFS) { + /* can only support fixed bufs with user mapped iocbs */ + ret = -EINVAL; + goto err; } ret = put_user(ioctx->user_id, ctxp); @@ -1978,23 +2114,39 @@ static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb, return ret; } -static int aio_setup_rw(int rw, const struct iocb *iocb, struct iovec **iovec, - bool vectored, bool compat, struct iov_iter *iter) +static int aio_setup_rw(int rw, struct aio_kiocb *kiocb, + const struct iocb *iocb, struct iovec **iovec, bool vectored, + bool compat, bool kaddr, struct iov_iter *iter) { - void __user *buf = (void 
__user *)(uintptr_t)iocb->aio_buf; + void __user *ubuf = (void __user *)(uintptr_t)iocb->aio_buf; size_t len = iocb->aio_nbytes; if (!vectored) { - ssize_t ret = import_single_range(rw, buf, len, *iovec, iter); + ssize_t ret; + + if (!kaddr) { + ret = import_single_range(rw, ubuf, len, *iovec, iter); + } else { + long index = (long) kiocb->ki_user_iocb; + struct aio_mapped_ubuf *amu; + + /* __io_submit_one() already validated the index */ + amu = &kiocb->ki_ctx->user_bufs[index]; + iov_iter_bvec(iter, rw, amu->bvec, amu->nr_bvecs, len); + ret = 0; + + } *iovec = NULL; return ret; } + if (kaddr) + return -EINVAL; #ifdef CONFIG_COMPAT if (compat) - return compat_import_iovec(rw, buf, len, UIO_FASTIOV, iovec, + return compat_import_iovec(rw, ubuf, len, UIO_FASTIOV, iovec, iter); #endif - return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter); + return import_iovec(rw, ubuf, len, UIO_FASTIOV, iovec, iter); } static inline void aio_rw_done(struct kiocb *req, ssize_t ret) @@ -2067,7 +2219,7 @@ static void aio_iopoll_iocb_issued(struct aio_submit_state *state, static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, struct aio_submit_state *state, bool vectored, - bool compat) + bool compat, bool kaddr) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct kiocb *req = &kiocb->rw; @@ -2087,9 +2239,11 @@ static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, if (unlikely(!file->f_op->read_iter)) goto out_fput; - ret = aio_setup_rw(READ, iocb, &iovec, vectored, compat, &iter); + ret = aio_setup_rw(READ, kiocb, iocb, &iovec, vectored, compat, kaddr, + &iter); if (ret) goto out_fput; + ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter)); if (!ret) aio_rw_done(req, call_read_iter(file, req, &iter)); @@ -2102,7 +2256,7 @@ static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb, struct aio_submit_state *state, bool vectored, - bool compat) + bool compat, bool kaddr) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct kiocb *req = &kiocb->rw; @@ -2122,7 +2276,8 @@ static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb, if (unlikely(!file->f_op->write_iter)) goto out_fput; - ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, compat, &iter); + ret = aio_setup_rw(WRITE, kiocb, iocb, &iovec, vectored, compat, kaddr, + &iter); if (ret) goto out_fput; ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter)); @@ -2361,7 +2516,8 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, struct iocb __user *user_iocb, - struct aio_submit_state *state, bool compat) + struct aio_submit_state *state, bool compat, + bool kaddr) { struct aio_kiocb *req; ssize_t ret; @@ -2421,16 +2577,16 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, ret = -EINVAL; switch (iocb->aio_lio_opcode) { case IOCB_CMD_PREAD: - ret = aio_read(req, iocb, state, false, compat); + ret = aio_read(req, iocb, state, false, compat, kaddr); break; case IOCB_CMD_PWRITE: - ret = aio_write(req, iocb, state, false, compat); + ret = aio_write(req, iocb, state, false, compat, kaddr); break; case IOCB_CMD_PREADV: - ret = aio_read(req, iocb, state, true, compat); + ret = aio_read(req, iocb, state, true, compat, kaddr); break; case IOCB_CMD_PWRITEV: - ret = aio_write(req, iocb, state, true, compat); + ret = aio_write(req, iocb, 
state, true, compat, kaddr); break; case IOCB_CMD_FSYNC: if (ctx->flags & IOCTX_FLAG_IOPOLL) @@ -2482,6 +2638,7 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, struct aio_submit_state *state, bool compat) { struct iocb iocb, *iocbp; + bool kaddr; if (ctx->flags & IOCTX_FLAG_USERIOCB) { unsigned long iocb_index = (unsigned long) user_iocb; @@ -2489,14 +2646,16 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, if (iocb_index >= ctx->max_reqs) return -EINVAL; + kaddr = (ctx->flags & IOCTX_FLAG_FIXEDBUFS) != 0; iocbp = aio_iocb_from_index(ctx, iocb_index); } else { if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb)))) return -EFAULT; + kaddr = false; iocbp = &iocb; } - return __io_submit_one(ctx, iocbp, user_iocb, state, compat); + return __io_submit_one(ctx, iocbp, user_iocb, state, compat, kaddr); } #ifdef CONFIG_BLOCK diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h index ea0b9a19f4df..05d72cf86bd3 100644 --- a/include/uapi/linux/aio_abi.h +++ b/include/uapi/linux/aio_abi.h @@ -110,6 +110,7 @@ struct iocb { #define IOCTX_FLAG_USERIOCB (1 << 0) /* iocbs are user mapped */ #define IOCTX_FLAG_IOPOLL (1 << 1) /* io_context is polled */ +#define IOCTX_FLAG_FIXEDBUFS (1 << 2) /* IO buffers are fixed */ #undef IFBIG #undef IFLITTLE
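As a concrete illustration of the IOCTX_FLAG_USERIOCB + IOCTX_FLAG_FIXEDBUFS setup described in this patch, a minimal userspace sketch follows. It assumes the x86-64 syscall number that this series assigns to io_setup2() (335) and defines the new flag values locally, since they are not in released headers; the queue depth, buffer size and (omitted) error handling are illustrative only and not part of the patch.

#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

#define __NR_io_setup2		335	/* x86-64 number from this series */
#define IOCTX_FLAG_USERIOCB	(1 << 0)	/* values from the new aio_abi.h */
#define IOCTX_FLAG_FIXEDBUFS	(1 << 2)

static struct iocb *iocbs;
static aio_context_t ctx;

/* Map 'qd' iocbs and their fixed read buffers into the kernel at setup time */
static int setup_fixed_bufs(unsigned int qd, int fd, size_t bufsz)
{
	unsigned int i;

	if (posix_memalign((void **) &iocbs, 4096, qd * sizeof(*iocbs)))
		return -1;
	memset(iocbs, 0, qd * sizeof(*iocbs));

	for (i = 0; i < qd; i++) {
		void *buf;

		/* FIXEDBUFS pins aio_buf/aio_nbytes for the life of the context */
		if (posix_memalign(&buf, 4096, bufsz))
			return -1;
		iocbs[i].aio_lio_opcode = IOCB_CMD_PREAD;
		iocbs[i].aio_fildes = fd;
		iocbs[i].aio_buf = (unsigned long) buf;
		iocbs[i].aio_nbytes = bufsz;
	}

	return syscall(__NR_io_setup2, qd,
			IOCTX_FLAG_USERIOCB | IOCTX_FLAG_FIXEDBUFS,
			iocbs, NULL, NULL, &ctx);
}

With this in place, submission passes the iocb index (cast to a pointer) in the io_submit() pointer array rather than a user iocb address, and only the non-vectored IOCB_CMD_PREAD/IOCB_CMD_PWRITE opcodes use the pre-mapped buffers, per the restriction noted in the commit message.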
From patchwork Fri Dec 7 22:20:15 2018 X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10718929 From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 25/26] aio: split old ring complete out from aio_complete() Date: Fri, 7 Dec 2018 15:20:15 -0700 Message-Id: <20181207222016.29387-26-axboe@kernel.dk> In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk> References: <20181207222016.29387-1-axboe@kernel.dk> Signed-off-by: Jens Axboe --- fs/aio.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index fd323b3ba499..de48faeab0fd 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1218,12 +1218,9 @@ static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb, ev->res2 = res2; } -/* aio_complete - * Called when the io request on the given iocb is complete. - */ -static void aio_complete(struct aio_kiocb *iocb, long res, long res2) +static void aio_ring_complete(struct kioctx *ctx, struct aio_kiocb *iocb, + long res, long res2) { - struct kioctx *ctx = iocb->ki_ctx; struct aio_ring *ring; struct io_event *ev_page, *event; unsigned tail, pos, head; @@ -1273,6 +1270,16 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2) spin_unlock_irqrestore(&ctx->completion_lock, flags); pr_debug("added to ring %p at [%u]\n", iocb, tail); +} + +/* aio_complete + * Called when the io request on the given iocb is complete.
+ */ +static void aio_complete(struct aio_kiocb *iocb, long res, long res2) +{ + struct kioctx *ctx = iocb->ki_ctx; + + aio_ring_complete(ctx, iocb, res, res2); /* * Check if the user asked us to deliver the result through an From patchwork Fri Dec 7 22:20:16 2018 X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10718933 Received: from x1.localdomain (66.29.188.166.static.utbb.net.
[66.29.188.166]) by smtp.gmail.com with ESMTPSA id e9sm5282511pff.5.2018.12.07.14.21.10 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 07 Dec 2018 14:21:11 -0800 (PST) From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 26/26] aio: add support for submission/completion rings Date: Fri, 7 Dec 2018 15:20:16 -0700 Message-Id: <20181207222016.29387-27-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181207222016.29387-1-axboe@kernel.dk> References: <20181207222016.29387-1-axboe@kernel.dk> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Experimental support for submitting and completing IO through rings shared between the application and kernel. The submission rings are struct iocb, like we would submit through io_submit(), and the completion rings are struct io_event, like we would pass in (and copy back) from io_getevents(). A new system call is added for this, io_ring_enter(). This system call submits IO that is queued in the SQ ring, and/or completes IO and stores the results in the CQ ring. This could be augmented with a kernel thread that does the submission and polling, then the application would never have to enter the kernel to do IO. Sample application: http://brick.kernel.dk/snaps/aio-ring.c Signed-off-by: Jens Axboe --- arch/x86/entry/syscalls/syscall_64.tbl | 1 + fs/aio.c | 435 +++++++++++++++++++++++-- include/linux/syscalls.h | 4 +- include/uapi/linux/aio_abi.h | 26 ++ kernel/sys_ni.c | 1 + 5 files changed, 433 insertions(+), 34 deletions(-) diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 67c357225fb0..55a26700a637 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -344,6 +344,7 @@ 333 common io_pgetevents __x64_sys_io_pgetevents 334 common rseq __x64_sys_rseq 335 common io_setup2 __x64_sys_io_setup2 +336 common io_ring_enter __x64_sys_io_ring_enter # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/fs/aio.c b/fs/aio.c index de48faeab0fd..b00c9fb9fa35 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -142,6 +142,11 @@ struct kioctx { struct aio_mapped_range iocb_range; + /* if used, completion and submission rings */ + struct aio_mapped_range sq_ring; + struct aio_mapped_range cq_ring; + int cq_ring_overflow; + struct rcu_work free_rwork; /* see free_ioctx() */ /* @@ -297,6 +302,8 @@ static const struct address_space_operations aio_ctx_aops; static const unsigned int iocb_page_shift = ilog2(PAGE_SIZE / sizeof(struct iocb)); +static const unsigned int event_page_shift = + ilog2(PAGE_SIZE / sizeof(struct io_event)); /* * We rely on block level unplugs to flush pending requests, if we schedule @@ -307,6 +314,7 @@ static const bool aio_use_state_req_list = true; static const bool aio_use_state_req_list = false; #endif +static void aio_scqring_unmap(struct kioctx *); static void aio_useriocb_unmap(struct kioctx *); static void aio_iopoll_reap_events(struct kioctx *); static void aio_iocb_buffer_unmap(struct kioctx *); @@ -539,6 +547,12 @@ static const struct address_space_operations aio_ctx_aops = { #endif }; +/* Polled IO or SQ/CQ rings don't use the old ring */ +static bool aio_ctx_old_ring(struct kioctx *ctx) +{ + return !(ctx->flags & (IOCTX_FLAG_IOPOLL | IOCTX_FLAG_SCQRING)); +} + static int 
aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) { struct aio_ring *ring; @@ -553,7 +567,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) * IO polling doesn't require any io event entries */ size = sizeof(struct aio_ring); - if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) { + if (aio_ctx_old_ring(ctx)) { nr_events += 2; /* 1 is required, 2 for good luck */ size += sizeof(struct io_event) * nr_events; } @@ -640,6 +654,17 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) return 0; } +/* + * Don't support cancel on anything that isn't regular aio + */ +static bool aio_ctx_supports_cancel(struct kioctx *ctx) +{ + int noflags = IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL | + IOCTX_FLAG_SCQRING; + + return (ctx->flags & noflags) == 0; +} + #define AIO_EVENTS_PER_PAGE (PAGE_SIZE / sizeof(struct io_event)) #define AIO_EVENTS_FIRST_PAGE ((PAGE_SIZE - sizeof(struct aio_ring)) / sizeof(struct io_event)) #define AIO_EVENTS_OFFSET (AIO_EVENTS_PER_PAGE - AIO_EVENTS_FIRST_PAGE) @@ -650,6 +675,8 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel) struct kioctx *ctx = req->ki_ctx; unsigned long flags; + if (WARN_ON_ONCE(!aio_ctx_supports_cancel(ctx))) + return; if (WARN_ON_ONCE(!list_empty(&req->ki_list))) return; @@ -673,6 +700,7 @@ static void free_ioctx(struct work_struct *work) aio_iocb_buffer_unmap(ctx); aio_useriocb_unmap(ctx); + aio_scqring_unmap(ctx); aio_free_ring(ctx); free_percpu(ctx->cpu); percpu_ref_exit(&ctx->reqs); @@ -1218,6 +1246,47 @@ static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb, ev->res2 = res2; } +static struct io_event *__aio_get_cqring_ev(struct aio_io_event_ring *ring, + struct aio_mapped_range *range, + unsigned *next_tail) +{ + struct io_event *ev; + unsigned tail; + + smp_rmb(); + tail = READ_ONCE(ring->tail); + *next_tail = tail + 1; + if (*next_tail == ring->nr_events) + *next_tail = 0; + if (*next_tail == READ_ONCE(ring->head)) + return NULL; + + /* io_event array starts offset one into the mapped range */ + tail++; + ev = page_address(range->pages[tail >> event_page_shift]); + tail &= ((1 << event_page_shift) - 1); + return ev + tail; +} + +static void aio_commit_cqring(struct kioctx *ctx, unsigned next_tail) +{ + struct aio_io_event_ring *ring; + + ring = page_address(ctx->cq_ring.pages[0]); + if (next_tail != ring->tail) { + ring->tail = next_tail; + smp_wmb(); + } +} + +static struct io_event *aio_peek_cqring(struct kioctx *ctx, unsigned *ntail) +{ + struct aio_io_event_ring *ring; + + ring = page_address(ctx->cq_ring.pages[0]); + return __aio_get_cqring_ev(ring, &ctx->cq_ring, ntail); +} + static void aio_ring_complete(struct kioctx *ctx, struct aio_kiocb *iocb, long res, long res2) { @@ -1279,7 +1348,36 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2) { struct kioctx *ctx = iocb->ki_ctx; - aio_ring_complete(ctx, iocb, res, res2); + if (ctx->flags & IOCTX_FLAG_SCQRING) { + unsigned long flags; + struct io_event *ev; + unsigned int tail; + + /* + * If we can't get a cq entry, userspace overflowed the + * submission (by quite a lot). Flag it as an overflow + * condition, and next io_ring_enter(2) call will return + * -EOVERFLOW. 
+ */ + spin_lock_irqsave(&ctx->completion_lock, flags); + ev = aio_peek_cqring(ctx, &tail); + if (ev) { + aio_fill_event(ev, iocb, res, res2); + aio_commit_cqring(ctx, tail); + } else + ctx->cq_ring_overflow = 1; + spin_unlock_irqrestore(&ctx->completion_lock, flags); + } else { + aio_ring_complete(ctx, iocb, res, res2); + + /* + * We have to order our ring_info tail store above and test + * of the wait list below outside the wait lock. This is + * like in wake_up_bit() where clearing a bit has to be + * ordered with the unlocked test. + */ + smp_mb(); + } /* * Check if the user asked us to deliver the result through an @@ -1291,14 +1389,6 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2) eventfd_ctx_put(iocb->ki_eventfd); } - /* - * We have to order our ring_info tail store above and test - * of the wait list below outside the wait lock. This is - * like in wake_up_bit() where clearing a bit has to be - * ordered with the unlocked test. - */ - smp_mb(); - if (waitqueue_active(&ctx->wait)) wake_up(&ctx->wait); iocb_put(iocb); @@ -1421,6 +1511,9 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, return 0; list_for_each_entry_safe(iocb, n, &ctx->poll_completing, ki_list) { + struct io_event *ev = NULL; + unsigned int next_tail; + if (*nr_events == max) break; if (!test_bit(IOCB_POLL_COMPLETED, &iocb->ki_flags)) @@ -1428,6 +1521,14 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, if (to_free == AIO_IOPOLL_BATCH) iocb_put_many(ctx, iocbs, &to_free); + /* Will only happen if the application over-commits */ + ret = -EAGAIN; + if (ctx->flags & IOCTX_FLAG_SCQRING) { + ev = aio_peek_cqring(ctx, &next_tail); + if (!ev) + break; + } + list_del(&iocb->ki_list); iocbs[to_free++] = iocb; @@ -1446,8 +1547,11 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, file_count = 1; } - if (evs && copy_to_user(evs + *nr_events, &iocb->ki_ev, - sizeof(iocb->ki_ev))) { + if (ev) { + memcpy(ev, &iocb->ki_ev, sizeof(*ev)); + aio_commit_cqring(ctx, next_tail); + } else if (evs && copy_to_user(evs + *nr_events, &iocb->ki_ev, + sizeof(iocb->ki_ev))) { ret = -EFAULT; break; } @@ -1628,15 +1732,42 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr, return ret; } -static struct iocb *aio_iocb_from_index(struct kioctx *ctx, int index) +static struct iocb *__aio_sqring_from_index(struct aio_iocb_ring *ring, + struct aio_mapped_range *range, + int index) { struct iocb *iocb; - iocb = page_address(ctx->iocb_range.pages[index >> iocb_page_shift]); + /* iocb array starts offset one into the mapped range */ + index++; + iocb = page_address(range->pages[index >> iocb_page_shift]); index &= ((1 << iocb_page_shift) - 1); return iocb + index; } +static struct iocb *aio_sqring_from_index(struct kioctx *ctx, int index) +{ + struct aio_iocb_ring *ring; + + ring = page_address(ctx->sq_ring.pages[0]); + return __aio_sqring_from_index(ring, &ctx->sq_ring, index); +} + +static struct iocb *aio_iocb_from_index(struct kioctx *ctx, int index) +{ + struct iocb *iocb; + + if (ctx->flags & IOCTX_FLAG_SCQRING) { + iocb = aio_sqring_from_index(ctx, index); + } else { + iocb = page_address(ctx->iocb_range.pages[index >> iocb_page_shift]); + index &= ((1 << iocb_page_shift) - 1); + iocb += index; + } + + return iocb; +} + static void aio_unmap_range(struct aio_mapped_range *range) { int i; @@ -1692,6 +1823,52 @@ static int aio_useriocb_map(struct kioctx *ctx, struct iocb __user *iocbs) return aio_map_range(&ctx->iocb_range, 
iocbs, size, 0); } +static void aio_scqring_unmap(struct kioctx *ctx) +{ + aio_unmap_range(&ctx->sq_ring); + aio_unmap_range(&ctx->cq_ring); +} + +static int aio_scqring_map(struct kioctx *ctx, + struct aio_iocb_ring __user *sq_ring, + struct aio_io_event_ring __user *cq_ring) +{ + struct aio_iocb_ring *ksq_ring; + struct aio_io_event_ring *kcq_ring; + int ret, sq_ring_size, cq_ring_size; + size_t size; + + /* + * The CQ ring size is QD + 1, so we don't have to track full condition + * for head == tail. The SQ ring we make twice that in size, to make + * room for having more inflight than the QD. + */ + sq_ring_size = ctx->max_reqs; + cq_ring_size = 2 * ctx->max_reqs; + + size = sq_ring_size * sizeof(struct iocb); + ret = aio_map_range(&ctx->sq_ring, sq_ring, + sq_ring_size * sizeof(struct iocb), 0); + if (ret) + return ret; + + ret = aio_map_range(&ctx->cq_ring, cq_ring, + cq_ring_size * sizeof(struct io_event), FOLL_WRITE); + if (ret) { + aio_unmap_range(&ctx->sq_ring); + return ret; + } + + ksq_ring = page_address(ctx->sq_ring.pages[0]); + ksq_ring->nr_events = sq_ring_size; + ksq_ring->head = ksq_ring->tail = 0; + + kcq_ring = page_address(ctx->cq_ring.pages[0]); + kcq_ring->nr_events = cq_ring_size; + kcq_ring->head = kcq_ring->tail = 0; + return 0; +} + static void aio_iocb_buffer_unmap(struct kioctx *ctx) { int i, j; @@ -1808,18 +1985,18 @@ static int aio_iocb_buffer_map(struct kioctx *ctx) return ret; } -SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, struct iocb __user *, - iocbs, void __user *, user1, void __user *, user2, +SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, + struct iocb __user *, iocbs, + struct aio_iocb_ring __user *, sq_ring, + struct aio_io_event_ring __user *, cq_ring, aio_context_t __user *, ctxp) { struct kioctx *ioctx; unsigned long ctx; long ret; - if (user1 || user2) - return -EINVAL; if (flags & ~(IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL | - IOCTX_FLAG_FIXEDBUFS)) + IOCTX_FLAG_FIXEDBUFS | IOCTX_FLAG_SCQRING)) return -EINVAL; ret = get_user(ctx, ctxp); @@ -1832,18 +2009,26 @@ SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, struct iocb __user *, goto out; if (flags & IOCTX_FLAG_USERIOCB) { + ret = -EINVAL; + if (flags & IOCTX_FLAG_SCQRING) + goto err; + ret = aio_useriocb_map(ioctx, iocbs); if (ret) goto err; - if (flags & IOCTX_FLAG_FIXEDBUFS) { - ret = aio_iocb_buffer_map(ioctx); - if (ret) - goto err; - } - } else if (flags & IOCTX_FLAG_FIXEDBUFS) { - /* can only support fixed bufs with user mapped iocbs */ + } + if (flags & IOCTX_FLAG_SCQRING) { + ret = aio_scqring_map(ioctx, sq_ring, cq_ring); + if (ret) + goto err; + } + if (flags & IOCTX_FLAG_FIXEDBUFS) { ret = -EINVAL; - goto err; + if (!(flags & (IOCTX_FLAG_USERIOCB | IOCTX_FLAG_SCQRING))) + goto err; + ret = aio_iocb_buffer_map(ioctx); + if (ret) + goto err; } ret = put_user(ioctx->user_id, ctxp); @@ -2545,8 +2730,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, return -EINVAL; } - /* Poll IO doesn't need ring reservations */ - if (!(ctx->flags & IOCTX_FLAG_IOPOLL) && !get_reqs_available(ctx)) + if (aio_ctx_old_ring(ctx) && !get_reqs_available(ctx)) return -EAGAIN; ret = -EAGAIN; @@ -2570,7 +2754,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, } /* Don't support cancel on user mapped iocbs or polled context */ - if (!(ctx->flags & (IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL))) { + if (aio_ctx_supports_cancel(ctx)) { ret = put_user(KIOCB_KEY, &user_iocb->aio_key); if (unlikely(ret)) { pr_debug("EFAULT: aio_key\n"); @@ -2636,7 
+2820,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, eventfd_ctx_put(req->ki_eventfd); iocb_put(req); out_put_reqs_available: - if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) + if (aio_ctx_old_ring(ctx)) put_reqs_available(ctx, 1); return ret; } @@ -2709,6 +2893,184 @@ static void aio_submit_state_start(struct aio_submit_state *state, #endif } +static struct iocb *__aio_get_sqring(struct aio_iocb_ring *ring, + struct aio_mapped_range *range, + unsigned *next_head) +{ + unsigned head; + + smp_rmb(); + head = READ_ONCE(ring->head); + if (head == READ_ONCE(ring->tail)) + return NULL; + + *next_head = head + 1; + if (*next_head == ring->nr_events) + *next_head = 0; + + return __aio_sqring_from_index(ring, range, head); +} + +static void aio_commit_sqring(struct kioctx *ctx, unsigned next_head) +{ + struct aio_iocb_ring *ring; + + ring = page_address(ctx->sq_ring.pages[0]); + if (ring->head != next_head) { + ring->head = next_head; + smp_wmb(); + } +} + +static const struct iocb *aio_peek_sqring(struct kioctx *ctx, unsigned *nhead) +{ + struct aio_iocb_ring *ring; + + ring = page_address(ctx->sq_ring.pages[0]); + return __aio_get_sqring(ring, &ctx->sq_ring, nhead); +} + +static int aio_ring_submit(struct kioctx *ctx, unsigned int to_submit) +{ + bool kaddr = (ctx->flags & IOCTX_FLAG_FIXEDBUFS) != 0; + struct aio_submit_state state, *statep = NULL; + int i, ret = 0, submit = 0; + + if (to_submit > AIO_PLUG_THRESHOLD) { + aio_submit_state_start(&state, ctx, to_submit); + statep = &state; + } + + for (i = 0; i < to_submit; i++) { + const struct iocb *iocb; + unsigned int next_head; + + iocb = aio_peek_sqring(ctx, &next_head); + if (!iocb) + break; + + ret = __io_submit_one(ctx, iocb, NULL, NULL, false, kaddr); + if (ret) + break; + + submit++; + aio_commit_sqring(ctx, next_head); + } + + if (statep) + aio_submit_state_end(statep); + + return submit ? submit : ret; +} + +/* + * Wait until events become available, if we don't already have some. The + * application must reap them itself, as they reside on the shared cq ring. 
+ */ +static int aio_cqring_wait(struct kioctx *ctx, int min_events) +{ + struct aio_io_event_ring *ring; + DEFINE_WAIT(wait); + int ret = 0; + + ring = page_address(ctx->cq_ring.pages[0]); + smp_rmb(); + if (ring->head != ring->tail) + return 0; + + do { + prepare_to_wait(&ctx->wait, &wait, TASK_INTERRUPTIBLE); + + ret = 0; + smp_rmb(); + if (ring->head != ring->tail) + break; + if (!min_events) + break; + + schedule(); + + ret = -EINVAL; + if (atomic_read(&ctx->dead)) + break; + ret = -EINTR; + if (signal_pending(current)) + break; + } while (1); + + finish_wait(&ctx->wait, &wait); + return ret; +} + +static int __io_ring_enter(struct kioctx *ctx, unsigned int to_submit, + unsigned int min_complete, unsigned int flags) +{ + int ret = 0; + + if (flags & IORING_FLAG_SUBMIT) { + ret = aio_ring_submit(ctx, to_submit); + if (ret < 0) + return ret; + } + if (flags & IORING_FLAG_GETEVENTS) { + unsigned int nr_events = 0; + int get_ret; + + if (!ret && to_submit) + min_complete = 0; + + if (ctx->flags & IOCTX_FLAG_IOPOLL) + get_ret = __aio_iopoll_check(ctx, NULL, &nr_events, + min_complete, -1U); + else + get_ret = aio_cqring_wait(ctx, min_complete); + + if (get_ret < 0 && !ret) + ret = get_ret; + } + + return ret; +} + +SYSCALL_DEFINE4(io_ring_enter, aio_context_t, ctx_id, u32, to_submit, + u32, min_complete, u32, flags) +{ + struct kioctx *ctx; + long ret; + + BUILD_BUG_ON(sizeof(struct aio_iocb_ring) != sizeof(struct iocb)); + BUILD_BUG_ON(sizeof(struct aio_io_event_ring) != + sizeof(struct io_event)); + + ctx = lookup_ioctx(ctx_id); + if (!ctx) { + pr_debug("EINVAL: invalid context id\n"); + return -EINVAL; + } + + ret = -EBUSY; + if (!mutex_trylock(&ctx->getevents_lock)) + goto err; + + ret = -EOVERFLOW; + if (ctx->cq_ring_overflow) { + ctx->cq_ring_overflow = 0; + goto err; + } + + ret = -EINVAL; + if (unlikely(atomic_read(&ctx->dead))) + goto err; + + if (ctx->flags & IOCTX_FLAG_SCQRING) + ret = __io_ring_enter(ctx, to_submit, min_complete, flags); + + mutex_unlock(&ctx->getevents_lock); +err: + percpu_ref_put(&ctx->users); + return ret; +} + /* sys_io_submit: * Queue the nr iocbs pointed to by iocbpp for processing. Returns * the number of iocbs queued. 
May return -EINVAL if the aio_context @@ -2738,6 +3100,10 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, return -EINVAL; } + /* SCQRING must use io_ring_enter() */ + if (ctx->flags & IOCTX_FLAG_SCQRING) + return -EINVAL; + if (nr > ctx->nr_events) nr = ctx->nr_events; @@ -2854,7 +3220,7 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb, if (unlikely(!ctx)) return -EINVAL; - if (ctx->flags & (IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL)) + if (!aio_ctx_supports_cancel(ctx)) goto err; spin_lock_irq(&ctx->ctx_lock); @@ -2889,7 +3255,10 @@ static long do_io_getevents(aio_context_t ctx_id, long ret = -EINVAL; if (likely(ioctx)) { - if (likely(min_nr <= nr && min_nr >= 0)) { + /* SCQRING must use io_ring_enter() */ + if (ioctx->flags & IOCTX_FLAG_SCQRING) + ret = -EINVAL; + else if (min_nr <= nr && min_nr >= 0) { if (ioctx->flags & IOCTX_FLAG_IOPOLL) ret = aio_iopoll_check(ioctx, min_nr, nr, events); else diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index a20a663d583f..576725d00020 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -288,8 +288,10 @@ static inline void addr_limit_user_check(void) #ifndef CONFIG_ARCH_HAS_SYSCALL_WRAPPER asmlinkage long sys_io_setup(unsigned nr_reqs, aio_context_t __user *ctx); asmlinkage long sys_io_setup2(unsigned, unsigned, struct iocb __user *, - void __user *, void __user *, + struct aio_iocb_ring __user *, + struct aio_io_event_ring __user *, aio_context_t __user *); +asmlinkage long sys_io_ring_enter(aio_context_t, unsigned, unsigned, unsigned); asmlinkage long sys_io_destroy(aio_context_t ctx); asmlinkage long sys_io_submit(aio_context_t, long, struct iocb __user * __user *); diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h index 05d72cf86bd3..9fb7d0ec868f 100644 --- a/include/uapi/linux/aio_abi.h +++ b/include/uapi/linux/aio_abi.h @@ -111,6 +111,32 @@ struct iocb { #define IOCTX_FLAG_USERIOCB (1 << 0) /* iocbs are user mapped */ #define IOCTX_FLAG_IOPOLL (1 << 1) /* io_context is polled */ #define IOCTX_FLAG_FIXEDBUFS (1 << 2) /* IO buffers are fixed */ +#define IOCTX_FLAG_SCQRING (1 << 3) /* Use SQ/CQ rings */ + +struct aio_iocb_ring { + union { + struct { + u32 head, tail; + u32 nr_events; + }; + struct iocb pad_iocb; + }; + struct iocb iocbs[0]; +}; + +struct aio_io_event_ring { + union { + struct { + u32 head, tail; + u32 nr_events; + }; + struct io_event pad_event; + }; + struct io_event events[0]; +}; + +#define IORING_FLAG_SUBMIT (1 << 0) +#define IORING_FLAG_GETEVENTS (1 << 1) #undef IFBIG #undef IFLITTLE diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 17c8b4393669..a32b7ea93838 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -38,6 +38,7 @@ asmlinkage long sys_ni_syscall(void) COND_SYSCALL(io_setup); COND_SYSCALL(io_setup2); +COND_SYSCALL(io_ring_enter); COND_SYSCALL_COMPAT(io_setup); COND_SYSCALL(io_destroy); COND_SYSCALL(io_submit);
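To make the SQ/CQ ring flow concrete, the sketch below shows one possible userspace loop against this interface: fill an iocb at the SQ tail, publish the tail, call io_ring_enter() to submit and wait, then reap the io_event at the CQ head. The syscall numbers (335/336 on x86-64 in this series), the ring allocation sizes and the barrier helpers are assumptions made for illustration; the sample application linked in the commit message (aio-ring.c) is the authoritative example.

#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

#define __NR_io_setup2		335	/* x86-64 numbers from this series */
#define __NR_io_ring_enter	336
#define IOCTX_FLAG_SCQRING	(1 << 3)
#define IORING_FLAG_SUBMIT	(1 << 0)
#define IORING_FLAG_GETEVENTS	(1 << 1)

/* Ring layouts copied from the new aio_abi.h: one header slot, then entries */
struct aio_iocb_ring {
	union {
		struct { unsigned int head, tail, nr_events; };
		struct iocb pad_iocb;
	};
	struct iocb iocbs[0];
};

struct aio_io_event_ring {
	union {
		struct { unsigned int head, tail, nr_events; };
		struct io_event pad_event;
	};
	struct io_event events[0];
};

#define read_barrier()	__sync_synchronize()
#define write_barrier()	__sync_synchronize()

static int ring_one_read(int fd, void *buf, size_t len, unsigned int qd)
{
	struct aio_iocb_ring *sq;
	struct aio_io_event_ring *cq;
	aio_context_t ctx = 0;
	unsigned int tail, head;
	int ret;

	/* Size generously: a header slot plus the entries the kernel maps */
	if (posix_memalign((void **) &sq, 4096, (qd + 1) * sizeof(struct iocb)) ||
	    posix_memalign((void **) &cq, 4096, (2 * qd + 1) * sizeof(struct io_event)))
		return -1;
	memset(sq, 0, (qd + 1) * sizeof(struct iocb));
	memset(cq, 0, (2 * qd + 1) * sizeof(struct io_event));

	ret = syscall(__NR_io_setup2, qd, IOCTX_FLAG_SCQRING, NULL, sq, cq, &ctx);
	if (ret < 0)
		return ret;

	/* Queue one read at the SQ tail, then publish the new tail */
	tail = sq->tail;
	sq->iocbs[tail].aio_lio_opcode = IOCB_CMD_PREAD;
	sq->iocbs[tail].aio_fildes = fd;
	sq->iocbs[tail].aio_buf = (unsigned long) buf;
	sq->iocbs[tail].aio_nbytes = len;
	write_barrier();
	sq->tail = (tail + 1 == sq->nr_events) ? 0 : tail + 1;
	write_barrier();

	/* Submit it and wait for one completion in the same syscall */
	ret = syscall(__NR_io_ring_enter, ctx, 1, 1,
			IORING_FLAG_SUBMIT | IORING_FLAG_GETEVENTS);
	if (ret < 0)
		return ret;

	/* Reap the completion from the CQ head and publish the new head */
	read_barrier();
	head = cq->head;
	if (head != cq->tail) {
		ret = (int) cq->events[head].res;
		cq->head = (head + 1 == cq->nr_events) ? 0 : head + 1;
		write_barrier();
	}
	return ret;
}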