From patchwork Fri Dec 21 19:22:15 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10740893
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org
Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe
Subject: [PATCH 01/22] fs: add an iopoll method to struct file_operations
Date: Fri, 21 Dec 2018 12:22:15 -0700
Message-Id: <20181221192236.12866-2-axboe@kernel.dk>
From: Christoph Hellwig

This new method is used to explicitly poll for I/O completion for an
iocb. It must be called for any iocb submitted asynchronously (that is,
with a non-null ki_complete) which has the IOCB_HIPRI flag set.

The method is assisted by a new ki_cookie field in struct kiocb to store
the polling cookie.

TODO: we can probably union ki_cookie with the existing hint and I/O
priority fields to avoid struct kiocb growth.

Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 Documentation/filesystems/vfs.txt | 3 +++
 include/linux/fs.h                | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 5f71a252e2e0..d9dc5e4d82b9 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -857,6 +857,7 @@ struct file_operations {
 	ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
 	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
+	int (*iopoll)(struct kiocb *kiocb, bool spin);
 	int (*iterate) (struct file *, struct dir_context *);
 	int (*iterate_shared) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
@@ -902,6 +903,8 @@ otherwise noted.
 
   write_iter: possibly asynchronous write with iov_iter as source
 
+  iopoll: called when aio wants to poll for completions on HIPRI iocbs
+
   iterate: called when the VFS needs to read the directory contents
 
   iterate_shared: called when the VFS needs to read the directory contents
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a1ab233e6469..6a5f71f8ae06 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -310,6 +310,7 @@ struct kiocb {
 	int			ki_flags;
 	u16			ki_hint;
 	u16			ki_ioprio; /* See linux/ioprio.h */
+	unsigned int		ki_cookie; /* for ->iopoll */
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1781,6 +1782,7 @@ struct file_operations {
 	ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
 	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
+	int (*iopoll)(struct kiocb *kiocb, bool spin);
 	int (*iterate) (struct file *, struct dir_context *);
 	int (*iterate_shared) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
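For orientation, and not part of the patch itself: the intended calling
convention is that the submission path records the blk-mq cookie returned
by submit_bio() in ki_cookie, and whoever waits on the iocb repeatedly
invokes ->iopoll until the iocb's ki_complete handler has run. A hedged
sketch of such a reaping loop, where done() stands in for a hypothetical
predicate set by the completion handler:

	/*
	 * Illustrative only: drive ->iopoll for a single polled iocb.
	 * Assumes the submitter already stored the cookie in ki_cookie.
	 */
	static int poll_iocb_until_done(struct kiocb *kiocb,
					bool (*done)(struct kiocb *))
	{
		int ret;

		while (!done(kiocb)) {
			/* spin == true: poll actively until progress is made */
			ret = kiocb->ki_filp->f_op->iopoll(kiocb, true);
			if (ret < 0)
				return ret;
		}
		return 0;
	}
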
From patchwork Fri Dec 21 19:22:16 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10740919
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org
Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe
Subject: [PATCH 02/22] block: add bio_set_polled() helper
Date: Fri, 21 Dec 2018 12:22:16 -0700
Message-Id: <20181221192236.12866-3-axboe@kernel.dk>

For the upcoming async polled IO, we can't sleep while allocating
requests. If we do, we introduce a deadlock: the submitter already has
async polled IO in flight, but can't wait for it to complete, since
polled requests must be actively found and reaped.
Signed-off-by: Jens Axboe
---
 include/linux/bio.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7380b094dcca..f6f0a2b3cbc8 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -823,5 +823,19 @@ static inline int bio_integrity_add_page(struct bio *bio, struct page *page,
 
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
+/*
+ * Mark a bio as polled. Note that for async polled IO, the caller must
+ * expect -EWOULDBLOCK if we cannot allocate a request (or other resources).
+ * We cannot block waiting for requests on polled IO, as those completions
+ * must be found by the caller. This is different than IRQ driven IO, where
+ * it's safe to wait for IO to complete.
+ */
+static inline void bio_set_polled(struct bio *bio, struct kiocb *kiocb)
+{
+	bio->bi_opf |= REQ_HIPRI;
+	if (!is_sync_kiocb(kiocb))
+		bio->bi_opf |= REQ_NOWAIT;
+}
+
 #endif /* CONFIG_BLOCK */
 #endif /* __LINUX_BIO_H */
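For orientation, this is the submission-side pattern the helper enables,
assembled here into a small illustrative function from the usage in
patches 03 and 05 of this series; 'qc' is the blk-mq cookie that ->iopoll
later feeds to blk_poll():

	/* Illustrative only: mark the bio polled, submit, record the cookie */
	static blk_qc_t submit_polled_bio(struct kiocb *iocb, struct bio *bio)
	{
		blk_qc_t qc;

		if (iocb->ki_flags & IOCB_HIPRI)
			bio_set_polled(bio, iocb);

		qc = submit_bio(bio);
		WRITE_ONCE(iocb->ki_cookie, qc);
		return qc;
	}
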
From patchwork Fri Dec 21 19:22:17 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10740903
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org
Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe
Subject: [PATCH 03/22] block: wire up block device iopoll method
Date: Fri, 21 Dec 2018 12:22:17 -0700
Message-Id: <20181221192236.12866-4-axboe@kernel.dk>

From: Christoph Hellwig

Just call blk_poll on the iocb cookie; we can derive the block device
from the inode trivially.

Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/block_dev.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index e1886cc7048f..6de8d35f6e41 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -281,6 +281,14 @@ struct blkdev_dio {
 
 static struct bio_set blkdev_dio_pool;
 
+static int blkdev_iopoll(struct kiocb *kiocb, bool wait)
+{
+	struct block_device *bdev = I_BDEV(kiocb->ki_filp->f_mapping->host);
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	return blk_poll(q, READ_ONCE(kiocb->ki_cookie), wait);
+}
+
 static void blkdev_bio_end_io(struct bio *bio)
 {
 	struct blkdev_dio *dio = bio->bi_private;
@@ -398,6 +406,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 			bio->bi_opf |= REQ_HIPRI;
 
 		qc = submit_bio(bio);
+		WRITE_ONCE(iocb->ki_cookie, qc);
 		break;
 	}
@@ -2070,6 +2079,7 @@ const struct file_operations def_blk_fops = {
 	.llseek		= block_llseek,
 	.read_iter	= blkdev_read_iter,
 	.write_iter	= blkdev_write_iter,
+	.iopoll		= blkdev_iopoll,
 	.mmap		= generic_file_mmap,
 	.fsync		= blkdev_fsync,
 	.unlocked_ioctl	= block_ioctl,
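For reference, the synchronous O_DIRECT path in fs/block_dev.c waits on a
polled bio with roughly the following loop. This is a paraphrased sketch,
not part of this patch, assuming the three-argument blk_poll() used above;
bi_private is cleared by the end_io handler when the bio completes:

	qc = submit_bio(&bio);
	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (!READ_ONCE(bio.bi_private))
			break;
		if (!(iocb->ki_flags & IOCB_HIPRI) ||
		    !blk_poll(q, qc, true))
			io_schedule();
	}
	__set_current_state(TASK_RUNNING);
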
From patchwork Fri Dec 21 19:22:18 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10740899
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org
Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe
Subject: [PATCH 04/22] block: use REQ_HIPRI_ASYNC for non-sync polled IO
Date: Fri, 21 Dec 2018 12:22:18 -0700
Message-Id: <20181221192236.12866-5-axboe@kernel.dk>

Tell the block layer whether this is a sync or an async polled request,
so it can do the right thing.
Signed-off-by: Jens Axboe
---
 fs/block_dev.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 6de8d35f6e41..b8f574615792 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -402,8 +402,12 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 		nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES);
 		if (!nr_pages) {
-			if (iocb->ki_flags & IOCB_HIPRI)
-				bio->bi_opf |= REQ_HIPRI;
+			if (iocb->ki_flags & IOCB_HIPRI) {
+				if (!is_sync)
+					bio->bi_opf |= REQ_HIPRI_ASYNC;
+				else
+					bio->bi_opf |= REQ_HIPRI;
+			}
 
 			qc = submit_bio(bio);
 			WRITE_ONCE(iocb->ki_cookie, qc);
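REQ_HIPRI_ASYNC itself is defined outside this excerpt; judging from the
bio_set_polled() helper in patch 02, it is presumably a shorthand along
these lines. This is an assumption, not taken from this posting:

	/*
	 * Assumed definition: an async polled request is a polled request
	 * that must not block on request allocation (cf. bio_set_polled()).
	 */
	#define REQ_HIPRI_ASYNC	(REQ_HIPRI | REQ_NOWAIT)
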
From patchwork Fri Dec 21 19:22:19 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10740911
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org
Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe
Subject: [PATCH 05/22] block: use bio_set_polled() helper for O_DIRECT
Date: Fri, 21 Dec 2018 12:22:19 -0700
Message-Id: <20181221192236.12866-6-axboe@kernel.dk>

Sync and async polled IO need to be treated differently. Use the
bio_set_polled() helper instead of setting the flag manually; that also
gets the async case right by marking it NOWAIT.

Signed-off-by: Jens Axboe
---
 fs/block_dev.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index b8f574615792..9d96c1e30854 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -232,8 +232,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 		bio.bi_opf = dio_bio_write_op(iocb);
 		task_io_account_write(ret);
 	}
-	if (iocb->ki_flags & IOCB_HIPRI)
-		bio.bi_opf |= REQ_HIPRI;
+	bio_set_polled(&bio, iocb);
 
 	qc = submit_bio(&bio);
 	for (;;) {
@@ -402,12 +401,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 		nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES);
 		if (!nr_pages) {
-			if (iocb->ki_flags & IOCB_HIPRI) {
-				if (!is_sync)
-					bio->bi_opf |= REQ_HIPRI_ASYNC;
-				else
-					bio->bi_opf |= REQ_HIPRI;
-			}
+			if (iocb->ki_flags & IOCB_HIPRI)
+				bio_set_polled(bio, iocb);
 
 			qc = submit_bio(bio);
 			WRITE_ONCE(iocb->ki_cookie, qc);
From patchwork Fri Dec 21 19:22:20 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10740909
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org
Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe
Subject: [PATCH 06/22] iomap: wire up the iopoll method
Date: Fri, 21 Dec 2018 12:22:20 -0700
Message-Id: <20181221192236.12866-7-axboe@kernel.dk>

From: Christoph Hellwig

Store the request queue the last bio was submitted to in the iocb's
private data, in addition to the cookie, so that we can find the right
block device. Also refactor the common direct I/O bio submission code
into a nice little helper.

Signed-off-by: Christoph Hellwig

Modified to use bio_set_polled().
Signed-off-by: Jens Axboe
---
 fs/gfs2/file.c        |  2 ++
 fs/iomap.c            | 43 ++++++++++++++++++++++++++++---------------
 fs/xfs/xfs_file.c     |  1 +
 include/linux/iomap.h |  1 +
 4 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 45a17b770d97..358157efc5b7 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1280,6 +1280,7 @@ const struct file_operations gfs2_file_fops = {
 	.llseek		= gfs2_llseek,
 	.read_iter	= gfs2_file_read_iter,
 	.write_iter	= gfs2_file_write_iter,
+	.iopoll		= iomap_dio_iopoll,
 	.unlocked_ioctl	= gfs2_ioctl,
 	.mmap		= gfs2_mmap,
 	.open		= gfs2_open,
@@ -1310,6 +1311,7 @@ const struct file_operations gfs2_file_fops_nolock = {
 	.llseek		= gfs2_llseek,
 	.read_iter	= gfs2_file_read_iter,
 	.write_iter	= gfs2_file_write_iter,
+	.iopoll		= iomap_dio_iopoll,
 	.unlocked_ioctl	= gfs2_ioctl,
 	.mmap		= gfs2_mmap,
 	.open		= gfs2_open,
diff --git a/fs/iomap.c b/fs/iomap.c
index 99f545e0641f..46f4cb687f6f 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1448,6 +1448,28 @@ struct iomap_dio {
 	};
 };
 
+int iomap_dio_iopoll(struct kiocb *kiocb, bool spin)
+{
+	struct request_queue *q = READ_ONCE(kiocb->private);
+
+	if (!q)
+		return 0;
+	return blk_poll(q, READ_ONCE(kiocb->ki_cookie), spin);
+}
+EXPORT_SYMBOL_GPL(iomap_dio_iopoll);
+
+static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
+		struct bio *bio)
+{
+	atomic_inc(&dio->ref);
+
+	if (dio->iocb->ki_flags & IOCB_HIPRI)
+		bio_set_polled(bio, dio->iocb);
+
+	dio->submit.last_queue = bdev_get_queue(iomap->bdev);
+	dio->submit.cookie = submit_bio(bio);
+}
+
 static ssize_t iomap_dio_complete(struct iomap_dio *dio)
 {
 	struct kiocb *iocb = dio->iocb;
@@ -1560,7 +1582,7 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 	}
 }
 
-static blk_qc_t
+static void
 iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
 		unsigned len)
 {
@@ -1574,15 +1596,10 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;
 
-	if (dio->iocb->ki_flags & IOCB_HIPRI)
-		flags |= REQ_HIPRI;
-
 	get_page(page);
 	__bio_add_page(bio, page, len, 0);
 	bio_set_op_attrs(bio, REQ_OP_WRITE, flags);
-
-	atomic_inc(&dio->ref);
-	return submit_bio(bio);
+	iomap_dio_submit_bio(dio, iomap, bio);
 }
 
 static loff_t
@@ -1685,9 +1702,6 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 			bio_set_pages_dirty(bio);
 		}
 
-		if (dio->iocb->ki_flags & IOCB_HIPRI)
-			bio->bi_opf |= REQ_HIPRI;
-
 		iov_iter_advance(dio->submit.iter, n);
 
 		dio->size += n;
@@ -1695,11 +1709,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		copied += n;
 
 		nr_pages = iov_iter_npages(&iter, BIO_MAX_PAGES);
-
-		atomic_inc(&dio->ref);
-
-		dio->submit.last_queue = bdev_get_queue(iomap->bdev);
-		dio->submit.cookie = submit_bio(bio);
+		iomap_dio_submit_bio(dio, iomap, bio);
 	} while (nr_pages);
 
 	/*
@@ -1910,6 +1920,9 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (dio->flags & IOMAP_DIO_WRITE_FUA)
 		dio->flags &= ~IOMAP_DIO_NEED_SYNC;
 
+	WRITE_ONCE(iocb->ki_cookie, dio->submit.cookie);
+	WRITE_ONCE(iocb->private, dio->submit.last_queue);
+
 	if (!atomic_dec_and_test(&dio->ref)) {
 		if (!dio->wait_for_completion)
 			return -EIOCBQUEUED;
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e47425071e65..60c2da41f0fc 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1203,6 +1203,7 @@ const struct file_operations xfs_file_operations = {
 	.write_iter	= xfs_file_write_iter,
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.iopoll		= iomap_dio_iopoll,
 	.unlocked_ioctl	= xfs_file_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= xfs_file_compat_ioctl,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 9a4258154b25..0fefb5455bda 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -162,6 +162,7 @@ typedef int (iomap_dio_end_io_t)(struct kiocb *iocb, ssize_t ret,
 		unsigned flags);
 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, iomap_dio_end_io_t end_io);
+int iomap_dio_iopoll(struct kiocb *kiocb, bool spin);
 
 #ifdef CONFIG_SWAP
 struct file;
From patchwork Fri Dec 21 19:22:21 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10740915
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org
Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe
Subject: [PATCH 07/22] aio: add io_setup2() system call
Date: Fri, 21 Dec 2018 12:22:21 -0700
Message-Id: <20181221192236.12866-8-axboe@kernel.dk>

This is just like io_setup(), except it adds a flags argument that lets
the caller control/define some of the io_context behavior. Beyond the
flags, we also add two user pointers for future use.

Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 Documentation/sysctl/fs.txt            |  8 +--
 arch/x86/entry/syscalls/syscall_64.tbl |  1 +
 fs/aio.c                               | 76 +++++++++++++++++---------
 include/linux/syscalls.h               |  2 +
 include/uapi/asm-generic/unistd.h      |  4 +-
 kernel/sys_ni.c                        |  1 +
 6 files changed, 62 insertions(+), 30 deletions(-)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 819caf8ca05f..5e484eb7a25f 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -47,10 +47,10 @@ Currently, these files are in /proc/sys/fs:
 aio-nr & aio-max-nr:
 
 aio-nr is the running total of the number of events specified on the
-io_setup system call for all currently active aio contexts. If aio-nr
-reaches aio-max-nr then io_setup will fail with EAGAIN. Note that
-raising aio-max-nr does not result in the pre-allocation or re-sizing
-of any kernel data structures.
+io_setup/io_setup2 system call for all currently active aio contexts.
+If aio-nr reaches aio-max-nr then io_setup will fail with EAGAIN.
+Note that raising aio-max-nr does not result in the pre-allocation or
+re-sizing of any kernel data structures.
 
 ==============================================================
 
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index f0b1709a5ffb..67c357225fb0 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -343,6 +343,7 @@
 332	common	statx			__x64_sys_statx
 333	common	io_pgetevents		__x64_sys_io_pgetevents
 334	common	rseq			__x64_sys_rseq
+335	common	io_setup2		__x64_sys_io_setup2
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/aio.c b/fs/aio.c
index e2882334b48f..958f432a0e5b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -101,6 +101,8 @@ struct kioctx {
 
 	unsigned long		user_id;
 
+	unsigned int		flags;
+
 	struct __percpu kioctx_cpu *cpu;
 
 	/*
@@ -687,10 +689,8 @@ static void aio_nr_sub(unsigned nr)
 	spin_unlock(&aio_nr_lock);
 }
 
-/* ioctx_alloc
- *	Allocates and initializes an ioctx.  Returns an ERR_PTR if it failed.
- */
-static struct kioctx *ioctx_alloc(unsigned nr_events)
+static struct kioctx *io_setup_flags(unsigned long ctxid,
+				     unsigned int nr_events, unsigned int flags)
 {
 	struct mm_struct *mm = current->mm;
 	struct kioctx *ctx;
@@ -702,6 +702,12 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
 	 */
 	unsigned int max_reqs = nr_events;
 
+	if (unlikely(ctxid || nr_events == 0)) {
+		pr_debug("EINVAL: ctx %lu nr_events %u\n",
+			 ctxid, nr_events);
+		return ERR_PTR(-EINVAL);
+	}
+
 	/*
 	 * We keep track of the number of available ringbuffer slots, to prevent
 	 * overflow (reqs_available), and we also use percpu counters for this.
@@ -727,6 +733,7 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
 	if (!ctx)
 		return ERR_PTR(-ENOMEM);
 
+	ctx->flags = flags;
 	ctx->max_reqs = max_reqs;
 
 	spin_lock_init(&ctx->ctx_lock);
@@ -1283,6 +1290,41 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr,
 	return ret;
 }
 
+/* sys_io_setup2:
+ *	Like sys_io_setup(), except that it takes a set of flags
+ *	(IOCTX_FLAG_*), and some pointers to user structures:
+ *
+ *	*user1 - reserved for future use
+ *
+ *	*user2 - reserved for future use.
+ */
+SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags, void __user *, user1,
+		void __user *, user2, aio_context_t __user *, ctxp)
+{
+	struct kioctx *ioctx;
+	unsigned long ctx;
+	long ret;
+
+	if (flags || user1 || user2)
+		return -EINVAL;
+
+	ret = get_user(ctx, ctxp);
+	if (unlikely(ret))
+		goto out;
+
+	ioctx = io_setup_flags(ctx, nr_events, flags);
+	ret = PTR_ERR(ioctx);
+	if (IS_ERR(ioctx))
+		goto out;
+
+	ret = put_user(ioctx->user_id, ctxp);
+	if (ret)
+		kill_ioctx(current->mm, ioctx, NULL);
+	percpu_ref_put(&ioctx->users);
+out:
+	return ret;
+}
+
 /* sys_io_setup:
 *	Create an aio_context capable of receiving at least nr_events.
 *	ctxp must not point to an aio_context that already exists, and
@@ -1298,7 +1340,7 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr,
 */
 SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctxp)
 {
-	struct kioctx *ioctx = NULL;
+	struct kioctx *ioctx;
 	unsigned long ctx;
 	long ret;
 
@@ -1306,14 +1348,7 @@ SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctxp)
 	if (unlikely(ret))
 		goto out;
 
-	ret = -EINVAL;
-	if (unlikely(ctx || nr_events == 0)) {
-		pr_debug("EINVAL: ctx %lu nr_events %u\n",
-			 ctx, nr_events);
-		goto out;
-	}
-
-	ioctx = ioctx_alloc(nr_events);
+	ioctx = io_setup_flags(ctx, nr_events, 0);
 	ret = PTR_ERR(ioctx);
 	if (!IS_ERR(ioctx)) {
 		ret = put_user(ioctx->user_id, ctxp);
@@ -1329,7 +1364,7 @@ SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctxp)
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE2(io_setup, unsigned, nr_events, u32 __user *, ctx32p)
 {
-	struct kioctx *ioctx = NULL;
+	struct kioctx *ioctx;
 	unsigned long ctx;
 	long ret;
 
@@ -1337,23 +1372,14 @@ COMPAT_SYSCALL_DEFINE2(io_setup, unsigned, nr_events, u32 __user *, ctx32p)
 	if (unlikely(ret))
 		goto out;
 
-	ret = -EINVAL;
-	if (unlikely(ctx || nr_events == 0)) {
-		pr_debug("EINVAL: ctx %lu nr_events %u\n",
-			 ctx, nr_events);
-		goto out;
-	}
-
-	ioctx = ioctx_alloc(nr_events);
+	ioctx = io_setup_flags(ctx, nr_events, 0);
 	ret = PTR_ERR(ioctx);
 	if (!IS_ERR(ioctx)) {
-		/* truncating is ok because it's a user address */
-		ret = put_user((u32)ioctx->user_id, ctx32p);
+		ret = put_user(ioctx->user_id, ctx32p);
 		if (ret)
 			kill_ioctx(current->mm, ioctx, NULL);
 		percpu_ref_put(&ioctx->users);
 	}
-
 out:
 	return ret;
 }
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 2ac3d13a915b..67b7f03aa9fc 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -287,6 +287,8 @@ static inline void addr_limit_user_check(void)
 */
 #ifndef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
 asmlinkage long sys_io_setup(unsigned nr_reqs, aio_context_t __user *ctx);
+asmlinkage long sys_io_setup2(unsigned, unsigned, void __user *, void __user *,
+			      aio_context_t __user *);
 asmlinkage long sys_io_destroy(aio_context_t ctx);
 asmlinkage long sys_io_submit(aio_context_t, long,
 			struct iocb __user * __user *);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index c7f3321fbe43..1bbaa4c59f20 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -738,9 +738,11 @@ __SYSCALL(__NR_statx, sys_statx)
 __SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents)
 #define __NR_rseq 293
 __SYSCALL(__NR_rseq, sys_rseq)
+#define __NR_io_setup2 294
+__SYSCALL(__NR_io_setup2, sys_io_setup2)
 
 #undef __NR_syscalls
-#define __NR_syscalls 294
+#define __NR_syscalls 295
 
 /*
 * 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index df556175be50..17c8b4393669 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -37,6 +37,7 @@ asmlinkage long sys_ni_syscall(void)
 */
 
 COND_SYSCALL(io_setup);
+COND_SYSCALL(io_setup2);
 COND_SYSCALL_COMPAT(io_setup);
 COND_SYSCALL(io_destroy);
 COND_SYSCALL(io_submit);
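No libaio wrapper exists for the new system call at this point, so
userspace would invoke it through syscall(2). A minimal hedged sketch,
assuming the x86-64 syscall number 335 from the table above; note that
at this point in the series all flags and both user pointers must be
zero/NULL:

	#include <stdio.h>
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/aio_abi.h>

	#ifndef __NR_io_setup2
	#define __NR_io_setup2 335	/* x86-64, per the syscall table above */
	#endif

	int main(void)
	{
		aio_context_t ctx = 0;	/* must be zeroed, like io_setup() */

		if (syscall(__NR_io_setup2, 128, 0, NULL, NULL, &ctx) < 0) {
			perror("io_setup2");
			return 1;
		}
		printf("ctx=%#llx\n", (unsigned long long) ctx);
		return 0;
	}
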
From patchwork Fri Dec 21 19:22:22 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10740967
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org
Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe
Subject: [PATCH 08/22] aio: support for IO polling
Date: Fri, 21 Dec 2018 12:22:22 -0700
Message-Id: <20181221192236.12866-9-axboe@kernel.dk>

Add polled variants of PREAD/PREADV and PWRITE/PWRITEV.
These act like their non-polled counterparts, except we expect to poll
for completion of them. The polling happens at io_getevents() time, and
works just like non-polled IO.

To set up an io_context for polled IO, the application must call
io_setup2() with IOCTX_FLAG_IOPOLL as one of the flags. It is illegal to
mix and match polled and non-polled IO on an io_context.

Polled IO doesn't support the user mapped completion ring. Events must
be reaped through the io_getevents() system call. For non-irq driven
poll devices, there's no way to support completion reaping from
userspace by just looking at the ring. The application itself is the
one that pulls completion entries.

Reviewed-by: Benny Halevy
Signed-off-by: Jens Axboe
---
 fs/aio.c                     | 415 +++++++++++++++++++++++++++++----
 include/uapi/linux/aio_abi.h |   4 +
 2 files changed, 380 insertions(+), 39 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 958f432a0e5b..f7992c7123bc 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -76,6 +76,8 @@ struct aio_ring {
 */
 #define AIO_PLUG_THRESHOLD	2
 
+#define AIO_IOPOLL_BATCH	8
+
 #define AIO_RING_PAGES	8
 
 struct kioctx_table {
@@ -147,6 +149,18 @@ struct kioctx {
 		atomic_t	reqs_available;
 	} ____cacheline_aligned_in_smp;
 
+	/* iopoll submission state */
+	struct {
+		spinlock_t poll_lock;
+		struct list_head poll_submitted;
+	} ____cacheline_aligned_in_smp;
+
+	/* iopoll completion state */
+	struct {
+		struct list_head poll_completing;
+		struct mutex getevents_lock;
+	} ____cacheline_aligned_in_smp;
+
 	struct {
 		spinlock_t	ctx_lock;
 		struct list_head active_reqs;	/* used for cancellation */
@@ -199,14 +213,27 @@ struct aio_kiocb {
 	__u64			ki_user_data;	/* user's data for completion */
 
 	struct list_head	ki_list;	/* the aio core uses this
-						 * for cancellation */
+						 * for cancellation, or for
+						 * polled IO */
+
+	unsigned long		ki_flags;
+#define KIOCB_F_POLL_COMPLETED	0	/* polled IO has completed */
+#define KIOCB_F_POLL_EAGAIN	1	/* polled submission got EAGAIN */
+
 	refcount_t		ki_refcnt;
 
-	/*
-	 * If the aio_resfd field of the userspace iocb is not zero,
-	 * this is the underlying eventfd context to deliver events to.
-	 */
-	struct eventfd_ctx	*ki_eventfd;
+	union {
+		/*
+		 * If the aio_resfd field of the userspace iocb is not zero,
+		 * this is the underlying eventfd context to deliver events to.
+		 */
+		struct eventfd_ctx	*ki_eventfd;
+
+		/*
+		 * For polled IO, stash completion info here
+		 */
+		struct io_event		ki_ev;
+	};
 };
 
 /*------ sysctl variables----*/
@@ -223,6 +250,8 @@ static struct vfsmount *aio_mnt;
 static const struct file_operations aio_ring_fops;
 static const struct address_space_operations aio_ctx_aops;
 
+static void aio_iopoll_reap_events(struct kioctx *);
+
 static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages)
 {
 	struct file *file;
@@ -460,11 +489,15 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 	int i;
 	struct file *file;
 
-	/* Compensate for the ring buffer's head/tail overlap entry */
-	nr_events += 2; /* 1 is required, 2 for good luck */
-
+	/*
+	 * Compensate for the ring buffer's head/tail overlap entry.
+	 * IO polling doesn't require any io event entries
+	 */
 	size = sizeof(struct aio_ring);
-	size += sizeof(struct io_event) * nr_events;
+	if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) {
+		nr_events += 2;	/* 1 is required, 2 for good luck */
+		size += sizeof(struct io_event) * nr_events;
+	}
 
 	nr_pages = PFN_UP(size);
 	if (nr_pages < 0)
@@ -548,6 +581,14 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 	return 0;
 }
 
+/*
+ * Don't support cancel on anything that isn't old aio
+ */
+static bool aio_ctx_supports_cancel(struct kioctx *ctx)
+{
+	return (ctx->flags & IOCTX_FLAG_IOPOLL) == 0;
+}
+
 #define AIO_EVENTS_PER_PAGE	(PAGE_SIZE / sizeof(struct io_event))
 #define AIO_EVENTS_FIRST_PAGE	((PAGE_SIZE - sizeof(struct aio_ring)) / sizeof(struct io_event))
 #define AIO_EVENTS_OFFSET	(AIO_EVENTS_PER_PAGE - AIO_EVENTS_FIRST_PAGE)
@@ -558,6 +599,8 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
 	struct kioctx *ctx = req->ki_ctx;
 	unsigned long flags;
 
+	if (WARN_ON_ONCE(!aio_ctx_supports_cancel(ctx)))
+		return;
 	if (WARN_ON_ONCE(!list_empty(&req->ki_list)))
 		return;
 
@@ -746,6 +789,11 @@ static struct kioctx *io_setup_flags(unsigned long ctxid,
 
 	INIT_LIST_HEAD(&ctx->active_reqs);
 
+	spin_lock_init(&ctx->poll_lock);
+	INIT_LIST_HEAD(&ctx->poll_submitted);
+	INIT_LIST_HEAD(&ctx->poll_completing);
+	mutex_init(&ctx->getevents_lock);
+
 	if (percpu_ref_init(&ctx->users, free_ioctx_users, 0, GFP_KERNEL))
 		goto err;
 
@@ -817,11 +865,15 @@ static int kill_ioctx(struct mm_struct *mm, struct kioctx *ctx,
 {
 	struct kioctx_table *table;
 
+	mutex_lock(&ctx->getevents_lock);
 	spin_lock(&mm->ioctx_lock);
 	if (atomic_xchg(&ctx->dead, 1)) {
 		spin_unlock(&mm->ioctx_lock);
+		mutex_unlock(&ctx->getevents_lock);
 		return -EINVAL;
 	}
+	aio_iopoll_reap_events(ctx);
+	mutex_unlock(&ctx->getevents_lock);
 
 	table = rcu_dereference_raw(mm->ioctx_table);
 	WARN_ON(ctx != rcu_access_pointer(table->table[ctx->id]));
@@ -1030,6 +1082,7 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 	percpu_ref_get(&ctx->reqs);
 	req->ki_ctx = ctx;
 	INIT_LIST_HEAD(&req->ki_list);
+	req->ki_flags = 0;
 	refcount_set(&req->ki_refcnt, 0);
 	req->ki_eventfd = NULL;
 	return req;
@@ -1072,6 +1125,15 @@ static inline void iocb_put(struct aio_kiocb *iocb)
 	}
 }
 
+static void iocb_put_many(struct kioctx *ctx, void **iocbs, int *nr)
+{
+	if (*nr) {
+		percpu_ref_put_many(&ctx->reqs, *nr);
+		kmem_cache_free_bulk(kiocb_cachep, *nr, iocbs);
+		*nr = 0;
+	}
+}
+
 static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb,
 			   long res, long res2)
 {
@@ -1261,6 +1323,183 @@ static bool aio_read_events(struct kioctx *ctx, long min_nr, long nr,
 	return ret < 0 || *i >= min_nr;
 }
 
+/*
+ * Process completed iocb iopoll entries, copying the result to userspace.
+ */
+static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs,
+			    unsigned int *nr_events, long max)
+{
+	void *iocbs[AIO_IOPOLL_BATCH];
+	struct aio_kiocb *iocb, *n;
+	int to_free = 0, ret = 0;
+
+	/* Shouldn't happen... */
+	if (*nr_events >= max)
+		return 0;
+
+	list_for_each_entry_safe(iocb, n, &ctx->poll_completing, ki_list) {
+		if (*nr_events == max)
+			break;
+		if (!test_bit(KIOCB_F_POLL_COMPLETED, &iocb->ki_flags))
+			continue;
+		if (to_free == AIO_IOPOLL_BATCH)
+			iocb_put_many(ctx, iocbs, &to_free);
+
+		list_del(&iocb->ki_list);
+		iocbs[to_free++] = iocb;
+
+		fput(iocb->rw.ki_filp);
+
+		if (evs && copy_to_user(evs + *nr_events, &iocb->ki_ev,
+		    sizeof(iocb->ki_ev))) {
+			ret = -EFAULT;
+			break;
+		}
+		(*nr_events)++;
+	}
+
+	if (to_free)
+		iocb_put_many(ctx, iocbs, &to_free);
+
+	return ret;
+}
+
+/*
+ * Poll for a minimum of 'min' events, and a maximum of 'max'. Note that if
+ * min == 0 we consider that a non-spinning poll check - we'll still enter
+ * the driver poll loop, but only as a non-spinning completion check.
+ */
+static int aio_iopoll_getevents(struct kioctx *ctx,
+				struct io_event __user *event,
+				unsigned int *nr_events, long min, long max)
+{
+	struct aio_kiocb *iocb;
+	int to_poll, polled, ret;
+
+	/*
+	 * Check if we already have done events that satisfy what we need
+	 */
+	if (!list_empty(&ctx->poll_completing)) {
+		ret = aio_iopoll_reap(ctx, event, nr_events, max);
+		if (ret < 0)
+			return ret;
+		if ((min && *nr_events >= min) || *nr_events >= max)
+			return 0;
+	}
+
+	/*
+	 * Take in a new working set from the submitted list, if possible.
+	 */
+	if (!list_empty_careful(&ctx->poll_submitted)) {
+		spin_lock(&ctx->poll_lock);
+		list_splice_init(&ctx->poll_submitted, &ctx->poll_completing);
+		spin_unlock(&ctx->poll_lock);
+	}
+
+	if (list_empty(&ctx->poll_completing))
+		return 0;
+
+	/*
+	 * Check again now that we have a new batch.
+	 */
+	ret = aio_iopoll_reap(ctx, event, nr_events, max);
+	if (ret < 0)
+		return ret;
+	if ((min && *nr_events >= min) || *nr_events >= max)
+		return 0;
+
+	/*
+	 * Find up to 'max' worth of events to poll for, including the
+	 * events we already successfully polled
+	 */
+	polled = to_poll = 0;
+	list_for_each_entry(iocb, &ctx->poll_completing, ki_list) {
+		/*
+		 * Poll for needed events with spin == true, anything after
+		 * that we just check if we have more, up to max.
+		 */
+		bool spin = !polled || *nr_events < min;
+		struct kiocb *kiocb = &iocb->rw;
+
+		if (test_bit(KIOCB_F_POLL_COMPLETED, &iocb->ki_flags))
+			break;
+		if (++to_poll + *nr_events > max)
+			break;
+
+		ret = kiocb->ki_filp->f_op->iopoll(kiocb, spin);
+		if (ret < 0)
+			return ret;
+
+		polled += ret;
+		if (polled + *nr_events >= max)
+			break;
+	}
+
+	ret = aio_iopoll_reap(ctx, event, nr_events, max);
+	if (ret < 0)
+		return ret;
+	if (*nr_events >= min)
+		return 0;
+	return to_poll;
+}
+
+/*
+ * We can't just wait for polled events to come to us, we have to actively
+ * find and complete them.
+ */ +static void aio_iopoll_reap_events(struct kioctx *ctx) +{ + if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) + return; + + while (!list_empty_careful(&ctx->poll_submitted) || + !list_empty(&ctx->poll_completing)) { + unsigned int nr_events = 0; + + aio_iopoll_getevents(ctx, NULL, &nr_events, 1, UINT_MAX); + } +} + +static int __aio_iopoll_check(struct kioctx *ctx, struct io_event __user *event, + unsigned int *nr_events, long min_nr, long max_nr) +{ + int ret = 0; + + while (!*nr_events || !need_resched()) { + int tmin = 0; + + if (*nr_events < min_nr) + tmin = min_nr - *nr_events; + + ret = aio_iopoll_getevents(ctx, event, nr_events, tmin, max_nr); + if (ret <= 0) + break; + ret = 0; + } + + return ret; +} + +static int aio_iopoll_check(struct kioctx *ctx, long min_nr, long nr, + struct io_event __user *event) +{ + unsigned int nr_events = 0; + int ret; + + /* Only allow one thread polling at a time */ + if (!mutex_trylock(&ctx->getevents_lock)) + return -EBUSY; + if (unlikely(atomic_read(&ctx->dead))) { + ret = -EINVAL; + goto err; + } + + ret = __aio_iopoll_check(ctx, event, &nr_events, min_nr, nr); +err: + mutex_unlock(&ctx->getevents_lock); + return nr_events ? nr_events : ret; +} + static long read_events(struct kioctx *ctx, long min_nr, long nr, struct io_event __user *event, ktime_t until) @@ -1305,7 +1544,9 @@ SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags, void __user *, user1, unsigned long ctx; long ret; - if (flags || user1 || user2) + if (user1 || user2) + return -EINVAL; + if (flags & ~IOCTX_FLAG_IOPOLL) return -EINVAL; ret = get_user(ctx, ctxp); @@ -1431,13 +1672,8 @@ static void aio_remove_iocb(struct aio_kiocb *iocb) spin_unlock_irqrestore(&ctx->ctx_lock, flags); } -static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) +static void kiocb_end_write(struct kiocb *kiocb) { - struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw); - - if (!list_empty_careful(&iocb->ki_list)) - aio_remove_iocb(iocb); - if (kiocb->ki_flags & IOCB_WRITE) { struct inode *inode = file_inode(kiocb->ki_filp); @@ -1449,19 +1685,44 @@ static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE); file_end_write(kiocb->ki_filp); } +} + +static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) +{ + struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw); + + if (!list_empty_careful(&iocb->ki_list)) + aio_remove_iocb(iocb); + + kiocb_end_write(kiocb); fput(kiocb->ki_filp); aio_complete(iocb, res, res2); } -static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) +static void aio_complete_rw_poll(struct kiocb *kiocb, long res, long res2) { + struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw); + + kiocb_end_write(kiocb); + + if (unlikely(res == -EAGAIN)) { + set_bit(KIOCB_F_POLL_EAGAIN, &iocb->ki_flags); + } else { + aio_fill_event(&iocb->ki_ev, iocb, res, res2); + set_bit(KIOCB_F_POLL_COMPLETED, &iocb->ki_flags); + } +} + +static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb) +{ + struct kioctx *ctx = kiocb->ki_ctx; + struct kiocb *req = &kiocb->rw; int ret; req->ki_filp = fget(iocb->aio_fildes); if (unlikely(!req->ki_filp)) return -EBADF; - req->ki_complete = aio_complete_rw; req->ki_pos = iocb->aio_offset; req->ki_flags = iocb_flags(req->ki_filp); if (iocb->aio_flags & IOCB_FLAG_RESFD) @@ -1487,9 +1748,35 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) if (unlikely(ret)) goto out_fput; - req->ki_flags &= ~IOCB_HIPRI; /* no 
one is going to poll for this I/O */ - return 0; + if (iocb->aio_flags & IOCB_FLAG_HIPRI) { + /* shares space in the union, and is rather pointless.. */ + ret = -EINVAL; + if (iocb->aio_flags & IOCB_FLAG_RESFD) + goto out_fput; + + /* can't submit polled IO to a non-polled ctx */ + if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) + goto out_fput; + + ret = -EOPNOTSUPP; + if (!(req->ki_flags & IOCB_DIRECT) || + !req->ki_filp->f_op->iopoll) + goto out_fput; + + req->ki_flags |= IOCB_HIPRI; + req->ki_complete = aio_complete_rw_poll; + } else { + /* can't submit non-polled IO to a polled ctx */ + ret = -EINVAL; + if (ctx->flags & IOCTX_FLAG_IOPOLL) + goto out_fput; + /* no one is going to poll for this I/O */ + req->ki_flags &= ~IOCB_HIPRI; + req->ki_complete = aio_complete_rw; + } + + return 0; out_fput: fput(req->ki_filp); return ret; @@ -1534,15 +1821,40 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret) } } -static ssize_t aio_read(struct kiocb *req, const struct iocb *iocb, +/* + * After the iocb has been issued, it's safe to be found on the poll list. + * Adding the kiocb to the list AFTER submission ensures that we don't + * find it from a io_getevents() thread before the issuer is done accessing + * the kiocb cookie. + */ +static void aio_iopoll_iocb_issued(struct aio_kiocb *kiocb) +{ + /* + * For fast devices, IO may have already completed. If it has, add + * it to the front so we find it first. We can't add to the poll_done + * list as that's unlocked from the completion side. + */ + const int front = test_bit(KIOCB_F_POLL_COMPLETED, &kiocb->ki_flags); + struct kioctx *ctx = kiocb->ki_ctx; + + spin_lock(&ctx->poll_lock); + if (front) + list_add(&kiocb->ki_list, &ctx->poll_submitted); + else + list_add_tail(&kiocb->ki_list, &ctx->poll_submitted); + spin_unlock(&ctx->poll_lock); +} + +static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, bool vectored, bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; + struct kiocb *req = &kiocb->rw; struct iov_iter iter; struct file *file; ssize_t ret; - ret = aio_prep_rw(req, iocb); + ret = aio_prep_rw(kiocb, iocb); if (ret) return ret; file = req->ki_filp; @@ -1567,15 +1879,16 @@ static ssize_t aio_read(struct kiocb *req, const struct iocb *iocb, return ret; } -static ssize_t aio_write(struct kiocb *req, const struct iocb *iocb, +static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb, bool vectored, bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; + struct kiocb *req = &kiocb->rw; struct iov_iter iter; struct file *file; ssize_t ret; - ret = aio_prep_rw(req, iocb); + ret = aio_prep_rw(kiocb, iocb); if (ret) return ret; file = req->ki_filp; @@ -1846,7 +2159,8 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, return -EINVAL; } - if (!get_reqs_available(ctx)) + /* Poll IO doesn't need ring reservations */ + if (!(ctx->flags & IOCTX_FLAG_IOPOLL) && !get_reqs_available(ctx)) return -EAGAIN; ret = -EAGAIN; @@ -1869,35 +2183,44 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, } } - ret = put_user(KIOCB_KEY, &user_iocb->aio_key); - if (unlikely(ret)) { - pr_debug("EFAULT: aio_key\n"); - goto out_put_req; + if (aio_ctx_supports_cancel(ctx)) { + ret = put_user(KIOCB_KEY, &user_iocb->aio_key); + if (unlikely(ret)) { + pr_debug("EFAULT: aio_key\n"); + goto out_put_req; + } } req->ki_user_iocb = user_iocb; req->ki_user_data = iocb->aio_data; + ret = -EINVAL; switch (iocb->aio_lio_opcode) { case IOCB_CMD_PREAD: - ret 
= aio_read(&req->rw, iocb, false, compat); + ret = aio_read(req, iocb, false, compat); break; case IOCB_CMD_PWRITE: - ret = aio_write(&req->rw, iocb, false, compat); + ret = aio_write(req, iocb, false, compat); break; case IOCB_CMD_PREADV: - ret = aio_read(&req->rw, iocb, true, compat); + ret = aio_read(req, iocb, true, compat); break; case IOCB_CMD_PWRITEV: - ret = aio_write(&req->rw, iocb, true, compat); + ret = aio_write(req, iocb, true, compat); break; case IOCB_CMD_FSYNC: + if (ctx->flags & IOCTX_FLAG_IOPOLL) + break; ret = aio_fsync(&req->fsync, iocb, false); break; case IOCB_CMD_FDSYNC: + if (ctx->flags & IOCTX_FLAG_IOPOLL) + break; ret = aio_fsync(&req->fsync, iocb, true); break; case IOCB_CMD_POLL: + if (ctx->flags & IOCTX_FLAG_IOPOLL) + break; ret = aio_poll(req, iocb); break; default: @@ -1913,13 +2236,21 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, */ if (ret) goto out_put_req; + if (ctx->flags & IOCTX_FLAG_IOPOLL) { + if (test_bit(KIOCB_F_POLL_EAGAIN, &req->ki_flags)) { + ret = -EAGAIN; + goto out_put_req; + } + aio_iopoll_iocb_issued(req); + } return 0; out_put_req: if (req->ki_eventfd) eventfd_ctx_put(req->ki_eventfd); iocb_put(req); out_put_reqs_available: - put_reqs_available(ctx, 1); + if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) + put_reqs_available(ctx, 1); return ret; } @@ -2075,6 +2406,9 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb, if (unlikely(!ctx)) return -EINVAL; + if (!aio_ctx_supports_cancel(ctx)) + goto err; + spin_lock_irq(&ctx->ctx_lock); kiocb = lookup_kiocb(ctx, iocb); if (kiocb) { @@ -2091,9 +2425,8 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb, */ ret = -EINPROGRESS; } - +err: percpu_ref_put(&ctx->users); - return ret; } @@ -2108,8 +2441,12 @@ static long do_io_getevents(aio_context_t ctx_id, long ret = -EINVAL; if (likely(ioctx)) { - if (likely(min_nr <= nr && min_nr >= 0)) - ret = read_events(ioctx, min_nr, nr, events, until); + if (likely(min_nr <= nr && min_nr >= 0)) { + if (ioctx->flags & IOCTX_FLAG_IOPOLL) + ret = aio_iopoll_check(ioctx, min_nr, nr, events); + else + ret = read_events(ioctx, min_nr, nr, events, until); + } percpu_ref_put(&ioctx->users); } diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h index 8387e0af0f76..a6829bae9ada 100644 --- a/include/uapi/linux/aio_abi.h +++ b/include/uapi/linux/aio_abi.h @@ -52,9 +52,11 @@ enum { * is valid. * IOCB_FLAG_IOPRIO - Set if the "aio_reqprio" member of the "struct iocb" * is valid. + * IOCB_FLAG_HIPRI - Use IO completion polling */ #define IOCB_FLAG_RESFD (1 << 0) #define IOCB_FLAG_IOPRIO (1 << 1) +#define IOCB_FLAG_HIPRI (1 << 2) /* read() from /dev/aio returns these structures. 
*/ struct io_event { @@ -106,6 +108,8 @@ struct iocb { __u32 aio_resfd; }; /* 64 bytes */ +#define IOCTX_FLAG_IOPOLL (1 << 0) /* io_context is polled */ + #undef IFBIG #undef IFLITTLE
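Taken together, the two new flags give userspace the following contract: create the context with IOCTX_FLAG_IOPOLL via io_setup2(), mark every iocb IOCB_FLAG_HIPRI, open the file O_DIRECT on a driver that implements ->iopoll, and reap completions with io_getevents(), which now actively polls instead of sleeping. A minimal sketch only, assuming this series' uapi headers are installed and that __NR_io_setup2 carries whatever syscall number the series assigns; the device path is illustrative:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

int main(void)
{
	struct iocb iocb, *iocbs[1] = { &iocb };
	struct io_event ev;
	aio_context_t ctx = 0;
	void *buf;
	int fd;

	fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);	/* illustrative path */
	if (fd < 0 || posix_memalign(&buf, 4096, 4096))
		return 1;

	/* polled context: completions are reaped via ->iopoll, not a ring */
	if (syscall(__NR_io_setup2, 32, IOCTX_FLAG_IOPOLL, NULL, NULL, &ctx))
		return 1;

	memset(&iocb, 0, sizeof(iocb));
	iocb.aio_fildes = fd;
	iocb.aio_lio_opcode = IOCB_CMD_PREAD;
	iocb.aio_buf = (unsigned long) buf;
	iocb.aio_nbytes = 4096;
	iocb.aio_flags = IOCB_FLAG_HIPRI;	/* poll for this iocb */

	if (syscall(__NR_io_submit, ctx, 1, iocbs) != 1)
		return 1;

	/* spins in the driver's ->iopoll() until the completion is found */
	return syscall(__NR_io_getevents, ctx, 1, 1, &ev, NULL) == 1 ? 0 : 1;
}

Note that mixing is rejected in both directions by aio_prep_rw(): a HIPRI iocb on a non-polled context, or a plain iocb on a polled one, fails with -EINVAL, and a polled iocb on a file without O_DIRECT or without ->iopoll fails with -EOPNOTSUPP.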
From patchwork Fri Dec 21 19:22:23 2018 From: Jens Axboe To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe Subject: [PATCH 09/22] aio: add submission side request cache Date: Fri, 21 Dec 2018 12:22:23 -0700 Message-Id: <20181221192236.12866-10-axboe@kernel.dk> In-Reply-To: <20181221192236.12866-1-axboe@kernel.dk> References: <20181221192236.12866-1-axboe@kernel.dk> We have to add each submitted polled request to the io_context poll_submitted list, which means we have to grab the poll_lock. We already use the block plug to batch submissions if we're doing a batch of IO submissions; extend that to cover the poll requests internally as well. Signed-off-by: Jens Axboe --- fs/aio.c | 140 +++++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 115 insertions(+), 25 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index f7992c7123bc..ac296139593f 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -236,6 +236,21 @@ struct aio_kiocb { }; }; +struct aio_submit_state { + struct kioctx *ctx; + + struct blk_plug plug; +#ifdef CONFIG_BLOCK + struct blk_plug_cb plug_cb; +#endif + + /* + * Polled iocbs that have been submitted, but not added to the ctx yet + */ + struct list_head req_list; + unsigned int req_count; +}; + /*------ sysctl variables----*/ static DEFINE_SPINLOCK(aio_nr_lock); unsigned long aio_nr; /* current system wide number of aio requests */ @@ -1822,29 +1837,62 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret) } /* - * After the iocb has been issued, it's safe to be found on the poll list. - * Adding the kiocb to the list AFTER submission ensures that we don't - * find it from a io_getevents() thread before the issuer is done accessing - * the kiocb cookie. + * Called either at the end of IO submission, or through a plug callback + * because we're going to schedule. Moves out local batch of requests to + * the ctx poll list, so they can be found for polling + reaping. */ -static void aio_iopoll_iocb_issued(struct aio_kiocb *kiocb) +static void aio_flush_state_reqs(struct kioctx *ctx, + struct aio_submit_state *state) +{ + spin_lock(&ctx->poll_lock); + list_splice_tail_init(&state->req_list, &ctx->poll_submitted); + spin_unlock(&ctx->poll_lock); + state->req_count = 0; +} + +static void aio_iopoll_iocb_add_list(struct aio_kiocb *kiocb) { + struct kioctx *ctx = kiocb->ki_ctx; + /* * For fast devices, IO may have already completed. If it has, add * it to the front so we find it first. We can't add to the poll_done * list as that's unlocked from the completion side. */ - const int front = test_bit(KIOCB_F_POLL_COMPLETED, &kiocb->ki_flags); - struct kioctx *ctx = kiocb->ki_ctx; - spin_lock(&ctx->poll_lock); - if (front) + if (test_bit(KIOCB_F_POLL_COMPLETED, &kiocb->ki_flags)) list_add(&kiocb->ki_list, &ctx->poll_submitted); else list_add_tail(&kiocb->ki_list, &ctx->poll_submitted); spin_unlock(&ctx->poll_lock); } +static void aio_iopoll_iocb_add_state(struct aio_submit_state *state, + struct aio_kiocb *kiocb) +{ + if (test_bit(KIOCB_F_POLL_COMPLETED, &kiocb->ki_flags)) + list_add(&kiocb->ki_list, &state->req_list); + else + list_add_tail(&kiocb->ki_list, &state->req_list); + + if (++state->req_count >= AIO_IOPOLL_BATCH) + aio_flush_state_reqs(state->ctx, state); +} +/* + * After the iocb has been issued, it's safe to be found on the poll list.
+ * Adding the kiocb to the list AFTER submission ensures that we don't + * find it from a io_getevents() thread before the issuer is done accessing + * the kiocb cookie. + */ +static void aio_iopoll_iocb_issued(struct aio_submit_state *state, + struct aio_kiocb *kiocb) +{ + if (!state || !IS_ENABLED(CONFIG_BLOCK)) + aio_iopoll_iocb_add_list(kiocb); + else + aio_iopoll_iocb_add_state(state, kiocb); +} + static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, bool vectored, bool compat) { @@ -2138,7 +2186,8 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) } static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, - struct iocb __user *user_iocb, bool compat) + struct iocb __user *user_iocb, + struct aio_submit_state *state, bool compat) { struct aio_kiocb *req; ssize_t ret; @@ -2241,7 +2290,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, ret = -EAGAIN; goto out_put_req; } - aio_iopoll_iocb_issued(req); + aio_iopoll_iocb_issued(state, req); } return 0; out_put_req: @@ -2255,14 +2304,51 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, } static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, - bool compat) + struct aio_submit_state *state, bool compat) { struct iocb iocb; if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb)))) return -EFAULT; - return __io_submit_one(ctx, &iocb, user_iocb, compat); + return __io_submit_one(ctx, &iocb, user_iocb, state, compat); +} + +#ifdef CONFIG_BLOCK +static void aio_state_unplug(struct blk_plug_cb *cb, bool from_schedule) +{ + struct aio_submit_state *state; + + state = container_of(cb, struct aio_submit_state, plug_cb); + if (!list_empty(&state->req_list)) + aio_flush_state_reqs(state->ctx, state); +} +#endif + +/* + * Batched submission is done, ensure local IO is flushed out. + */ +static void aio_submit_state_end(struct aio_submit_state *state) +{ + blk_finish_plug(&state->plug); + if (!list_empty(&state->req_list)) + aio_flush_state_reqs(state->ctx, state); +} + +/* + * Start submission side cache. + */ +static void aio_submit_state_start(struct aio_submit_state *state, + struct kioctx *ctx) +{ + state->ctx = ctx; + INIT_LIST_HEAD(&state->req_list); + state->req_count = 0; +#ifdef CONFIG_BLOCK + state->plug_cb.callback = aio_state_unplug; + blk_start_plug(&state->plug); + list_add(&state->plug_cb.list, &state->plug.cb_list); +#endif } /* sys_io_submit: @@ -2280,10 +2366,10 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, struct iocb __user * __user *, iocbpp) { + struct aio_submit_state state, *statep = NULL; struct kioctx *ctx; long ret = 0; int i = 0; - struct blk_plug plug; if (unlikely(nr < 0)) return -EINVAL; @@ -2297,8 +2383,10 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, if (nr > ctx->nr_events) nr = ctx->nr_events; - if (nr > AIO_PLUG_THRESHOLD) - blk_start_plug(&plug); + if (nr > AIO_PLUG_THRESHOLD) { + aio_submit_state_start(&state, ctx); + statep = &state; + } for (i = 0; i < nr; i++) { struct iocb __user *user_iocb; @@ -2307,12 +2395,12 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, break; } - ret = io_submit_one(ctx, user_iocb, false); + ret = io_submit_one(ctx, user_iocb, statep, false); if (ret) break; } - if (nr > AIO_PLUG_THRESHOLD) - blk_finish_plug(&plug); + if (statep) + aio_submit_state_end(statep); percpu_ref_put(&ctx->users); return i ? 
i : ret; @@ -2322,10 +2410,10 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id, int, nr, compat_uptr_t __user *, iocbpp) { + struct aio_submit_state state, *statep = NULL; struct kioctx *ctx; long ret = 0; int i = 0; - struct blk_plug plug; if (unlikely(nr < 0)) return -EINVAL; @@ -2339,8 +2427,10 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id, if (nr > ctx->nr_events) nr = ctx->nr_events; - if (nr > AIO_PLUG_THRESHOLD) - blk_start_plug(&plug); + if (nr > AIO_PLUG_THRESHOLD) { + aio_submit_state_start(&state, ctx); + statep = &state; + } for (i = 0; i < nr; i++) { compat_uptr_t user_iocb; @@ -2349,12 +2439,12 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id, - ret = io_submit_one(ctx, compat_ptr(user_iocb), true); + ret = io_submit_one(ctx, compat_ptr(user_iocb), statep, true); if (ret) break; } - if (nr > AIO_PLUG_THRESHOLD) - blk_finish_plug(&plug); + if (statep) + aio_submit_state_end(statep); percpu_ref_put(&ctx->users); return i ? i : ret;
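The plug callback is the subtle piece of this patch: if the submitting task blocks mid-batch, the scheduler unplugs and the callback flushes the private list, so polled requests never sit invisible to a reaper. In isolation the pattern looks like this; a sketch only, with made-up my_* names, using the blk_plug_cb API the same way the patch does:

/* Sketch: batch work privately, flush when the task plugs out. */
#include <linux/blkdev.h>
#include <linux/list.h>

struct my_batch {
	struct blk_plug_cb cb;		/* must live for the plug window */
	struct list_head pending;	/* privately batched items */
};

static void my_unplug(struct blk_plug_cb *cb, bool from_schedule)
{
	struct my_batch *b = container_of(cb, struct my_batch, cb);

	/* splice b->pending onto the shared, lock-protected list here */
}

static void my_submit_window(struct my_batch *b)
{
	struct blk_plug plug;

	INIT_LIST_HEAD(&b->pending);
	b->cb.callback = my_unplug;

	blk_start_plug(&plug);
	list_add(&b->cb.list, &plug.cb_list);
	/* ... add items to b->pending, submit IO ... */
	blk_finish_plug(&plug);	/* invokes my_unplug() for anything left */
}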
From patchwork Fri Dec 21 19:22:24 2018 From: Jens Axboe To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe Subject: [PATCH 10/22] fs: add fget_many() and fput_many() Date: Fri, 21 Dec 2018 12:22:24 -0700 Message-Id: <20181221192236.12866-11-axboe@kernel.dk> In-Reply-To: <20181221192236.12866-1-axboe@kernel.dk> References: <20181221192236.12866-1-axboe@kernel.dk> Some use cases repeatedly get and put references to the same file, but the only exposed interface is doing these one at a time. As each of these entails an atomic inc or dec on a shared structure, that cost can add up. Add fget_many(), which works just like fget(), except it takes an argument for how many references to get on the file. Ditto fput_many(), which can drop an arbitrary number of references to a file.
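The intended pairing looks roughly like this; a sketch only, where submit_one() is a hypothetical consumer that transfers ownership of one reference to each submitted IO (the reference is then dropped at completion time):

/* Sketch: take one reference per expected IO with a single atomic op. */
#include <linux/file.h>
#include <linux/fs.h>

extern int submit_one(struct file *file);	/* hypothetical consumer */

static int submit_batch(unsigned int fd, unsigned int nr_ios)
{
	struct file *file;
	unsigned int i;

	file = fget_many(fd, nr_ios);	/* one atomic add covers nr_ios refs */
	if (!file)
		return -EBADF;

	for (i = 0; i < nr_ios; i++) {
		if (submit_one(file) < 0)
			break;		/* each submitted IO owns one ref */
	}

	if (i < nr_ios)
		fput_many(file, nr_ios - i);	/* return unused references */
	return i ? i : -EAGAIN;
}

Guarding the final fput_many() against a zero count mirrors what the series itself does in aio_file_put() later on.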
Signed-off-by: Jens Axboe --- fs/file.c | 15 ++++++++++----- fs/file_table.c | 9 +++++++-- include/linux/file.h | 2 ++ include/linux/fs.h | 4 +++- 4 files changed, 22 insertions(+), 8 deletions(-) diff --git a/fs/file.c b/fs/file.c index 7ffd6e9d103d..ad9870edfd51 100644 --- a/fs/file.c +++ b/fs/file.c @@ -676,7 +676,7 @@ void do_close_on_exec(struct files_struct *files) spin_unlock(&files->file_lock); } -static struct file *__fget(unsigned int fd, fmode_t mask) +static struct file *__fget(unsigned int fd, fmode_t mask, unsigned int refs) { struct files_struct *files = current->files; struct file *file; @@ -691,7 +691,7 @@ static struct file *__fget(unsigned int fd, fmode_t mask) */ if (file->f_mode & mask) file = NULL; - else if (!get_file_rcu(file)) + else if (!get_file_rcu_many(file, refs)) goto loop; } rcu_read_unlock(); @@ -699,15 +699,20 @@ static struct file *__fget(unsigned int fd, fmode_t mask) return file; } +struct file *fget_many(unsigned int fd, unsigned int refs) +{ + return __fget(fd, FMODE_PATH, refs); +} + struct file *fget(unsigned int fd) { - return __fget(fd, FMODE_PATH); + return fget_many(fd, 1); } EXPORT_SYMBOL(fget); struct file *fget_raw(unsigned int fd) { - return __fget(fd, 0); + return __fget(fd, 0, 1); } EXPORT_SYMBOL(fget_raw); @@ -738,7 +743,7 @@ static unsigned long __fget_light(unsigned int fd, fmode_t mask) return 0; return (unsigned long)file; } else { - file = __fget(fd, mask); + file = __fget(fd, mask, 1); if (!file) return 0; return FDPUT_FPUT | (unsigned long)file; diff --git a/fs/file_table.c b/fs/file_table.c index e49af4caf15d..6a715639728d 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -326,9 +326,9 @@ void flush_delayed_fput(void) static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput); -void fput(struct file *file) +void fput_many(struct file *file, unsigned int refs) { - if (atomic_long_dec_and_test(&file->f_count)) { + if (atomic_long_sub_and_test(refs, &file->f_count)) { struct task_struct *task = current; if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) { @@ -347,6 +347,11 @@ void fput(struct file *file) } } +void fput(struct file *file) +{ + fput_many(file, 1); +} + /* * synchronous analog of fput(); for kernel threads that might be needed * in some umount() (and thus can't use flush_delayed_fput() without diff --git a/include/linux/file.h b/include/linux/file.h index 6b2fb032416c..3fcddff56bc4 100644 --- a/include/linux/file.h +++ b/include/linux/file.h @@ -13,6 +13,7 @@ struct file; extern void fput(struct file *); +extern void fput_many(struct file *, unsigned int); struct file_operations; struct vfsmount; @@ -44,6 +45,7 @@ static inline void fdput(struct fd fd) } extern struct file *fget(unsigned int fd); +extern struct file *fget_many(unsigned int fd, unsigned int refs); extern struct file *fget_raw(unsigned int fd); extern unsigned long __fdget(unsigned int fd); extern unsigned long __fdget_raw(unsigned int fd); diff --git a/include/linux/fs.h b/include/linux/fs.h index 6a5f71f8ae06..e81d0a64d369 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -952,7 +952,9 @@ static inline struct file *get_file(struct file *f) atomic_long_inc(&f->f_count); return f; } -#define get_file_rcu(x) atomic_long_inc_not_zero(&(x)->f_count) +#define get_file_rcu_many(x, cnt) \ + atomic_long_add_unless(&(x)->f_count, (cnt), 0) +#define get_file_rcu(x) get_file_rcu_many((x), 1) #define fput_atomic(x) atomic_long_add_unless(&(x)->f_count, -1, 1) #define file_count(x) atomic_long_read(&(x)->f_count) From patchwork Fri Dec 21 
19:22:25 2018 From: Jens Axboe To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe Subject: [PATCH 11/22] aio: use fget/fput_many() for file references Date: Fri, 21 Dec 2018 12:22:25 -0700
Message-Id: <20181221192236.12866-12-axboe@kernel.dk> In-Reply-To: <20181221192236.12866-1-axboe@kernel.dk> References: <20181221192236.12866-1-axboe@kernel.dk> On the submission side, add file reference batching to the aio_submit_state. We get as many references as the number of iocbs we are submitting, and drop unused ones if we end up switching files. The assumption here is that we're usually only dealing with one fd, and if there are multiple, hopefully they are at least somewhat ordered. Could trivially be extended to cover multiple fds, if needed. On the completion side we do the same thing, except this is trivially done just locally in aio_iopoll_reap(). Signed-off-by: Jens Axboe --- fs/aio.c | 110 +++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 94 insertions(+), 16 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index ac296139593f..33d1d2c0d6fe 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -249,6 +249,15 @@ struct aio_submit_state { */ struct list_head req_list; unsigned int req_count; + + /* + * File reference cache + */ + struct file *file; + unsigned int fd; + unsigned int has_refs; + unsigned int used_refs; + unsigned int ios_left; }; /*------ sysctl variables----*/ @@ -1346,7 +1355,8 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, { void *iocbs[AIO_IOPOLL_BATCH]; struct aio_kiocb *iocb, *n; - int to_free = 0, ret = 0; + int file_count, to_free = 0, ret = 0; + struct file *file = NULL; /* Shouldn't happen... */ if (*nr_events >= max) @@ -1363,7 +1373,20 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, list_del(&iocb->ki_list); iocbs[to_free++] = iocb; - fput(iocb->rw.ki_filp); + /* + * Batched puts of the same file, to avoid dirtying the + * file usage count multiple times, if avoidable. + */ + if (!file) { + file = iocb->rw.ki_filp; + file_count = 1; + } else if (file == iocb->rw.ki_filp) { + file_count++; + } else { + fput_many(file, file_count); + file = iocb->rw.ki_filp; + file_count = 1; + } if (evs && copy_to_user(evs + *nr_events, &iocb->ki_ev, sizeof(iocb->ki_ev))) { @@ -1373,6 +1396,9 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, (*nr_events)++; } + if (file) + fput_many(file, file_count); + if (to_free) iocb_put_many(ctx, iocbs, &to_free); @@ -1729,13 +1755,60 @@ static void aio_complete_rw_poll(struct kiocb *kiocb, long res, long res2) } } -static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb) +static void aio_file_put(struct aio_submit_state *state, struct file *file) +{ + if (!state) { + fput(file); + } else if (state->file) { + int diff = state->has_refs - state->used_refs; + + if (diff) + fput_many(state->file, diff); + state->file = NULL; + } +} + +/* + * Get as many references to a file as we have IOs left in this submission, + * assuming most submissions are for one file, or at least that each file + * has more than one submission.
+ */ +static struct file *aio_file_get(struct aio_submit_state *state, int fd) +{ + if (!state) + return fget(fd); + + if (!state->file) { +get_file: + state->file = fget_many(fd, state->ios_left); + if (!state->file) + return NULL; + + state->fd = fd; + state->has_refs = state->ios_left; + state->used_refs = 1; + state->ios_left--; + return state->file; + } + + if (state->fd == fd) { + state->used_refs++; + state->ios_left--; + return state->file; + } + + aio_file_put(state, NULL); + goto get_file; +} + +static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb, + struct aio_submit_state *state) { struct kioctx *ctx = kiocb->ki_ctx; struct kiocb *req = &kiocb->rw; int ret; - req->ki_filp = fget(iocb->aio_fildes); + req->ki_filp = aio_file_get(state, iocb->aio_fildes); if (unlikely(!req->ki_filp)) return -EBADF; req->ki_pos = iocb->aio_offset; @@ -1793,7 +1866,7 @@ static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb) return 0; out_fput: - fput(req->ki_filp); + aio_file_put(state, req->ki_filp); return ret; } @@ -1894,7 +1967,8 @@ static void aio_iopoll_iocb_issued(struct aio_submit_state *state, } static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, - bool vectored, bool compat) + struct aio_submit_state *state, bool vectored, + bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct kiocb *req = &kiocb->rw; @@ -1902,7 +1976,7 @@ static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, struct file *file; ssize_t ret; - ret = aio_prep_rw(kiocb, iocb); + ret = aio_prep_rw(kiocb, iocb, state); if (ret) return ret; file = req->ki_filp; @@ -1928,7 +2002,8 @@ static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, } static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb, - bool vectored, bool compat) + struct aio_submit_state *state, bool vectored, + bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct kiocb *req = &kiocb->rw; @@ -1936,7 +2011,7 @@ static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb, struct file *file; ssize_t ret; - ret = aio_prep_rw(kiocb, iocb); + ret = aio_prep_rw(kiocb, iocb, state); if (ret) return ret; file = req->ki_filp; @@ -2246,16 +2321,16 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, ret = -EINVAL; switch (iocb->aio_lio_opcode) { case IOCB_CMD_PREAD: - ret = aio_read(req, iocb, false, compat); + ret = aio_read(req, iocb, state, false, compat); break; case IOCB_CMD_PWRITE: - ret = aio_write(req, iocb, false, compat); + ret = aio_write(req, iocb, state, false, compat); break; case IOCB_CMD_PREADV: - ret = aio_read(req, iocb, true, compat); + ret = aio_read(req, iocb, state, true, compat); break; case IOCB_CMD_PWRITEV: - ret = aio_write(req, iocb, true, compat); + ret = aio_write(req, iocb, state, true, compat); break; case IOCB_CMD_FSYNC: if (ctx->flags & IOCTX_FLAG_IOPOLL) @@ -2333,17 +2408,20 @@ static void aio_submit_state_end(struct aio_submit_state *state) blk_finish_plug(&state->plug); if (!list_empty(&state->req_list)) aio_flush_state_reqs(state->ctx, state); + aio_file_put(state, NULL); } /* * Start submission side cache. 
*/ static void aio_submit_state_start(struct aio_submit_state *state, - struct kioctx *ctx) + struct kioctx *ctx, int max_ios) { state->ctx = ctx; INIT_LIST_HEAD(&state->req_list); state->req_count = 0; + state->file = NULL; + state->ios_left = max_ios; #ifdef CONFIG_BLOCK state->plug_cb.callback = aio_state_unplug; blk_start_plug(&state->plug); @@ -2384,7 +2462,7 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, nr = ctx->nr_events; if (nr > AIO_PLUG_THRESHOLD) { - aio_submit_state_start(&state, ctx); + aio_submit_state_start(&state, ctx, nr); statep = &state; } for (i = 0; i < nr; i++) { @@ -2428,7 +2506,7 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id, nr = ctx->nr_events; if (nr > AIO_PLUG_THRESHOLD) { - aio_submit_state_start(&state, ctx); + aio_submit_state_start(&state, ctx, nr); statep = &state; } for (i = 0; i < nr; i++) { From patchwork Fri Dec 21 19:22:26 2018
From: Jens Axboe To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe Subject: [PATCH 12/22] aio: split iocb init from allocation Date: Fri, 21 Dec 2018 12:22:26 -0700 Message-Id: <20181221192236.12866-13-axboe@kernel.dk> In-Reply-To: <20181221192236.12866-1-axboe@kernel.dk> References: <20181221192236.12866-1-axboe@kernel.dk> In preparation for having pre-allocated requests that we then just need to initialize before use. Signed-off-by: Jens Axboe --- fs/aio.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 33d1d2c0d6fe..093e6c8e9e09 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1091,6 +1091,16 @@ static bool get_reqs_available(struct kioctx *ctx) return __get_reqs_available(ctx); } +static void aio_iocb_init(struct kioctx *ctx, struct aio_kiocb *req) +{ + percpu_ref_get(&ctx->reqs); + req->ki_ctx = ctx; + INIT_LIST_HEAD(&req->ki_list); + req->ki_flags = 0; + refcount_set(&req->ki_refcnt, 0); + req->ki_eventfd = NULL; +} + /* aio_get_req * Allocate a slot for an aio request. * Returns NULL if no requests are free.
@@ -1103,12 +1113,7 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx) struct aio_kiocb *req; req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL); if (unlikely(!req)) return NULL; - percpu_ref_get(&ctx->reqs); - req->ki_ctx = ctx; - INIT_LIST_HEAD(&req->ki_list); - req->ki_flags = 0; - refcount_set(&req->ki_refcnt, 0); - req->ki_eventfd = NULL; + aio_iocb_init(ctx, req); return req; } From patchwork Fri Dec 21 19:22:27 2018
From: Jens Axboe To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe Subject: [PATCH 13/22] aio: batch aio_kiocb allocation Date: Fri, 21 Dec 2018 12:22:27 -0700 Message-Id: <20181221192236.12866-14-axboe@kernel.dk> In-Reply-To: <20181221192236.12866-1-axboe@kernel.dk> References: <20181221192236.12866-1-axboe@kernel.dk> Similarly to how we use the state->ios_left to know how many references to get to a file, we can use it to allocate the aio_kiocb's we need in bulk. Signed-off-by: Jens Axboe --- fs/aio.c | 42 ++++++++++++++++++++++++++++++++++++------ 1 file changed, 36 insertions(+), 6 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 093e6c8e9e09..513ecd3fa681 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -250,6 +250,13 @@ struct aio_submit_state { struct list_head req_list; unsigned int req_count; + /* + * aio_kiocb alloc cache + */ + void *iocbs[AIO_IOPOLL_BATCH]; + unsigned int free_iocbs; + unsigned int cur_iocb; + /* * File reference cache */ @@ -1105,15 +1112,34 @@ static void aio_iocb_init(struct kioctx *ctx, struct aio_kiocb *req) * Allocate a slot for an aio request. * Returns NULL if no requests are free. */ -static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx) +static struct aio_kiocb *aio_get_req(struct kioctx *ctx, + struct aio_submit_state *state) { struct aio_kiocb *req; - req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL); - if (unlikely(!req)) - return NULL; + if (!state) + req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL); + else if (!state->free_iocbs) { + size_t size; + int ret; + + size = min_t(size_t, state->ios_left, ARRAY_SIZE(state->iocbs)); + ret = kmem_cache_alloc_bulk(kiocb_cachep, GFP_KERNEL, size, + state->iocbs); + if (ret <= 0) + return ERR_PTR(-ENOMEM); + state->free_iocbs = ret - 1; + state->cur_iocb = 1; + req = state->iocbs[0]; + } else { + req = state->iocbs[state->cur_iocb]; + state->free_iocbs--; + state->cur_iocb++; + } + + if (req) + aio_iocb_init(ctx, req); - aio_iocb_init(ctx, req); return req; } @@ -2293,7 +2319,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, return -EAGAIN; ret = -EAGAIN; - req = aio_get_req(ctx); + req = aio_get_req(ctx, state); if (unlikely(!req)) goto out_put_reqs_available; @@ -2414,6 +2440,9 @@ static void aio_submit_state_end(struct aio_submit_state *state) if (!list_empty(&state->req_list)) aio_flush_state_reqs(state->ctx, state); aio_file_put(state, NULL); + if (state->free_iocbs) + kmem_cache_free_bulk(kiocb_cachep, state->free_iocbs, + &state->iocbs[state->cur_iocb]); } /* @@ -2425,6 +2454,7 @@ static void aio_submit_state_start(struct aio_submit_state *state, state->ctx = ctx; INIT_LIST_HEAD(&state->req_list); state->req_count = 0; + state->free_iocbs = 0; state->file = NULL; state->ios_left = max_ios; #ifdef CONFIG_BLOCK
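Note that kmem_cache_alloc_bulk() may hand back fewer objects than requested (and returns 0 or less on outright failure), which is why the patch tracks free_iocbs/cur_iocb instead of assuming a full batch. A reduced sketch of that refill-then-consume pattern, with made-up names:

/* Sketch: refill a small object cache with one slab call, consume singly. */
struct obj_cache {
	void *objs[16];
	unsigned int free_objs;
	unsigned int cur_obj;
};

static void *cache_get_one(struct kmem_cache *cachep, struct obj_cache *c,
			   unsigned int want)
{
	if (!c->free_objs) {
		size_t size = min_t(size_t, want, ARRAY_SIZE(c->objs));
		int got = kmem_cache_alloc_bulk(cachep, GFP_KERNEL, size,
						c->objs);

		if (got <= 0)		/* may also return fewer than 'size' */
			return NULL;
		c->free_objs = got;
		c->cur_obj = 0;
	}
	c->free_objs--;
	return c->objs[c->cur_obj++];
}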
From patchwork Fri Dec 21 19:22:28 2018 From: Jens Axboe To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe Subject: [PATCH 14/22] aio: split old ring complete out from aio_complete() Date: Fri, 21 Dec 2018 12:22:28 -0700 Message-Id: <20181221192236.12866-15-axboe@kernel.dk> In-Reply-To: <20181221192236.12866-1-axboe@kernel.dk> References: <20181221192236.12866-1-axboe@kernel.dk> Signed-off-by: Jens Axboe --- fs/aio.c |
17 ++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 513ecd3fa681..d33417bee594 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1198,12 +1198,9 @@ static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb, ev->res2 = res2; } -/* aio_complete - * Called when the io request on the given iocb is complete. - */ -static void aio_complete(struct aio_kiocb *iocb, long res, long res2) +static void aio_ring_complete(struct kioctx *ctx, struct aio_kiocb *iocb, + long res, long res2) { - struct kioctx *ctx = iocb->ki_ctx; struct aio_ring *ring; struct io_event *ev_page, *event; unsigned tail, pos, head; @@ -1253,6 +1250,16 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2) spin_unlock_irqrestore(&ctx->completion_lock, flags); pr_debug("added to ring %p at [%u]\n", iocb, tail); +} + +/* aio_complete + * Called when the io request on the given iocb is complete. + */ +static void aio_complete(struct aio_kiocb *iocb, long res, long res2) +{ + struct kioctx *ctx = iocb->ki_ctx; + + aio_ring_complete(ctx, iocb, res, res2); /* * Check if the user asked us to deliver the result through an From patchwork Fri Dec 21 19:22:29 2018
From patchwork Fri Dec 21 19:22:29 2018
X-Patchwork-Id: 10740937
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org
Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe
Subject: [PATCH 15/22] aio: pass in user index to __io_submit_one()
Date: Fri, 21 Dec 2018 12:22:29 -0700
Message-Id: <20181221192236.12866-16-axboe@kernel.dk>
In-Reply-To: <20181221192236.12866-1-axboe@kernel.dk>
References: <20181221192236.12866-1-axboe@kernel.dk>

This is used for the user iocb pointer right now, but in preparation for iocbs that don't reside in userspace, unionize it with a ki_index and pass that in instead.

Signed-off-by: Jens Axboe
---
fs/aio.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c index d33417bee594..9e9b49fe9a8b 100644 --- a/fs/aio.c +++ b/fs/aio.c
@@ -209,7 +209,11 @@ struct aio_kiocb { struct kioctx *ki_ctx; kiocb_cancel_fn *ki_cancel; - struct iocb __user *ki_user_iocb; /* user's aiocb */ + union { + struct iocb __user *ki_user_iocb; /* user's aiocb */ + unsigned long ki_index; + }; + __u64 ki_user_data; /* user's data for completion */ struct list_head ki_list; /* the aio core uses this
@@ -1192,7 +1196,7 @@ static void iocb_put_many(struct kioctx *ctx, void **iocbs, int *nr) static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb, long res, long res2) { - ev->obj = (u64)(unsigned long)iocb->ki_user_iocb; + ev->obj = iocb->ki_index; ev->data = iocb->ki_user_data; ev->res = res; ev->res2 = res2;
@@ -2299,7 +2303,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) } static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, - struct iocb __user *user_iocb, + unsigned long ki_index, struct aio_submit_state *state, bool compat) { struct aio_kiocb *req;
@@ -2346,14 +2350,17 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, } if (aio_ctx_supports_cancel(ctx)) { + struct iocb __user *user_iocb = (struct iocb __user *) ki_index; + ret = put_user(KIOCB_KEY, &user_iocb->aio_key); if (unlikely(ret)) { pr_debug("EFAULT: aio_key\n"); goto out_put_req; } - } + req->ki_user_iocb = user_iocb; + } else + req->ki_index = ki_index; - req->ki_user_iocb = user_iocb; req->ki_user_data = iocb->aio_data; ret = -EINVAL;
@@ -2419,12 +2426,13 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, static int io_submit_one(struct kioctx *ctx,
struct iocb __user *user_iocb, struct aio_submit_state *state, bool compat) { + unsigned long ki_index = (unsigned long) user_iocb; struct iocb iocb; if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb)))) return -EFAULT; - return __io_submit_one(ctx, &iocb, user_iocb, state, compat); + return __io_submit_one(ctx, &iocb, ki_index, state, compat); } #ifdef CONFIG_BLOCK
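A note on the aio_fill_event() hunk above: because ki_user_iocb and ki_index alias the same storage, the completion path can read ki_index unconditionally. An illustrative comparison (commentary only, not an extra hunk of the patch):

    /* io_submit(2) path: ki_index holds the cast user pointer, so the
     * value userspace sees in io_event.obj is bit-identical to before. */
    ev->obj = (u64)(unsigned long)iocb->ki_user_iocb;   /* old */
    ev->obj = iocb->ki_index;                           /* new */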
From patchwork Fri Dec 21 19:22:30 2018
X-Patchwork-Id: 10740955
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org
Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe
Subject: [PATCH 16/22] aio: add support for submission/completion rings
Date: Fri, 21 Dec 2018 12:22:30 -0700
Message-Id: <20181221192236.12866-17-axboe@kernel.dk>
In-Reply-To: <20181221192236.12866-1-axboe@kernel.dk>
References: <20181221192236.12866-1-axboe@kernel.dk>

The submission queue (SQ) and completion queue (CQ) rings are shared between the application and the kernel. This eliminates the need to copy data back and forth to submit and complete IO. We use the same structures as the old aio interface. The SQ ring entries are indexes into a struct iocb array, like we would submit through io_submit(), and the CQ ring entries are struct io_event, like we would pass in (and copy back) from io_getevents(). A new system call is added for this, io_ring_enter(). This system call submits IO that is stored in the SQ ring, and/or completes IO and stores the results in the CQ ring. Hence it's possible to both submit and complete IO in a single system call. For IRQ driven IO, an application only needs to enter the kernel for completions if it wants to wait for them to occur. Sample application: http://git.kernel.dk/cgit/fio/plain/t/aio-ring.c

Signed-off-by: Jens Axboe
---
arch/x86/entry/syscalls/syscall_64.tbl | 1 + fs/aio.c | 485 +++++++++++++++++++++++-- include/linux/syscalls.h | 4 +- include/uapi/linux/aio_abi.h | 29 ++ kernel/sys_ni.c | 1 + 5 files changed, 494 insertions(+), 26 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 67c357225fb0..55a26700a637 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -344,6 +344,7 @@ 333 common io_pgetevents __x64_sys_io_pgetevents 334 common rseq __x64_sys_rseq 335 common io_setup2 __x64_sys_io_setup2 +336 common io_ring_enter __x64_sys_io_ring_enter # # x32-specific system call numbers start at 512 to avoid cache impact

diff --git a/fs/aio.c b/fs/aio.c index 9e9b49fe9a8b..a49109e69334 100644 --- a/fs/aio.c +++ b/fs/aio.c
@@ -95,6 +95,18 @@ struct ctx_rq_wait { atomic_t count; }; +struct aio_mapped_range { + struct page **pages; + long nr_pages; +}; + +struct aio_iocb_ring { + struct aio_mapped_range ring_range; /* maps user SQ ring */ + struct aio_sq_ring *ring; + + struct aio_mapped_range iocb_range; /* maps user iocbs */ +}; + struct kioctx { struct percpu_ref users; atomic_t dead;
@@ -130,6 +142,11 @@ struct kioctx { struct page **ring_pages; long nr_pages; + /* if used, completion and submission rings */ + struct aio_iocb_ring sq_ring; + struct aio_mapped_range cq_ring; + int cq_ring_overflow; + struct rcu_work free_rwork; /* see free_ioctx() */ /*
@@ -285,6 +302,14 @@ static struct vfsmount *aio_mnt; static const struct file_operations aio_ring_fops; static const struct address_space_operations aio_ctx_aops; +static const unsigned int array_page_shift = + ilog2(PAGE_SIZE / sizeof(u32)); +static const unsigned int iocb_page_shift = + ilog2(PAGE_SIZE / sizeof(struct iocb)); +static const unsigned int event_page_shift = + ilog2(PAGE_SIZE / sizeof(struct io_event)); + +static void
aio_scqring_unmap(struct kioctx *); static void aio_iopoll_reap_events(struct kioctx *); static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages) @@ -515,6 +540,12 @@ static const struct address_space_operations aio_ctx_aops = { #endif }; +/* Polled IO or SQ/CQ rings don't use the old ring */ +static bool aio_ctx_old_ring(struct kioctx *ctx) +{ + return !(ctx->flags & (IOCTX_FLAG_IOPOLL | IOCTX_FLAG_SCQRING)); +} + static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) { struct aio_ring *ring; @@ -529,7 +560,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) * IO polling doesn't require any io event entries */ size = sizeof(struct aio_ring); - if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) { + if (aio_ctx_old_ring(ctx)) { nr_events += 2; /* 1 is required, 2 for good luck */ size += sizeof(struct io_event) * nr_events; } @@ -621,7 +652,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) */ static bool aio_ctx_supports_cancel(struct kioctx *ctx) { - return (ctx->flags & IOCTX_FLAG_IOPOLL) == 0; + return (ctx->flags & (IOCTX_FLAG_IOPOLL | IOCTX_FLAG_SCQRING)) == 0; } #define AIO_EVENTS_PER_PAGE (PAGE_SIZE / sizeof(struct io_event)) @@ -657,6 +688,7 @@ static void free_ioctx(struct work_struct *work) free_rwork); pr_debug("freeing %p\n", ctx); + aio_scqring_unmap(ctx); aio_free_ring(ctx); free_percpu(ctx->cpu); percpu_ref_exit(&ctx->reqs); @@ -1202,6 +1234,39 @@ static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb, ev->res2 = res2; } +static void aio_commit_cqring(struct kioctx *ctx, unsigned next_tail) +{ + struct aio_cq_ring *ring = page_address(ctx->cq_ring.pages[0]); + + if (next_tail != ring->tail) { + ring->tail = next_tail; + smp_wmb(); + } +} + +static struct io_event *aio_peek_cqring(struct kioctx *ctx, unsigned *ntail) +{ + struct aio_cq_ring *ring; + struct io_event *ev; + unsigned tail; + + ring = page_address(ctx->cq_ring.pages[0]); + + smp_rmb(); + tail = READ_ONCE(ring->tail); + *ntail = tail + 1; + if (*ntail == ring->nr_events) + *ntail = 0; + if (*ntail == READ_ONCE(ring->head)) + return NULL; + + /* io_event array starts offset one into the mapped range */ + tail++; + ev = page_address(ctx->cq_ring.pages[tail >> event_page_shift]); + tail &= ((1 << event_page_shift) - 1); + return ev + tail; +} + static void aio_ring_complete(struct kioctx *ctx, struct aio_kiocb *iocb, long res, long res2) { @@ -1263,7 +1328,36 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2) { struct kioctx *ctx = iocb->ki_ctx; - aio_ring_complete(ctx, iocb, res, res2); + if (ctx->flags & IOCTX_FLAG_SCQRING) { + unsigned long flags; + struct io_event *ev; + unsigned int tail; + + /* + * If we can't get a cq entry, userspace overflowed the + * submission (by quite a lot). Flag it as an overflow + * condition, and next io_ring_enter(2) call will return + * -EOVERFLOW. + */ + spin_lock_irqsave(&ctx->completion_lock, flags); + ev = aio_peek_cqring(ctx, &tail); + if (ev) { + aio_fill_event(ev, iocb, res, res2); + aio_commit_cqring(ctx, tail); + } else + ctx->cq_ring_overflow = 1; + spin_unlock_irqrestore(&ctx->completion_lock, flags); + } else { + aio_ring_complete(ctx, iocb, res, res2); + + /* + * We have to order our ring_info tail store above and test + * of the wait list below outside the wait lock. This is + * like in wake_up_bit() where clearing a bit has to be + * ordered with the unlocked test. 
+ */ + smp_mb(); + } /* * Check if the user asked us to deliver the result through an @@ -1275,14 +1369,6 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2) eventfd_ctx_put(iocb->ki_eventfd); } - /* - * We have to order our ring_info tail store above and test - * of the wait list below outside the wait lock. This is - * like in wake_up_bit() where clearing a bit has to be - * ordered with the unlocked test. - */ - smp_mb(); - if (waitqueue_active(&ctx->wait)) wake_up(&ctx->wait); iocb_put(iocb); @@ -1405,6 +1491,9 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, return 0; list_for_each_entry_safe(iocb, n, &ctx->poll_completing, ki_list) { + struct io_event *ev = NULL; + unsigned int next_tail; + if (*nr_events == max) break; if (!test_bit(KIOCB_F_POLL_COMPLETED, &iocb->ki_flags)) @@ -1412,6 +1501,14 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, if (to_free == AIO_IOPOLL_BATCH) iocb_put_many(ctx, iocbs, &to_free); + /* Will only happen if the application over-commits */ + ret = -EAGAIN; + if (ctx->flags & IOCTX_FLAG_SCQRING) { + ev = aio_peek_cqring(ctx, &next_tail); + if (!ev) + break; + } + list_del(&iocb->ki_list); iocbs[to_free++] = iocb; @@ -1430,8 +1527,11 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, file_count = 1; } - if (evs && copy_to_user(evs + *nr_events, &iocb->ki_ev, - sizeof(iocb->ki_ev))) { + if (ev) { + memcpy(ev, &iocb->ki_ev, sizeof(*ev)); + aio_commit_cqring(ctx, next_tail); + } else if (evs && copy_to_user(evs + *nr_events, &iocb->ki_ev, + sizeof(iocb->ki_ev))) { ret = -EFAULT; break; } @@ -1612,24 +1712,139 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr, return ret; } +static void aio_unmap_range(struct aio_mapped_range *range) +{ + int i; + + if (!range->nr_pages) + return; + + for (i = 0; i < range->nr_pages; i++) + put_page(range->pages[i]); + + kfree(range->pages); + range->pages = NULL; + range->nr_pages = 0; +} + +static int aio_map_range(struct aio_mapped_range *range, void __user *uaddr, + size_t size, int gup_flags) +{ + int nr_pages, ret; + + if ((unsigned long) uaddr & ~PAGE_MASK) + return -EINVAL; + + nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; + + range->pages = kzalloc(nr_pages * sizeof(struct page *), GFP_KERNEL); + if (!range->pages) + return -ENOMEM; + + down_write(¤t->mm->mmap_sem); + ret = get_user_pages((unsigned long) uaddr, nr_pages, gup_flags, + range->pages, NULL); + up_write(¤t->mm->mmap_sem); + + if (ret < nr_pages) { + kfree(range->pages); + return -ENOMEM; + } + + range->nr_pages = nr_pages; + return 0; +} + +static void aio_scqring_unmap(struct kioctx *ctx) +{ + aio_unmap_range(&ctx->sq_ring.ring_range); + aio_unmap_range(&ctx->sq_ring.iocb_range); + aio_unmap_range(&ctx->cq_ring); +} + +static int aio_scqring_map(struct kioctx *ctx, + struct aio_sq_ring __user *sq_ring, + struct aio_cq_ring __user *cq_ring) +{ + int ret, sq_ring_size, cq_ring_size; + struct aio_cq_ring *kcq_ring; + void __user *uptr; + size_t size; + + /* Two is the minimum size we can support. */ + if (ctx->max_reqs < 2) + return -EINVAL; + + /* + * The CQ ring size is QD + 1, so we don't have to track full condition + * for head == tail. The SQ ring we make twice that in size, to make + * room for having more inflight than the QD. 
+ */ + sq_ring_size = ctx->max_reqs; + cq_ring_size = 2 * ctx->max_reqs; + + /* Map SQ ring and iocbs */ + size = sizeof(struct aio_sq_ring) + sq_ring_size * sizeof(u32); + ret = aio_map_range(&ctx->sq_ring.ring_range, sq_ring, size, FOLL_WRITE); + if (ret) + return ret; + + ctx->sq_ring.ring = page_address(ctx->sq_ring.ring_range.pages[0]); + if (ctx->sq_ring.ring->nr_events < sq_ring_size) { + ret = -EFAULT; + goto err; + } + ctx->sq_ring.ring->nr_events = sq_ring_size; + ctx->sq_ring.ring->head = ctx->sq_ring.ring->tail = 0; + + size = sizeof(struct iocb) * sq_ring_size; + uptr = (void __user *) (unsigned long) ctx->sq_ring.ring->iocbs; + ret = aio_map_range(&ctx->sq_ring.iocb_range, uptr, size, 0); + if (ret) + goto err; + + /* Map CQ ring and io_events */ + size = sizeof(struct aio_cq_ring) + + cq_ring_size * sizeof(struct io_event); + ret = aio_map_range(&ctx->cq_ring, cq_ring, size, FOLL_WRITE); + if (ret) + goto err; + + kcq_ring = page_address(ctx->cq_ring.pages[0]); + if (kcq_ring->nr_events < cq_ring_size) { + ret = -EFAULT; + goto err; + } + kcq_ring->nr_events = cq_ring_size; + kcq_ring->head = kcq_ring->tail = 0; + +err: + if (ret) { + aio_unmap_range(&ctx->sq_ring.ring_range); + aio_unmap_range(&ctx->sq_ring.iocb_range); + aio_unmap_range(&ctx->cq_ring); + } + return ret; +} + /* sys_io_setup2: * Like sys_io_setup(), except that it takes a set of flags * (IOCTX_FLAG_*), and some pointers to user structures: * - * *user1 - reserved for future use + * *sq_ring - pointer to the userspace SQ ring, if used. * - * *user2 - reserved for future use. + * *cq_ring - pointer to the userspace CQ ring, if used. */ -SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags, void __user *, user1, - void __user *, user2, aio_context_t __user *, ctxp) +SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags, + struct aio_sq_ring __user *, sq_ring, + struct aio_cq_ring __user *, cq_ring, + aio_context_t __user *, ctxp) { struct kioctx *ioctx; unsigned long ctx; long ret; - if (user1 || user2) - return -EINVAL; - if (flags & ~IOCTX_FLAG_IOPOLL) + if (flags & ~(IOCTX_FLAG_IOPOLL | IOCTX_FLAG_SCQRING)) return -EINVAL; ret = get_user(ctx, ctxp); @@ -1641,9 +1856,17 @@ SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags, void __user *, user1, if (IS_ERR(ioctx)) goto out; + if (flags & IOCTX_FLAG_SCQRING) { + ret = aio_scqring_map(ioctx, sq_ring, cq_ring); + if (ret) + goto err; + } + ret = put_user(ioctx->user_id, ctxp); - if (ret) + if (ret) { +err: kill_ioctx(current->mm, ioctx, NULL); + } percpu_ref_put(&ioctx->users); out: return ret; @@ -2325,8 +2548,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, return -EINVAL; } - /* Poll IO doesn't need ring reservations */ - if (!(ctx->flags & IOCTX_FLAG_IOPOLL) && !get_reqs_available(ctx)) + if (aio_ctx_old_ring(ctx) && !get_reqs_available(ctx)) return -EAGAIN; ret = -EAGAIN; @@ -2418,7 +2640,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, eventfd_ctx_put(req->ki_eventfd); iocb_put(req); out_put_reqs_available: - if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) + if (aio_ctx_old_ring(ctx)) put_reqs_available(ctx, 1); return ret; } @@ -2479,6 +2701,212 @@ static void aio_submit_state_start(struct aio_submit_state *state, #endif } +static const struct iocb *aio_iocb_from_index(struct kioctx *ctx, unsigned idx) +{ + struct aio_mapped_range *range = &ctx->sq_ring.iocb_range; + const struct iocb *iocb; + + iocb = page_address(range->pages[idx >> iocb_page_shift]); + idx &= ((1 << iocb_page_shift) - 1); + return 
iocb + idx; +} + +static void aio_commit_sqring(struct kioctx *ctx, unsigned next_head) +{ + struct aio_sq_ring *ring = ctx->sq_ring.ring; + + if (ring->head != next_head) { + ring->head = next_head; + smp_wmb(); + } +} + +static const struct iocb *aio_peek_sqring(struct kioctx *ctx, + unsigned *iocb_index, unsigned *nhead) +{ + struct aio_mapped_range *range = &ctx->sq_ring.ring_range; + struct aio_sq_ring *ring = ctx->sq_ring.ring; + unsigned head; + u32 *array; + + smp_rmb(); + head = READ_ONCE(ring->head); + if (head == READ_ONCE(ring->tail)) + return NULL; + + *nhead = head + 1; + if (*nhead == ring->nr_events) + *nhead = 0; + + /* + * No guarantee the array is in the first page, so we can't just + * index ring->array. Find the map and offset from the head. + */ + head += offsetof(struct aio_sq_ring, array) >> 2; + array = page_address(range->pages[head >> array_page_shift]); + head &= ((1 << array_page_shift) - 1); + *iocb_index = array[head]; + + if (*iocb_index < ring->nr_events) + return aio_iocb_from_index(ctx, *iocb_index); + + /* drop invalid entries */ + aio_commit_sqring(ctx, *nhead); + return NULL; +} + +static int aio_ring_submit(struct kioctx *ctx, unsigned int to_submit) +{ + struct aio_submit_state state, *statep = NULL; + int i, ret = 0, submit = 0; + + if (to_submit > AIO_PLUG_THRESHOLD) { + aio_submit_state_start(&state, ctx, to_submit); + statep = &state; + } + + for (i = 0; i < to_submit; i++) { + unsigned next_head, iocb_index; + const struct iocb *iocb; + + iocb = aio_peek_sqring(ctx, &iocb_index, &next_head); + if (!iocb) + break; + + ret = __io_submit_one(ctx, iocb, iocb_index, statep, false); + if (ret) + break; + + submit++; + aio_commit_sqring(ctx, next_head); + } + + if (statep) + aio_submit_state_end(statep); + + return submit ? submit : ret; +} + +/* + * Wait until events become available, if we don't already have some. The + * application must reap them itself, as they reside on the shared cq ring. + */ +static int aio_cqring_wait(struct kioctx *ctx, int min_events) +{ + struct aio_cq_ring *ring = page_address(ctx->cq_ring.pages[0]); + DEFINE_WAIT(wait); + int ret = 0; + + smp_rmb(); + if (ring->head != ring->tail) + return 0; + if (!min_events) + return 0; + + do { + prepare_to_wait(&ctx->wait, &wait, TASK_INTERRUPTIBLE); + + ret = 0; + smp_rmb(); + if (ring->head != ring->tail) + break; + + schedule(); + + ret = -EINVAL; + if (atomic_read(&ctx->dead)) + break; + ret = -EINTR; + if (signal_pending(current)) + break; + } while (1); + + finish_wait(&ctx->wait, &wait); + return ret; +} + +static int __io_ring_enter(struct kioctx *ctx, unsigned int to_submit, + unsigned int min_complete, unsigned int flags) +{ + int ret = 0; + + if (flags & IORING_FLAG_SUBMIT) { + ret = aio_ring_submit(ctx, to_submit); + if (ret < 0) + return ret; + } + if (flags & IORING_FLAG_GETEVENTS) { + unsigned int nr_events = 0; + int get_ret; + + if (!ret && to_submit) + min_complete = 0; + + if (ctx->flags & IOCTX_FLAG_IOPOLL) + get_ret = __aio_iopoll_check(ctx, NULL, &nr_events, + min_complete, -1U); + else + get_ret = aio_cqring_wait(ctx, min_complete); + + if (get_ret < 0 && !ret) + ret = get_ret; + } + + return ret; +} + +/* sys_io_ring_enter: + * Alternative way to both submit and complete IO, instead of using + * io_submit(2) and io_getevents(2). Requires the use of the SQ/CQ + * ring interface, hence the io_context must be setup with + * io_setup2() and IOCTX_FLAG_SCQRING must be specified (and the + * sq_ring/cq_ring passed in). 
+ * + * Returns the number of IOs submitted, if IORING_FLAG_SUBMIT + * is used, otherwise returns 0 for IORING_FLAG_GETEVENTS success, + * but not the number of events, as those will have to be found + * by the application by reading the CQ ring anyway. + * + * Apart from that, the error returns are much like io_submit() + * and io_getevents(), since a lot of the same error conditions + * are shared. + */ +SYSCALL_DEFINE4(io_ring_enter, aio_context_t, ctx_id, u32, to_submit, + u32, min_complete, u32, flags) +{ + struct kioctx *ctx; + long ret; + + ctx = lookup_ioctx(ctx_id); + if (!ctx) { + pr_debug("EINVAL: invalid context id\n"); + return -EINVAL; + } + + ret = -EBUSY; + if (!mutex_trylock(&ctx->getevents_lock)) + goto err; + + ret = -EOVERFLOW; + if (ctx->cq_ring_overflow) { + ctx->cq_ring_overflow = 0; + goto err_unlock; + } + + ret = -EINVAL; + if (unlikely(atomic_read(&ctx->dead))) + goto err_unlock; + + if (ctx->flags & IOCTX_FLAG_SCQRING) + ret = __io_ring_enter(ctx, to_submit, min_complete, flags); + +err_unlock: + mutex_unlock(&ctx->getevents_lock); +err: + percpu_ref_put(&ctx->users); + return ret; +} + /* sys_io_submit: * Queue the nr iocbs pointed to by iocbpp for processing. Returns * the number of iocbs queued. May return -EINVAL if the aio_context @@ -2508,6 +2936,10 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, return -EINVAL; } + /* SCQRING must use io_ring_enter() */ + if (ctx->flags & IOCTX_FLAG_SCQRING) + return -EINVAL; + if (nr > ctx->nr_events) nr = ctx->nr_events; @@ -2659,7 +3091,10 @@ static long do_io_getevents(aio_context_t ctx_id, long ret = -EINVAL; if (likely(ioctx)) { - if (likely(min_nr <= nr && min_nr >= 0)) { + /* SCQRING must use io_ring_enter() */ + if (ioctx->flags & IOCTX_FLAG_SCQRING) + ret = -EINVAL; + else if (min_nr <= nr && min_nr >= 0) { if (ioctx->flags & IOCTX_FLAG_IOPOLL) ret = aio_iopoll_check(ioctx, min_nr, nr, events); else diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 67b7f03aa9fc..ebcc73d8a6ad 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -287,8 +287,10 @@ static inline void addr_limit_user_check(void) */ #ifndef CONFIG_ARCH_HAS_SYSCALL_WRAPPER asmlinkage long sys_io_setup(unsigned nr_reqs, aio_context_t __user *ctx); -asmlinkage long sys_io_setup2(unsigned, unsigned, void __user *, void __user *, +asmlinkage long sys_io_setup2(unsigned, unsigned, struct aio_sq_ring __user *, + struct aio_cq_ring __user *, aio_context_t __user *); +asmlinkage long sys_io_ring_enter(aio_context_t, unsigned, unsigned, unsigned); asmlinkage long sys_io_destroy(aio_context_t ctx); asmlinkage long sys_io_submit(aio_context_t, long, struct iocb __user * __user *); diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h index a6829bae9ada..5d3ada40ce15 100644 --- a/include/uapi/linux/aio_abi.h +++ b/include/uapi/linux/aio_abi.h @@ -109,6 +109,35 @@ struct iocb { }; /* 64 bytes */ #define IOCTX_FLAG_IOPOLL (1 << 0) /* io_context is polled */ +#define IOCTX_FLAG_SCQRING (1 << 1) /* Use SQ/CQ rings */ + +struct aio_sq_ring { + union { + struct { + u32 head; /* kernel consumer head */ + u32 tail; /* app producer tail */ + u32 nr_events; /* max events in ring */ + u64 iocbs; /* setup pointer to app iocbs */ + }; + u32 pad[16]; + }; + u32 array[0]; /* actual ring, index to iocbs */ +}; + +struct aio_cq_ring { + union { + struct { + u32 head; /* app consumer head */ + u32 tail; /* kernel producer tail */ + u32 nr_events; /* max events in ring */ + }; + struct io_event pad; + }; + 
struct io_event events[0]; /* ring, array of io_events */ }; + +#define IORING_FLAG_SUBMIT (1 << 0) +#define IORING_FLAG_GETEVENTS (1 << 1) #undef IFBIG #undef IFLITTLE

diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 17c8b4393669..a32b7ea93838 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c
@@ -38,6 +38,7 @@ asmlinkage long sys_ni_syscall(void) COND_SYSCALL(io_setup); COND_SYSCALL(io_setup2); +COND_SYSCALL(io_ring_enter); COND_SYSCALL_COMPAT(io_setup); COND_SYSCALL(io_destroy); COND_SYSCALL(io_submit);
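To make the ring protocol concrete, here is a rough userspace sketch of creating and driving an SCQRING context. It is illustrative only, not part of the series: the syscall numbers 335/336 come from the x86-64 table above, the structs mirror the RFC's aio_abi.h layout with UAPI-safe types, and memory barriers plus error handling are omitted for brevity.

    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/aio_abi.h>
    #include <linux/types.h>

    #define IOCTX_FLAG_SCQRING     (1 << 1)
    #define IORING_FLAG_SUBMIT     (1 << 0)
    #define IORING_FLAG_GETEVENTS  (1 << 1)

    struct sq_ring {                /* mirrors struct aio_sq_ring */
        union {
            struct { __u32 head, tail, nr_events; __u64 iocbs; };
            __u32 pad[16];
        };
        __u32 array[];              /* indexes into the iocb array */
    };

    struct cq_ring {                /* mirrors struct aio_cq_ring */
        union {
            struct { __u32 head, tail, nr_events; };
            struct io_event pad;
        };
        struct io_event events[];
    };

    int main(void)
    {
        unsigned int depth = 32;
        aio_context_t ctx = 0;
        /* all three regions must be page aligned for aio_map_range() */
        struct sq_ring *sq = aligned_alloc(4096, 4096);
        struct cq_ring *cq = aligned_alloc(4096, 4096);
        struct iocb *iocbs = aligned_alloc(4096, 4096);

        memset(sq, 0, 4096);
        memset(cq, 0, 4096);
        memset(iocbs, 0, 4096);
        sq->nr_events = depth;      /* validated by the kernel */
        cq->nr_events = 2 * depth;  /* CQ ring is twice the depth */
        sq->iocbs = (unsigned long) iocbs;

        syscall(335, depth, IOCTX_FLAG_SCQRING, sq, cq, &ctx);

        /* submit slot 0: fill iocbs[0], publish its index, then bump
         * the tail (a write barrier belongs between the two stores) */
        sq->array[sq->tail] = 0;
        sq->tail = (sq->tail + 1) % sq->nr_events;

        syscall(336, ctx, 1, 1,
                IORING_FLAG_SUBMIT | IORING_FLAG_GETEVENTS);

        /* reap: consume events between cq->head and cq->tail, then
         * advance head, wrapping at cq->nr_events */
        return 0;
    }

Note that the SQ ring carries indexes rather than iocbs themselves, so an application can manage its iocb slots independently of submission order.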
From patchwork Fri Dec 21 19:22:31 2018
X-Patchwork-Id: 10740951
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org
Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe
Subject: [PATCH 17/22] block: implement bio helper to add iter bvec pages to bio
Date: Fri, 21 Dec 2018 12:22:31 -0700
Message-Id: <20181221192236.12866-18-axboe@kernel.dk>
In-Reply-To: <20181221192236.12866-1-axboe@kernel.dk>
References: <20181221192236.12866-1-axboe@kernel.dk>

For an ITER_BVEC, we can just iterate the iov and add the pages to the bio directly. This requires that the caller doesn't release the pages on IO completion; we add a BIO_HOLD_PAGES flag for that. The current two callers of bio_iov_iter_get_pages() are updated to check if they need to release pages on completion. This makes them work with bvecs that contain kernel mapped pages already.

Signed-off-by: Jens Axboe
---
block/bio.c | 59 ++++++++++++++++++++++++++++++++------- fs/block_dev.c | 5 ++-- fs/iomap.c | 5 ++-- include/linux/blk_types.h | 1 + 4 files changed, 56 insertions(+), 14 deletions(-)

diff --git a/block/bio.c b/block/bio.c index 8281bfcbc265..cc1ddf173aaf 100644 --- a/block/bio.c +++ b/block/bio.c
@@ -828,6 +828,23 @@ int bio_add_page(struct bio *bio, struct page *page, } EXPORT_SYMBOL(bio_add_page); +static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter) +{ + const struct bio_vec *bv = iter->bvec; + unsigned int len; + size_t size; + + len = min_t(size_t, bv->bv_len, iter->count); + size = bio_add_page(bio, bv->bv_page, len, + bv->bv_offset + iter->iov_offset); + if (size == len) { + iov_iter_advance(iter, size); + return 0; + } + + return -EINVAL; +} + #define PAGE_PTRS_PER_BVEC (sizeof(struct bio_vec) / sizeof(struct page *)) /**
@@ -876,23 +893,43 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) } /** - * bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio + * bio_iov_iter_get_pages - add user or kernel pages to a bio * @bio: bio to add pages to - * @iter: iov iterator describing the region to be mapped + * @iter: iov iterator describing the region to be added + * + * This takes either an iterator pointing to user memory, or one pointing to + * kernel pages (BVEC iterator). If we're adding user pages, we pin them and + * map them into the kernel. On IO completion, the caller should put those + * pages. If we're adding kernel pages, we just have to add the pages to the + * bio directly. We don't grab an extra reference to those pages (the user + * should already have that), and we don't put the page on IO completion. + * The caller needs to check if the bio is flagged BIO_HOLD_PAGES on IO + * completion. If it isn't, then pages should be released. * - * Pins pages from *iter and appends them to @bio's bvec array. The - * pages will have to be released using put_page() when done. * The function tries, but does not guarantee, to pin as many pages as - fit into the bio, or are requested in *iter, whatever is smaller. - If MM encounters an error pinning the requested pages, it stops.
- Error is returned only if 0 pages could be pinned. + fit into the bio, or are requested in *iter, whatever is smaller. If + MM encounters an error pinning the requested pages, it stops. Error + is returned only if 0 pages could be pinned. */ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) { + const bool is_bvec = iov_iter_is_bvec(iter); unsigned short orig_vcnt = bio->bi_vcnt; + /* + * If this is a BVEC iter, then the pages are kernel pages. Don't + * release them on IO completion. + */ + if (is_bvec) + bio_set_flag(bio, BIO_HOLD_PAGES); + do { - int ret = __bio_iov_iter_get_pages(bio, iter); + int ret; + + if (is_bvec) + ret = __bio_iov_bvec_add_pages(bio, iter); + else + ret = __bio_iov_iter_get_pages(bio, iter); if (unlikely(ret)) return bio->bi_vcnt > orig_vcnt ? 0 : ret;
@@ -1634,7 +1671,8 @@ static void bio_dirty_fn(struct work_struct *work) next = bio->bi_private; bio_set_pages_dirty(bio); - bio_release_pages(bio); + if (!bio_flagged(bio, BIO_HOLD_PAGES)) + bio_release_pages(bio); bio_put(bio); } }
@@ -1650,7 +1688,8 @@ void bio_check_pages_dirty(struct bio *bio) goto defer; } - bio_release_pages(bio); + if (!bio_flagged(bio, BIO_HOLD_PAGES)) + bio_release_pages(bio); bio_put(bio); return; defer:

diff --git a/fs/block_dev.c b/fs/block_dev.c index 9d96c1e30854..1a3981793309 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c
@@ -325,8 +325,9 @@ static void blkdev_bio_end_io(struct bio *bio) struct bio_vec *bvec; int i; - bio_for_each_segment_all(bvec, bio, i) - put_page(bvec->bv_page); + if (!bio_flagged(bio, BIO_HOLD_PAGES)) + bio_for_each_segment_all(bvec, bio, i) + put_page(bvec->bv_page); bio_put(bio); } }

diff --git a/fs/iomap.c b/fs/iomap.c index 46f4cb687f6f..f5a7fc708004 100644 --- a/fs/iomap.c +++ b/fs/iomap.c
@@ -1576,8 +1576,9 @@ static void iomap_dio_bio_end_io(struct bio *bio) struct bio_vec *bvec; int i; - bio_for_each_segment_all(bvec, bio, i) - put_page(bvec->bv_page); + if (!bio_flagged(bio, BIO_HOLD_PAGES)) + bio_for_each_segment_all(bvec, bio, i) + put_page(bvec->bv_page); bio_put(bio); } }

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index fc99474ac968..0a19de825f4f 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h
@@ -215,6 +215,7 @@ struct bio { /* * bio flags */ +#define BIO_HOLD_PAGES 0 /* don't put O_DIRECT pages */ #define BIO_SEG_VALID 1 /* bi_phys_segments valid */ #define BIO_CLONED 2 /* doesn't own data */ #define BIO_BOUNCED 3 /* bio is a bounce bio */
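As a rough sketch of how a kernel caller drives the new path (illustrative only; bvec, nr_bvecs, len and the surrounding bio setup are assumed, not from the patch):

    /* Add pre-pinned kernel pages to a bio through an ITER_BVEC
     * iterator; bio_iov_iter_get_pages() sets BIO_HOLD_PAGES so the
     * completion paths above skip put_page() on these pages. */
    struct iov_iter iter;
    int ret;

    iov_iter_bvec(&iter, READ, bvec, nr_bvecs, len);
    ret = bio_iov_iter_get_pages(bio, &iter);
    if (ret)
        return ret;

The owner of the pages keeps its references across the IO, which is exactly the situation the pre-mapped buffers in the next patch create.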
From patchwork Fri Dec 21 19:22:32 2018
X-Patchwork-Id: 10740963
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org
Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe
Subject: [PATCH 18/22] aio: add support for pre-mapped user IO buffers
Date: Fri, 21 Dec 2018 12:22:32 -0700
Message-Id: <20181221192236.12866-19-axboe@kernel.dk>
In-Reply-To: <20181221192236.12866-1-axboe@kernel.dk>
References: <20181221192236.12866-1-axboe@kernel.dk>

If we have fixed user buffers, we can map them into the kernel when we set up the io_context. That avoids the need to do get_user_pages() for each and every IO. To utilize this feature, the application must use the SCQRING interface, and additionally set IOCTX_FLAG_FIXEDBUFS when creating the IO context. The latter tells aio that the iocbs in the SQ ring already contain valid destinations and sizes. These buffers can then be mapped into the kernel for the lifetime of the io_context, as opposed to just the duration of each single IO. It's perfectly valid to set up a larger buffer, and then sometimes only use parts of it for an IO.
As long as the range is within the originally mapped region, it will work just fine. Only works with non-vectored read/write commands for now, not with PREADV/PWRITEV. A limit of 4M is imposed as the largest buffer we currently support. There's nothing preventing us from going larger, but we need some cap, and 4M seemed like it would definitely be big enough. RLIMIT_MEMLOCK is used to cap the total amount of memory pinned. Signed-off-by: Jens Axboe --- fs/aio.c | 219 ++++++++++++++++++++++++++++++++--- include/uapi/linux/aio_abi.h | 1 + 2 files changed, 202 insertions(+), 18 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index a49109e69334..c424aa2ed336 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -42,6 +42,8 @@ #include #include #include +#include +#include #include #include @@ -107,6 +109,13 @@ struct aio_iocb_ring { struct aio_mapped_range iocb_range; /* maps user iocbs */ }; +struct aio_mapped_ubuf { + u64 ubuf; + size_t len; + struct bio_vec *bvec; + unsigned int nr_bvecs; +}; + struct kioctx { struct percpu_ref users; atomic_t dead; @@ -142,6 +151,9 @@ struct kioctx { struct page **ring_pages; long nr_pages; + /* if used, fixed mapped user buffers */ + struct aio_mapped_ubuf *user_bufs; + /* if used, completion and submission rings */ struct aio_iocb_ring sq_ring; struct aio_mapped_range cq_ring; @@ -309,8 +321,10 @@ static const unsigned int iocb_page_shift = static const unsigned int event_page_shift = ilog2(PAGE_SIZE / sizeof(struct io_event)); +static void aio_iocb_buffer_unmap(struct kioctx *); static void aio_scqring_unmap(struct kioctx *); static void aio_iopoll_reap_events(struct kioctx *); +static const struct iocb *aio_iocb_from_index(struct kioctx *ctx, unsigned idx); static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages) { @@ -689,6 +703,7 @@ static void free_ioctx(struct work_struct *work) pr_debug("freeing %p\n", ctx); aio_scqring_unmap(ctx); + aio_iocb_buffer_unmap(ctx); aio_free_ring(ctx); free_percpu(ctx->cpu); percpu_ref_exit(&ctx->reqs); @@ -1827,6 +1842,124 @@ static int aio_scqring_map(struct kioctx *ctx, return ret; } +static void aio_iocb_buffer_unmap(struct kioctx *ctx) +{ + int i, j; + + if (!ctx->user_bufs) + return; + + for (i = 0; i < ctx->max_reqs; i++) { + struct aio_mapped_ubuf *amu = &ctx->user_bufs[i]; + + for (j = 0; j < amu->nr_bvecs; j++) + put_page(amu->bvec[j].bv_page); + + kfree(amu->bvec); + amu->nr_bvecs = 0; + } + + kfree(ctx->user_bufs); + ctx->user_bufs = NULL; +} + +static int aio_iocb_buffer_map(struct kioctx *ctx) +{ + unsigned long total_pages, page_limit; + struct page **pages = NULL; + int i, j, got_pages = 0; + const struct iocb *iocb; + int ret = -EINVAL; + + ctx->user_bufs = kzalloc(ctx->max_reqs * sizeof(struct aio_mapped_ubuf), + GFP_KERNEL); + if (!ctx->user_bufs) + return -ENOMEM; + + /* Don't allow more pages than we can safely lock */ + total_pages = 0; + page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; + + for (i = 0; i < ctx->max_reqs; i++) { + struct aio_mapped_ubuf *amu = &ctx->user_bufs[i]; + unsigned long off, start, end, ubuf; + int pret, nr_pages; + size_t size; + + iocb = aio_iocb_from_index(ctx, i); + + /* + * Don't impose further limits on the size and buffer + * constraints here, we'll -EINVAL later when IO is + * submitted if they are wrong. 
+ */ + ret = -EFAULT; + if (!iocb->aio_buf) + goto err; + + /* arbitrary limit, but we need something */ + if (iocb->aio_nbytes > SZ_4M) + goto err; + + ubuf = iocb->aio_buf; + end = (ubuf + iocb->aio_nbytes + PAGE_SIZE - 1) >> PAGE_SHIFT; + start = ubuf >> PAGE_SHIFT; + nr_pages = end - start; + + ret = -ENOMEM; + if (total_pages + nr_pages > page_limit) + goto err; + + if (!pages || nr_pages > got_pages) { + kfree(pages); + pages = kmalloc(nr_pages * sizeof(struct page *), + GFP_KERNEL); + if (!pages) + goto err; + got_pages = nr_pages; + } + + amu->bvec = kmalloc(nr_pages * sizeof(struct bio_vec), + GFP_KERNEL); + if (!amu->bvec) + goto err; + + down_write(¤t->mm->mmap_sem); + pret = get_user_pages(ubuf, nr_pages, 1, pages, NULL); + up_write(¤t->mm->mmap_sem); + + if (pret < nr_pages) { + if (pret < 0) + ret = pret; + goto err; + } + + off = ubuf & ~PAGE_MASK; + size = iocb->aio_nbytes; + for (j = 0; j < nr_pages; j++) { + size_t vec_len; + + vec_len = min_t(size_t, size, PAGE_SIZE - off); + amu->bvec[j].bv_page = pages[j]; + amu->bvec[j].bv_len = vec_len; + amu->bvec[j].bv_offset = off; + off = 0; + size -= vec_len; + } + /* store original address for later verification */ + amu->ubuf = ubuf; + amu->len = iocb->aio_nbytes; + amu->nr_bvecs = nr_pages; + total_pages += nr_pages; + } + kfree(pages); + return 0; +err: + kfree(pages); + aio_iocb_buffer_unmap(ctx); + return ret; +} + /* sys_io_setup2: * Like sys_io_setup(), except that it takes a set of flags * (IOCTX_FLAG_*), and some pointers to user structures: @@ -1844,7 +1977,8 @@ SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags, unsigned long ctx; long ret; - if (flags & ~(IOCTX_FLAG_IOPOLL | IOCTX_FLAG_SCQRING)) + if (flags & ~(IOCTX_FLAG_IOPOLL | IOCTX_FLAG_SCQRING | + IOCTX_FLAG_FIXEDBUFS)) return -EINVAL; ret = get_user(ctx, ctxp); @@ -1860,6 +1994,15 @@ SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags, ret = aio_scqring_map(ioctx, sq_ring, cq_ring); if (ret) goto err; + if (flags & IOCTX_FLAG_FIXEDBUFS) { + ret = aio_iocb_buffer_map(ioctx); + if (ret) + goto err; + } + } else if (flags & IOCTX_FLAG_FIXEDBUFS) { + /* can only support fixed bufs with SQ/CQ ring */ + ret = -EINVAL; + goto err; } ret = put_user(ioctx->user_id, ctxp); @@ -2135,23 +2278,58 @@ static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb, return ret; } -static int aio_setup_rw(int rw, const struct iocb *iocb, struct iovec **iovec, - bool vectored, bool compat, struct iov_iter *iter) +static int aio_setup_rw(int rw, struct aio_kiocb *kiocb, + const struct iocb *iocb, struct iovec **iovec, bool vectored, + bool compat, bool kaddr, struct iov_iter *iter) { - void __user *buf = (void __user *)(uintptr_t)iocb->aio_buf; + void __user *ubuf = (void __user *)(uintptr_t)iocb->aio_buf; size_t len = iocb->aio_nbytes; if (!vectored) { - ssize_t ret = import_single_range(rw, buf, len, *iovec, iter); + ssize_t ret; + + if (!kaddr) { + ret = import_single_range(rw, ubuf, len, *iovec, iter); + } else { + struct kioctx *ctx = kiocb->ki_ctx; + struct aio_mapped_ubuf *amu; + size_t offset; + int index; + + /* __io_submit_one() already validated the index */ + index = array_index_nospec(kiocb->ki_index, + ctx->max_reqs); + amu = &ctx->user_bufs[index]; + if (iocb->aio_buf < amu->ubuf || + iocb->aio_buf + len > amu->ubuf + amu->len) { + ret = -EFAULT; + goto err; + } + + /* + * May not be a start of buffer, set size appropriately + * and advance us to the beginning. 
+ */ + offset = iocb->aio_buf - amu->ubuf; + iov_iter_bvec(iter, rw, amu->bvec, amu->nr_bvecs, + offset + len); + if (offset) + iov_iter_advance(iter, offset); + ret = 0; + + } +err: *iovec = NULL; return ret; } + if (kaddr) + return -EINVAL; #ifdef CONFIG_COMPAT if (compat) - return compat_import_iovec(rw, buf, len, UIO_FASTIOV, iovec, + return compat_import_iovec(rw, ubuf, len, UIO_FASTIOV, iovec, iter); #endif - return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter); + return import_iovec(rw, ubuf, len, UIO_FASTIOV, iovec, iter); } static inline void aio_rw_done(struct kiocb *req, ssize_t ret) @@ -2233,7 +2411,7 @@ static void aio_iopoll_iocb_issued(struct aio_submit_state *state, static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, struct aio_submit_state *state, bool vectored, - bool compat) + bool compat, bool kaddr) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct kiocb *req = &kiocb->rw; @@ -2253,9 +2431,11 @@ static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, if (unlikely(!file->f_op->read_iter)) goto out_fput; - ret = aio_setup_rw(READ, iocb, &iovec, vectored, compat, &iter); + ret = aio_setup_rw(READ, kiocb, iocb, &iovec, vectored, compat, kaddr, + &iter); if (ret) goto out_fput; + ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter)); if (!ret) aio_rw_done(req, call_read_iter(file, req, &iter)); @@ -2268,7 +2448,7 @@ static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb, struct aio_submit_state *state, bool vectored, - bool compat) + bool compat, bool kaddr) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct kiocb *req = &kiocb->rw; @@ -2288,7 +2468,8 @@ static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb, if (unlikely(!file->f_op->write_iter)) goto out_fput; - ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, compat, &iter); + ret = aio_setup_rw(WRITE, kiocb, iocb, &iovec, vectored, compat, kaddr, + &iter); if (ret) goto out_fput; ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter)); @@ -2527,7 +2708,8 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, unsigned long ki_index, - struct aio_submit_state *state, bool compat) + struct aio_submit_state *state, bool compat, + bool kaddr) { struct aio_kiocb *req; ssize_t ret; @@ -2588,16 +2770,16 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, ret = -EINVAL; switch (iocb->aio_lio_opcode) { case IOCB_CMD_PREAD: - ret = aio_read(req, iocb, state, false, compat); + ret = aio_read(req, iocb, state, false, compat, kaddr); break; case IOCB_CMD_PWRITE: - ret = aio_write(req, iocb, state, false, compat); + ret = aio_write(req, iocb, state, false, compat, kaddr); break; case IOCB_CMD_PREADV: - ret = aio_read(req, iocb, state, true, compat); + ret = aio_read(req, iocb, state, true, compat, kaddr); break; case IOCB_CMD_PWRITEV: - ret = aio_write(req, iocb, state, true, compat); + ret = aio_write(req, iocb, state, true, compat, kaddr); break; case IOCB_CMD_FSYNC: if (ctx->flags & IOCTX_FLAG_IOPOLL) @@ -2654,7 +2836,7 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb)))) return -EFAULT; - return __io_submit_one(ctx, &iocb, ki_index, state, compat); + return __io_submit_one(ctx, &iocb, ki_index, state, 
compat, false); } #ifdef CONFIG_BLOCK
@@ -2757,6 +2939,7 @@ static const struct iocb *aio_peek_sqring(struct kioctx *ctx, static int aio_ring_submit(struct kioctx *ctx, unsigned int to_submit) { + bool kaddr = (ctx->flags & IOCTX_FLAG_FIXEDBUFS) != 0; struct aio_submit_state state, *statep = NULL; int i, ret = 0, submit = 0;
@@ -2773,7 +2956,7 @@ static int aio_ring_submit(struct kioctx *ctx, unsigned int to_submit) if (!iocb) break; - ret = __io_submit_one(ctx, iocb, iocb_index, statep, false); + ret = __io_submit_one(ctx, iocb, iocb_index, statep, false, kaddr); if (ret) break;

diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h index 5d3ada40ce15..39d783175872 100644 --- a/include/uapi/linux/aio_abi.h +++ b/include/uapi/linux/aio_abi.h
@@ -110,6 +110,7 @@ struct iocb { #define IOCTX_FLAG_IOPOLL (1 << 0) /* io_context is polled */ #define IOCTX_FLAG_SCQRING (1 << 1) /* Use SQ/CQ rings */ +#define IOCTX_FLAG_FIXEDBUFS (1 << 2) /* IO buffers are fixed */ struct aio_sq_ring { union {
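Usage-wise this builds on the earlier SCQRING sketch: since aio_iocb_buffer_map() walks the iocb array at setup time, the application describes its buffers there before calling io_setup2(). A hedged sketch, reusing depth, sq, cq, iocbs and ctx from that example:

    /* Pre-register one 64KB buffer per SQ slot, then create the
     * context with fixed buffers. Error checks omitted. */
    #define IOCTX_FLAG_FIXEDBUFS   (1 << 2)

    for (unsigned int i = 0; i < depth; i++) {
        iocbs[i].aio_buf = (unsigned long) aligned_alloc(4096, 65536);
        iocbs[i].aio_nbytes = 65536;    /* per-buffer cap is 4MB */
    }

    /* pages stay pinned for the lifetime of the context; the total
     * is bounded by RLIMIT_MEMLOCK */
    syscall(335, depth, IOCTX_FLAG_SCQRING | IOCTX_FLAG_FIXEDBUFS,
            sq, cq, &ctx);

Later IOs submitted through a given slot must then stay inside the region registered for that slot, as enforced by aio_setup_rw().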
From patchwork Fri Dec 21 19:22:33 2018
X-Patchwork-Id: 10740959
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org
Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe
Subject: [PATCH 19/22] aio: support kernel side submission for aio with SCQRING
Date: Fri, 21 Dec 2018 12:22:33 -0700
Message-Id: <20181221192236.12866-20-axboe@kernel.dk>
In-Reply-To: <20181221192236.12866-1-axboe@kernel.dk>
References: <20181221192236.12866-1-axboe@kernel.dk>

Add support for backing the io_context with either a thread, or a workqueue, and letting those handle the submission for us. This can be used to reduce overhead for submission, or to always make submission async. The latter is particularly useful for buffered aio, which is now fully async with this feature. For polled IO, we could have the kernel side thread hammer on the SQ ring and submit when it finds IO. This would mean that an application would NEVER have to enter the kernel to do IO! Didn't add this yet, but it would be trivial to add. If an application sets IOCTX_FLAG_SQTHREAD, the io_context gets a single thread backing. If used with buffered IO, this will limit the device queue depth to 1, but it will be async; IOs will simply be serialized. Or an application can set IOCTX_FLAG_SQWQ, in which case the io_context gets a workqueue backing. The concurrency level is the minimum of twice the available CPUs and the queue depth specified for the context. For this mode, we attempt to do buffered reads inline, in case they are cached, so we only punt to a workqueue if we would have to block to get our data. Tested with polling, no polling, fixedbufs, no fixedbufs, buffered, O_DIRECT.
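For instance, continuing the earlier userspace sketch (hedged; the flag bit positions are assumed, as this patch's aio_abi.h hunk falls outside the excerpt below):

    #define IOCTX_FLAG_SQTHREAD    (1 << 3)    /* assumed bit */
    #define IOCTX_FLAG_SQWQ        (1 << 4)    /* assumed bit */

    /* back the SQ ring with a kernel submission thread; with SQWQ a
     * workqueue handles submission instead */
    syscall(335, depth, IOCTX_FLAG_SCQRING | IOCTX_FLAG_SQTHREAD,
            sq, cq, &ctx);

With a thread backing, the application mainly fills the SQ ring and uses io_ring_enter() to wait; the kernel side reads an sq_thread_cpu field from the SQ ring header to pin the thread, per aio_sq_thread_start() below.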
Signed-off-by: Jens Axboe
---
 fs/aio.c                     | 438 ++++++++++++++++++++++++++++++++---
 include/uapi/linux/aio_abi.h |   3 +
 2 files changed, 414 insertions(+), 27 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index c424aa2ed336..cd4a61642b46 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -44,6 +45,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -116,6 +118,14 @@ struct aio_mapped_ubuf {
 	unsigned int nr_bvecs;
 };
 
+struct aio_sq_offload {
+	struct task_struct *thread;	/* if using a thread */
+	struct workqueue_struct *wq;	/* wq offload */
+	struct mm_struct *mm;
+	struct files_struct *files;
+	wait_queue_head_t wait;
+};
+
 struct kioctx {
 	struct percpu_ref	users;
 	atomic_t		dead;
@@ -158,6 +168,10 @@ struct kioctx {
 	struct aio_iocb_ring	sq_ring;
 	struct aio_mapped_range	cq_ring;
 	int			cq_ring_overflow;
+	int			submit_eagain;
+
+	/* sq ring submitter thread, if used */
+	struct aio_sq_offload	sq_offload;
 
 	struct rcu_work		free_rwork;	/* see free_ioctx() */
@@ -252,6 +266,7 @@ struct aio_kiocb {
 	unsigned long		ki_flags;
 #define KIOCB_F_POLL_COMPLETED	0	/* polled IO has completed */
 #define KIOCB_F_POLL_EAGAIN	1	/* polled submission got EAGAIN */
+#define KIOCB_F_FORCE_NONBLOCK	2	/* inline submission attempt */
 
 	refcount_t		ki_refcnt;
@@ -1349,19 +1364,31 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
 		unsigned int tail;
 
 		/*
-		 * If we can't get a cq entry, userspace overflowed the
-		 * submission (by quite a lot). Flag it as an overflow
-		 * condition, and next io_ring_enter(2) call will return
-		 * -EOVERFLOW.
+		 * Catch EAGAIN early if we've forced a nonblock attempt, as
+		 * we don't want to pass that back down to userspace through
+		 * the CQ ring. Just mark the ctx as such, so the caller will
+		 * see it and punt to workqueue. This is just for buffered
+		 * aio reads.
 		 */
-		spin_lock_irqsave(&ctx->completion_lock, flags);
-		ev = aio_peek_cqring(ctx, &tail);
-		if (ev) {
-			aio_fill_event(ev, iocb, res, res2);
-			aio_commit_cqring(ctx, tail);
-		} else
-			ctx->cq_ring_overflow = 1;
-		spin_unlock_irqrestore(&ctx->completion_lock, flags);
+		if (res == -EAGAIN &&
+		    test_bit(KIOCB_F_FORCE_NONBLOCK, &iocb->ki_flags)) {
+			ctx->submit_eagain = 1;
+		} else {
+			/*
+			 * If we can't get a cq entry, userspace overflowed the
+			 * submission (by quite a lot). Flag it as an overflow
+			 * condition, and next io_ring_enter(2) call will return
+			 * -EOVERFLOW.
+			 */
+			spin_lock_irqsave(&ctx->completion_lock, flags);
+			ev = aio_peek_cqring(ctx, &tail);
+			if (ev) {
+				aio_fill_event(ev, iocb, res, res2);
+				aio_commit_cqring(ctx, tail);
+			} else
+				ctx->cq_ring_overflow = 1;
+			spin_unlock_irqrestore(&ctx->completion_lock, flags);
+		}
 	} else {
 		aio_ring_complete(ctx, iocb, res, res2);
@@ -1727,6 +1754,63 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr,
 	return ret;
 }
 
+static int aio_sq_thread(void *);
+
+static int aio_sq_thread_start(struct kioctx *ctx)
+{
+	struct aio_sq_ring *ring = ctx->sq_ring.ring;
+	struct aio_sq_offload *aso = &ctx->sq_offload;
+	int ret;
+
+	memset(aso, 0, sizeof(*aso));
+	init_waitqueue_head(&aso->wait);
+
+	if (!(ctx->flags & IOCTX_FLAG_FIXEDBUFS))
+		aso->mm = current->mm;
+
+	ret = -EBADF;
+	aso->files = get_files_struct(current);
+	if (!aso->files)
+		goto err;
+
+	if (ctx->flags & IOCTX_FLAG_SQTHREAD) {
+		char name[32];
+
+		snprintf(name, sizeof(name), "aio-sq-%lu/%d", ctx->user_id,
+				ring->sq_thread_cpu);
+		aso->thread = kthread_create_on_cpu(aio_sq_thread, ctx,
+							ring->sq_thread_cpu, name);
+		if (IS_ERR(aso->thread)) {
+			ret = PTR_ERR(aso->thread);
+			aso->thread = NULL;
+			goto err;
+		}
+		wake_up_process(aso->thread);
+	} else if (ctx->flags & IOCTX_FLAG_SQWQ) {
+		int concurrency;
+
+		/* Do QD, or 2 * CPUS, whatever is smallest */
+		concurrency = min(ring->nr_events - 1, 2 * num_online_cpus());
+		aso->wq = alloc_workqueue("aio-sq-%lu",
+						WQ_UNBOUND | WQ_FREEZABLE,
+						concurrency, ctx->user_id);
+		if (!aso->wq) {
+			ret = -ENOMEM;
+			goto err;
+		}
+	}
+
+	return 0;
+err:
+	if (aso->files) {
+		put_files_struct(aso->files);
+		aso->files = NULL;
+	}
+	if (aso->mm)
+		aso->mm = NULL;
+	return ret;
+}
+
 static void aio_unmap_range(struct aio_mapped_range *range)
 {
 	int i;
@@ -1772,6 +1856,20 @@ static int aio_map_range(struct aio_mapped_range *range, void __user *uaddr,
 
 static void aio_scqring_unmap(struct kioctx *ctx)
 {
+	struct aio_sq_offload *aso = &ctx->sq_offload;
+
+	if (aso->thread) {
+		kthread_park(aso->thread);
+		kthread_stop(aso->thread);
+		aso->thread = NULL;
+	} else if (aso->wq) {
+		destroy_workqueue(aso->wq);
+		aso->wq = NULL;
+	}
+	if (aso->files) {
+		put_files_struct(aso->files);
+		aso->files = NULL;
+	}
 	aio_unmap_range(&ctx->sq_ring.ring_range);
 	aio_unmap_range(&ctx->sq_ring.iocb_range);
 	aio_unmap_range(&ctx->cq_ring);
@@ -1833,6 +1931,9 @@ static int aio_scqring_map(struct kioctx *ctx,
 	kcq_ring->nr_events = cq_ring_size;
 	kcq_ring->head = kcq_ring->tail = 0;
 
+	if (ctx->flags & (IOCTX_FLAG_SQTHREAD | IOCTX_FLAG_SQWQ))
+		ret = aio_sq_thread_start(ctx);
+
 err:
 	if (ret) {
 		aio_unmap_range(&ctx->sq_ring.ring_range);
@@ -1978,7 +2079,8 @@ SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags,
 	long ret;
 
 	if (flags & ~(IOCTX_FLAG_IOPOLL | IOCTX_FLAG_SCQRING |
-		      IOCTX_FLAG_FIXEDBUFS))
+		      IOCTX_FLAG_FIXEDBUFS | IOCTX_FLAG_SQTHREAD |
+		      IOCTX_FLAG_SQWQ))
 		return -EINVAL;
 
 	ret = get_user(ctx, ctxp);
@@ -1999,8 +2101,9 @@ SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags,
 			if (ret)
 				goto err;
 		}
-	} else if (flags & IOCTX_FLAG_FIXEDBUFS) {
-		/* can only support fixed bufs with SQ/CQ ring */
+	} else if (flags & (IOCTX_FLAG_FIXEDBUFS | IOCTX_FLAG_SQTHREAD |
+			    IOCTX_FLAG_SQWQ)) {
+		/* These features only supported with SCQRING */
 		ret = -EINVAL;
 		goto err;
 	}
@@ -2210,7 +2313,7 @@ static struct file *aio_file_get(struct aio_submit_state *state, int fd)
 }
 
 static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb,
-		       struct aio_submit_state *state)
+		       struct aio_submit_state *state, bool force_nonblock)
 {
 	struct kioctx *ctx = kiocb->ki_ctx;
 	struct kiocb *req = &kiocb->rw;
@@ -2243,6 +2346,10 @@ static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb,
 	ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
 	if (unlikely(ret))
 		goto out_fput;
+	if (force_nonblock) {
+		req->ki_flags |= IOCB_NOWAIT;
+		set_bit(KIOCB_F_FORCE_NONBLOCK, &kiocb->ki_flags);
+	}
 
 	if (iocb->aio_flags & IOCB_FLAG_HIPRI) {
 		/* shares space in the union, and is rather pointless.. */
@@ -2411,7 +2518,7 @@ static void aio_iopoll_iocb_issued(struct aio_submit_state *state,
 
 static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb,
 			struct aio_submit_state *state, bool vectored,
-			bool compat, bool kaddr)
+			bool compat, bool kaddr, bool force_nonblock)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct kiocb *req = &kiocb->rw;
@@ -2419,7 +2526,7 @@ static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb,
 	struct file *file;
 	ssize_t ret;
 
-	ret = aio_prep_rw(kiocb, iocb, state);
+	ret = aio_prep_rw(kiocb, iocb, state, force_nonblock);
 	if (ret)
 		return ret;
 	file = req->ki_filp;
@@ -2456,7 +2563,7 @@ static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb,
 	struct file *file;
 	ssize_t ret;
 
-	ret = aio_prep_rw(kiocb, iocb, state);
+	ret = aio_prep_rw(kiocb, iocb, state, false);
 	if (ret)
 		return ret;
 	file = req->ki_filp;
@@ -2709,7 +2816,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 			   unsigned long ki_index,
 			   struct aio_submit_state *state, bool compat,
-			   bool kaddr)
+			   bool kaddr, bool force_nonblock)
 {
 	struct aio_kiocb *req;
 	ssize_t ret;
@@ -2770,13 +2877,15 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 	ret = -EINVAL;
 	switch (iocb->aio_lio_opcode) {
 	case IOCB_CMD_PREAD:
-		ret = aio_read(req, iocb, state, false, compat, kaddr);
+		ret = aio_read(req, iocb, state, false, compat, kaddr,
+				force_nonblock);
 		break;
 	case IOCB_CMD_PWRITE:
 		ret = aio_write(req, iocb, state, false, compat, kaddr);
 		break;
 	case IOCB_CMD_PREADV:
-		ret = aio_read(req, iocb, state, true, compat, kaddr);
+		ret = aio_read(req, iocb, state, true, compat, kaddr,
+				force_nonblock);
 		break;
 	case IOCB_CMD_PWRITEV:
 		ret = aio_write(req, iocb, state, true, compat, kaddr);
@@ -2836,7 +2945,8 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb))))
 		return -EFAULT;
 
-	return __io_submit_one(ctx, &iocb, ki_index, state, compat, false);
+	return __io_submit_one(ctx, &iocb, ki_index, state, compat, false,
+				false);
 }
 
 #ifdef CONFIG_BLOCK
@@ -2956,7 +3066,8 @@ static int aio_ring_submit(struct kioctx *ctx, unsigned int to_submit)
 		if (!iocb)
 			break;
 
-		ret = __io_submit_one(ctx, iocb, iocb_index, statep, false, kaddr);
+		ret = __io_submit_one(ctx, iocb, iocb_index, statep, false, kaddr,
+					false);
 		if (ret)
 			break;
 
@@ -3008,15 +3119,288 @@ static int aio_cqring_wait(struct kioctx *ctx, int min_events)
 	return ret;
 }
 
+static void aio_fill_cq_error(struct kioctx *ctx, const struct iocb *iocb,
+			      long ret)
+{
+	struct io_event *ev;
+	unsigned tail;
+
+	/*
+	 * Only really need the lock for non-polled IO, but this is an error
+	 * so not worth checking. Just lock it so we know kernel access to
+	 * the CQ ring is serialized.
+	 */
+	spin_lock_irq(&ctx->completion_lock);
+	ev = aio_peek_cqring(ctx, &tail);
+	ev->obj = iocb->aio_data;
+	ev->data = 0;
+	ev->res = ret;
+	ev->res2 = 0;
+	aio_commit_cqring(ctx, tail);
+	spin_unlock_irq(&ctx->completion_lock);
+
+	/*
+	 * for thread offload, app could already be sleeping in io_ring_enter()
+	 * before we get to flag the error. wake them up, if needed.
+	 */
+	if (ctx->flags & (IOCTX_FLAG_SQTHREAD | IOCTX_FLAG_SQWQ))
+		if (waitqueue_active(&ctx->wait))
+			wake_up(&ctx->wait);
+}
+
+struct iocb_submit {
+	const struct iocb *iocb;
+	unsigned int index;
+};
+
+static int aio_submit_iocbs(struct kioctx *ctx, struct iocb_submit *iocbs,
+			    unsigned int nr, struct mm_struct *cur_mm,
+			    bool mm_fault)
+{
+	struct aio_submit_state state, *statep = NULL;
+	int ret, i, submitted = 0;
+
+	if (nr > AIO_PLUG_THRESHOLD) {
+		aio_submit_state_start(&state, ctx, nr);
+		statep = &state;
+	}
+
+	for (i = 0; i < nr; i++) {
+		if (unlikely(mm_fault))
+			ret = -EFAULT;
+		else
+			ret = __io_submit_one(ctx, iocbs[i].iocb,
+						iocbs[i].index, statep, false,
+						!cur_mm, false);
+		if (!ret) {
+			submitted++;
+			continue;
+		}
+
+		aio_fill_cq_error(ctx, iocbs[i].iocb, ret);
+	}
+
+	if (statep)
+		aio_submit_state_end(&state);
+
+	return submitted;
+}
+
+/*
+ * sq thread only supports O_DIRECT or FIXEDBUFS IO
+ */
+static int aio_sq_thread(void *data)
+{
+	struct iocb_submit iocbs[AIO_IOPOLL_BATCH];
+	struct kioctx *ctx = data;
+	struct aio_sq_offload *aso = &ctx->sq_offload;
+	struct mm_struct *cur_mm = NULL;
+	struct files_struct *old_files;
+	mm_segment_t old_fs;
+	DEFINE_WAIT(wait);
+
+	old_files = current->files;
+	current->files = aso->files;
+
+	old_fs = get_fs();
+	set_fs(USER_DS);
+
+	while (!kthread_should_stop()) {
+		const struct iocb *iocb;
+		bool mm_fault = false;
+		unsigned nhead, index;
+		int i;
+
+		iocb = aio_peek_sqring(ctx, &index, &nhead);
+		if (!iocb) {
+			prepare_to_wait(&aso->wait, &wait, TASK_INTERRUPTIBLE);
+			iocb = aio_peek_sqring(ctx, &index, &nhead);
+			if (!iocb) {
+				/*
+				 * Drop cur_mm before scheduler. We can't hold
+				 * it for long periods, and it would also
+				 * introduce a deadlock with kill_ioctx().
+				 */
+				if (cur_mm) {
+					unuse_mm(cur_mm);
+					mmput(cur_mm);
+					cur_mm = NULL;
+				}
+				if (kthread_should_park())
+					kthread_parkme();
+				if (kthread_should_stop()) {
+					finish_wait(&aso->wait, &wait);
+					break;
+				}
+				if (signal_pending(current))
+					flush_signals(current);
+				schedule();
+			}
+			finish_wait(&aso->wait, &wait);
+			if (!iocb)
+				continue;
+		}
+
+		/* If ->mm is set, we're not doing FIXEDBUFS */
+		if (aso->mm && !cur_mm) {
+			mm_fault = !mmget_not_zero(aso->mm);
+			if (!mm_fault) {
+				use_mm(aso->mm);
+				cur_mm = aso->mm;
+			}
+		}
+
+		i = 0;
+		do {
+			if (i == ARRAY_SIZE(iocbs))
+				break;
+			iocbs[i].iocb = iocb;
+			iocbs[i].index = index;
+			++i;
+			aio_commit_sqring(ctx, nhead);
+		} while ((iocb = aio_peek_sqring(ctx, &index, &nhead)) != NULL);
+
+		aio_submit_iocbs(ctx, iocbs, i, cur_mm, mm_fault);
+	}
+	current->files = old_files;
+	set_fs(old_fs);
+	if (cur_mm) {
+		unuse_mm(cur_mm);
+		mmput(cur_mm);
+	}
+	return 0;
+}
+
+struct aio_io_work {
+	struct work_struct work;
+	struct kioctx *ctx;
+	struct iocb iocb;
+	unsigned iocb_index;
+};
+
+static void aio_sq_wq_submit_work(struct work_struct *work)
+{
+	struct aio_io_work *aiw = container_of(work, struct aio_io_work, work);
+	struct kioctx *ctx = aiw->ctx;
+	struct aio_sq_offload *aso = &ctx->sq_offload;
+	mm_segment_t old_fs = get_fs();
+	struct files_struct *old_files;
+	int ret;
+
+	old_files = current->files;
+	current->files = aso->files;
+
+	if (aso->mm) {
+		if (!mmget_not_zero(aso->mm)) {
+			ret = -EFAULT;
+			goto err;
+		}
+		use_mm(aso->mm);
+	}
+
+	set_fs(USER_DS);
+
+	ret = __io_submit_one(ctx, &aiw->iocb, aiw->iocb_index, NULL, false,
+				!aso->mm, false);
+
+	set_fs(old_fs);
+	if (aso->mm) {
+		unuse_mm(aso->mm);
+		mmput(aso->mm);
+	}
+
+err:
+	if (ret)
+		aio_fill_cq_error(ctx, &aiw->iocb, ret);
+	current->files = old_files;
+	kfree(aiw);
+}
+
+/*
+ * If this is a read, try a cached inline read first. If the IO is in the
+ * page cache, we can satisfy it without blocking and without having to
+ * punt to a threaded execution. This is much faster, particularly for
+ * lower queue depth IO, and it's always a lot more efficient.
+ */
+static bool aio_sq_try_inline(struct kioctx *ctx, const struct iocb *iocb,
+			      unsigned index)
+{
+	struct aio_sq_offload *aso = &ctx->sq_offload;
+	int ret;
+
+	if (iocb->aio_lio_opcode != IOCB_CMD_PREAD &&
+	    iocb->aio_lio_opcode != IOCB_CMD_PREADV)
+		return false;
+
+	ret = __io_submit_one(ctx, iocb, index, NULL, false, !aso->mm, true);
+	if (ret == -EAGAIN || ctx->submit_eagain) {
+		ctx->submit_eagain = 0;
+		return false;
+	}
+
+	/*
+	 * We're done - even if this was an error, return 0. The error will
+	 * be in the CQ ring for the application.
+	 */
+	return true;
+}
+
+static int aio_sq_wq_submit(struct kioctx *ctx, unsigned int to_submit)
+{
+	struct aio_io_work *work;
+	const struct iocb *iocb;
+	unsigned nhead, index;
+	int ret, queued;
+
+	ret = queued = 0;
+	while ((iocb = aio_peek_sqring(ctx, &index, &nhead)) != NULL) {
+		ret = aio_sq_try_inline(ctx, iocb, index);
+		if (!ret) {
+			work = kmalloc(sizeof(*work), GFP_KERNEL);
+			if (!work) {
+				ret = -ENOMEM;
+				break;
+			}
+			memcpy(&work->iocb, iocb, sizeof(*iocb));
+			aio_commit_sqring(ctx, nhead);
+			work->iocb_index = index;
+			INIT_WORK(&work->work, aio_sq_wq_submit_work);
+			work->ctx = ctx;
+			queue_work(ctx->sq_offload.wq, &work->work);
+		}
+		queued++;
+		if (queued == to_submit)
+			break;
+	}
+
+	return queued ? queued : ret;
+}
+
 static int __io_ring_enter(struct kioctx *ctx, unsigned int to_submit,
 			   unsigned int min_complete, unsigned int flags)
 {
 	int ret = 0;
 
 	if (flags & IORING_FLAG_SUBMIT) {
-		ret = aio_ring_submit(ctx, to_submit);
-		if (ret < 0)
-			return ret;
+		if (!to_submit)
+			return 0;
+
+		/*
+		 * Three options here:
+		 * 1) We have an sq thread, just wake it up to do submissions
+		 * 2) We have an sq wq, queue a work item for each iocb
+		 * 3) Submit directly
+		 */
+		if (ctx->flags & IOCTX_FLAG_SQTHREAD) {
+			wake_up(&ctx->sq_offload.wait);
+			ret = to_submit;
+		} else if (ctx->flags & IOCTX_FLAG_SQWQ) {
+			ret = aio_sq_wq_submit(ctx, to_submit);
+		} else {
+			ret = aio_ring_submit(ctx, to_submit);
+			if (ret < 0)
+				return ret;
+		}
 	}
 	if (flags & IORING_FLAG_GETEVENTS) {
 		unsigned int nr_events = 0;

diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index 39d783175872..b09b1976e038 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -111,6 +111,8 @@ struct iocb {
 #define IOCTX_FLAG_IOPOLL	(1 << 0)	/* io_context is polled */
 #define IOCTX_FLAG_SCQRING	(1 << 1)	/* Use SQ/CQ rings */
 #define IOCTX_FLAG_FIXEDBUFS	(1 << 2)	/* IO buffers are fixed */
+#define IOCTX_FLAG_SQTHREAD	(1 << 3)	/* Use SQ thread */
+#define IOCTX_FLAG_SQWQ		(1 << 4)	/* Use SQ workqueue */
 
 struct aio_sq_ring {
 	union {
@@ -118,6 +120,7 @@ struct aio_sq_ring {
 		u32 head;	/* kernel consumer head */
 		u32 tail;	/* app producer tail */
 		u32 nr_events;	/* max events in ring */
+		u16 sq_thread_cpu;
 		u64 iocbs;	/* setup pointer to app iocbs */
 	};
 	u32 pad[16];

From patchwork Fri Dec 21 19:22:34 2018
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org
Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe
Subject: [PATCH 20/22] aio: enable polling for IOCTX_FLAG_SQTHREAD
Date: Fri, 21 Dec 2018 12:22:34 -0700
Message-Id: <20181221192236.12866-21-axboe@kernel.dk>
In-Reply-To: <20181221192236.12866-1-axboe@kernel.dk>
References: <20181221192236.12866-1-axboe@kernel.dk>

This enables an application to do IO without ever entering the kernel. By
using the SQ ring to fill in new events and watching for completions on the
CQ ring, we can submit and reap IOs without doing a single system call. The
kernel side thread will poll for new submissions, and in case of HIPRI/polled
IO, it'll also poll for completions.

For O_DIRECT, we can do this with just SQTHREAD being enabled. For buffered
aio, we need the workqueue as well. If we can satisfy the buffered IO inline
from the SQTHREAD, we do that. If not, we punt to the workqueue. This is just
like buffered aio off the io_ring_enter(2) system call.

Proof of concept: if the thread has been idle for 1 second, it will set

	sq_ring->kflags |= IORING_SQ_NEED_WAKEUP;

The application will then have to call io_ring_enter() to start things back
up again. If IO is kept busy, that will never be needed. Basically an
application that has this feature enabled will guard its io_ring_enter(2)
call with:

	barrier();
	if (ring->kflags & IORING_SQ_NEED_WAKEUP)
		io_ring_enter(ctx, to_submit, 0, IORING_FLAG_SUBMIT);

instead of calling it unconditionally.

Improvements:

1) Maybe have smarter backoff. Busy loop for X time, then go to
   monitor/mwait, and finally the schedule we have now after an idle
   second. Might not be worth the complexity.

2) Probably want the application to pass in the appropriate grace period,
   not hard code it at 1 second.
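Put together, the polled submission loop on the application side would look
roughly like the following. fill_sq_ring() and reap_cq_ring() are
hypothetical helpers standing in for the ring manipulation, not part of this
series:

	/*
	 * With IOCTX_FLAG_SQTHREAD|IOCTX_FLAG_SQPOLL, bumping the SQ tail is
	 * normally enough, the kernel thread notices new entries by itself.
	 * Only once the thread has idled out and flagged IORING_SQ_NEED_WAKEUP
	 * in ring->kflags do we pay for a system call.
	 */
	while (!done) {
		unsigned int to_submit = fill_sq_ring(ring);	/* hypothetical */

		barrier();
		if (ring->kflags & IORING_SQ_NEED_WAKEUP)
			io_ring_enter(ctx, to_submit, 0, IORING_FLAG_SUBMIT);

		done = reap_cq_ring(ring);			/* hypothetical */
	}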
Signed-off-by: Jens Axboe
---
 fs/aio.c                     | 141 ++++++++++++++++++++++++++++-------
 include/uapi/linux/aio_abi.h |   4 +
 2 files changed, 116 insertions(+), 29 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index cd4a61642b46..8894c9299b39 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -120,6 +120,7 @@ struct aio_mapped_ubuf {
 
 struct aio_sq_offload {
 	struct task_struct *thread;	/* if using a thread */
+	bool thread_poll;
 	struct workqueue_struct *wq;	/* wq offload */
 	struct mm_struct *mm;
 	struct files_struct *files;
@@ -340,6 +341,7 @@ static void aio_iocb_buffer_unmap(struct kioctx *);
 static void aio_scqring_unmap(struct kioctx *);
 static void aio_iopoll_reap_events(struct kioctx *);
 static const struct iocb *aio_iocb_from_index(struct kioctx *ctx, unsigned idx);
+static void aio_sq_wq_submit_work(struct work_struct *work);
 
 static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages)
 {
@@ -1773,6 +1775,9 @@ static int aio_sq_thread_start(struct kioctx *ctx)
 	if (!aso->files)
 		goto err;
 
+	if (ctx->flags & IOCTX_FLAG_SQPOLL)
+		aso->thread_poll = true;
+
 	if (ctx->flags & IOCTX_FLAG_SQTHREAD) {
 		char name[32];
 
@@ -1786,7 +1791,8 @@ static int aio_sq_thread_start(struct kioctx *ctx)
 			goto err;
 		}
 		wake_up_process(aso->thread);
-	} else if (ctx->flags & IOCTX_FLAG_SQWQ) {
+	}
+	if (ctx->flags & IOCTX_FLAG_SQWQ) {
 		int concurrency;
 
 		/* Do QD, or 2 * CPUS, whatever is smallest */
@@ -1862,7 +1868,8 @@ static void aio_scqring_unmap(struct kioctx *ctx)
 		kthread_park(aso->thread);
 		kthread_stop(aso->thread);
 		aso->thread = NULL;
-	} else if (aso->wq) {
+	}
+	if (aso->wq) {
 		destroy_workqueue(aso->wq);
 		aso->wq = NULL;
 	}
@@ -2080,7 +2087,7 @@ SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags,
 
 	if (flags & ~(IOCTX_FLAG_IOPOLL | IOCTX_FLAG_SCQRING |
 		      IOCTX_FLAG_FIXEDBUFS | IOCTX_FLAG_SQTHREAD |
-		      IOCTX_FLAG_SQWQ))
+		      IOCTX_FLAG_SQWQ | IOCTX_FLAG_SQPOLL))
 		return -EINVAL;
 
 	ret = get_user(ctx, ctxp);
@@ -3153,28 +3160,69 @@ struct iocb_submit {
 	unsigned int index;
 };
 
+struct aio_io_work {
+	struct work_struct work;
+	struct kioctx *ctx;
+	struct iocb iocb;
+	unsigned iocb_index;
+};
+
+static int aio_queue_async_work(struct kioctx *ctx, struct iocb_submit *is)
+{
+	struct aio_io_work *work;
+
+	work = kmalloc(sizeof(*work), GFP_KERNEL);
+	if (work) {
+		memcpy(&work->iocb, is->iocb, sizeof(*is->iocb));
+		work->iocb_index = is->index;
+		INIT_WORK(&work->work, aio_sq_wq_submit_work);
+		work->ctx = ctx;
+		queue_work(ctx->sq_offload.wq, &work->work);
+		return 0;
+	}
+
+	return -ENOMEM;
+}
+
 static int aio_submit_iocbs(struct kioctx *ctx, struct iocb_submit *iocbs,
 			    unsigned int nr, struct mm_struct *cur_mm,
 			    bool mm_fault)
 {
 	struct aio_submit_state state, *statep = NULL;
 	int ret, i, submitted = 0;
+	bool force_nonblock;
 
 	if (nr > AIO_PLUG_THRESHOLD) {
 		aio_submit_state_start(&state, ctx, nr);
 		statep = &state;
 	}
 
+	/*
+	 * Having both a thread and a workqueue only makes sense for buffered
+	 * IO, where we can't submit in an async fashion. Use the NOWAIT
+	 * trick from the SQ thread, and punt to the workqueue if we can't
+	 * satisfy this iocb without blocking. This is only necessary
+	 * for buffered IO with sqthread polled submission.
+	 */
+	force_nonblock = (ctx->flags & IOCTX_FLAG_SQWQ) != 0;
+
 	for (i = 0; i < nr; i++) {
-		if (unlikely(mm_fault))
+		if (unlikely(mm_fault)) {
 			ret = -EFAULT;
-		else
+		} else {
 			ret = __io_submit_one(ctx, iocbs[i].iocb,
 						iocbs[i].index, statep, false,
-						!cur_mm, false);
-		if (!ret) {
-			submitted++;
-			continue;
+						!cur_mm, force_nonblock);
+			/* nogo, submit to workqueue */
+			if (force_nonblock &&
+			    (ret == -EAGAIN || ctx->submit_eagain)) {
+				ctx->submit_eagain = 0;
+				ret = aio_queue_async_work(ctx, &iocbs[i]);
+			}
+			if (!ret) {
+				submitted++;
+				continue;
+			}
 		}
 
 		aio_fill_cq_error(ctx, iocbs[i].iocb, ret);
@@ -3187,17 +3235,23 @@ static int aio_submit_iocbs(struct kioctx *ctx, struct iocb_submit *iocbs,
 }
 
 /*
- * sq thread only supports O_DIRECT or FIXEDBUFS IO
+ * SQ thread is woken if the app asked for offloaded submission. This can
+ * be either O_DIRECT, in which case we do submissions directly, or it can
+ * be buffered IO, in which case we do them inline if we can do so without
+ * blocking. If we can't, then we punt to a workqueue.
 */
 static int aio_sq_thread(void *data)
 {
 	struct iocb_submit iocbs[AIO_IOPOLL_BATCH];
 	struct kioctx *ctx = data;
+	struct aio_sq_ring *ring = ctx->sq_ring.ring;
 	struct aio_sq_offload *aso = &ctx->sq_offload;
 	struct mm_struct *cur_mm = NULL;
 	struct files_struct *old_files;
 	mm_segment_t old_fs;
 	DEFINE_WAIT(wait);
+	unsigned inflight;
+	unsigned long timeout;
 
 	old_files = current->files;
 	current->files = aso->files;
@@ -3205,15 +3259,50 @@ static int aio_sq_thread(void *data)
 	old_fs = get_fs();
 	set_fs(USER_DS);
 
+	timeout = inflight = 0;
 	while (!kthread_should_stop()) {
 		const struct iocb *iocb;
 		bool mm_fault = false;
 		unsigned nhead, index;
 		int i;
 
+		if (aso->thread_poll && inflight) {
+			unsigned int nr_events = 0;
+
+			/*
+			 * Buffered IO, just pretend everything completed.
+			 * We don't have to poll completions for that.
+			 */
+			if (ctx->flags & IOCTX_FLAG_IOPOLL)
+				__aio_iopoll_check(ctx, NULL, &nr_events, 0, -1U);
+			else
+				nr_events = inflight;
+
+			inflight -= nr_events;
+			if (!inflight)
+				timeout = jiffies + HZ;
+		}
+
 		iocb = aio_peek_sqring(ctx, &index, &nhead);
 		if (!iocb) {
+			/*
+			 * If we're polling, let us spin for a second without
+			 * work before going to sleep.
+			 */
+			if (aso->thread_poll) {
+				if (inflight || !time_after(jiffies, timeout)) {
+					cpu_relax();
+					continue;
+				}
+			}
+
 			prepare_to_wait(&aso->wait, &wait, TASK_INTERRUPTIBLE);
+
+			/* Tell userspace we may need a wakeup call */
+			if (aso->thread_poll) {
+				ring->kflags |= IORING_SQ_NEED_WAKEUP;
+				smp_wmb();
+			}
+
 			iocb = aio_peek_sqring(ctx, &index, &nhead);
 			if (!iocb) {
 				/*
@@ -3235,6 +3324,9 @@ static int aio_sq_thread(void *data)
 				if (signal_pending(current))
 					flush_signals(current);
 				schedule();
+
+				if (aso->thread_poll)
+					ring->kflags &= ~IORING_SQ_NEED_WAKEUP;
 			}
 			finish_wait(&aso->wait, &wait);
 			if (!iocb)
@@ -3260,7 +3352,7 @@ static int aio_sq_thread(void *data)
 			aio_commit_sqring(ctx, nhead);
 		} while ((iocb = aio_peek_sqring(ctx, &index, &nhead)) != NULL);
 
-		aio_submit_iocbs(ctx, iocbs, i, cur_mm, mm_fault);
+		inflight += aio_submit_iocbs(ctx, iocbs, i, cur_mm, mm_fault);
 	}
 	current->files = old_files;
 	set_fs(old_fs);
@@ -3271,13 +3363,6 @@ static int aio_sq_thread(void *data)
 	return 0;
 }
 
-struct aio_io_work {
-	struct work_struct work;
-	struct kioctx *ctx;
-	struct iocb iocb;
-	unsigned iocb_index;
-};
-
 static void aio_sq_wq_submit_work(struct work_struct *work)
 {
 	struct aio_io_work *aiw = container_of(work, struct aio_io_work, work);
@@ -3347,7 +3432,6 @@ static bool aio_sq_try_inline(struct kioctx *ctx, const struct iocb *iocb,
 
 static int aio_sq_wq_submit(struct kioctx *ctx, unsigned int to_submit)
 {
-	struct aio_io_work *work;
 	const struct iocb *iocb;
 	unsigned nhead, index;
 	int ret, queued;
@@ -3356,18 +3440,17 @@ static int aio_sq_wq_submit(struct kioctx *ctx, unsigned int to_submit)
 	while ((iocb = aio_peek_sqring(ctx, &index, &nhead)) != NULL) {
 		ret = aio_sq_try_inline(ctx, iocb, index);
 		if (!ret) {
-			work = kmalloc(sizeof(*work), GFP_KERNEL);
-			if (!work) {
-				ret = -ENOMEM;
+			struct iocb_submit is = {
+				.iocb = iocb,
+				.index = index
+			};
+
+			ret = aio_queue_async_work(ctx, &is);
+			if (ret)
 				break;
-			}
-			memcpy(&work->iocb, iocb, sizeof(*iocb));
-			aio_commit_sqring(ctx, nhead);
-			work->iocb_index = index;
-			INIT_WORK(&work->work, aio_sq_wq_submit_work);
-			work->ctx = ctx;
-			queue_work(ctx->sq_offload.wq, &work->work);
 		}
+
+		aio_commit_sqring(ctx, nhead);
 		queued++;
 		if (queued == to_submit)
 			break;

diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index b09b1976e038..26173de01fee 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -113,6 +113,9 @@ struct iocb {
 #define IOCTX_FLAG_FIXEDBUFS	(1 << 2)	/* IO buffers are fixed */
 #define IOCTX_FLAG_SQTHREAD	(1 << 3)	/* Use SQ thread */
 #define IOCTX_FLAG_SQWQ		(1 << 4)	/* Use SQ workqueue */
+#define IOCTX_FLAG_SQPOLL	(1 << 5)	/* SQ thread polls */
+
+#define IORING_SQ_NEED_WAKEUP	(1 << 0)	/* needs io_ring_enter wakeup */
 
 struct aio_sq_ring {
 	union {
@@ -121,6 +124,7 @@ struct aio_sq_ring {
 		u32 tail;	/* app producer tail */
 		u32 nr_events;	/* max events in ring */
 		u16 sq_thread_cpu;
+		u16 kflags;	/* kernel info to app */
 		u64 iocbs;	/* setup pointer to app iocbs */
 	};
 	u32 pad[16];

From patchwork Fri Dec 21 19:22:35 2018
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org
Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe
Subject: [PATCH 21/22] aio: utilize io_event->res2 for CQ ring
Date: Fri, 21 Dec 2018 12:22:35 -0700
Message-Id: <20181221192236.12866-22-axboe@kernel.dk>
In-Reply-To: <20181221192236.12866-1-axboe@kernel.dk>
References: <20181221192236.12866-1-axboe@kernel.dk>

We don't use this field at all, so grab it to provide hints about the data.
The first hint we add is IOEV_RES2_CACHEHIT: whether a read was served out
of the page cache, or whether it hit media. This is useful for buffered aio;
O_DIRECT reads would never have it set (for obvious reasons).
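On the application side, the hint can be picked up while reaping the CQ
ring, along these lines. This is a sketch only: cq_ring and cq_events[] are
assumptions about how the app mapped the CQ ring, and handle_completion()
is a hypothetical helper:

	/*
	 * Drain completions and count page cache hits. res2 carries
	 * IOEV_RES2_CACHEHIT only for buffered reads that completed without
	 * blocking; O_DIRECT reads never set it.
	 */
	while (cq_ring->head != cq_ring->tail) {
		struct io_event *ev = &cq_events[cq_ring->head % cq_ring->nr_events];

		if (ev->res >= 0 && (ev->res2 & IOEV_RES2_CACHEHIT))
			cache_hits++;		/* served from page cache */

		handle_completion(ev);		/* hypothetical */
		cq_ring->head++;
	}
	barrier();	/* make the new head visible to the kernel */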
Signed-off-by: Jens Axboe
---
 fs/aio.c                     | 7 +++++--
 include/uapi/linux/aio_abi.h | 6 ++++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 8894c9299b39..a433503a2dc3 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1361,6 +1361,7 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
 	struct kioctx *ctx = iocb->ki_ctx;
 
 	if (ctx->flags & IOCTX_FLAG_SCQRING) {
+		int nowait = test_bit(KIOCB_F_FORCE_NONBLOCK, &iocb->ki_flags);
 		unsigned long flags;
 		struct io_event *ev;
 		unsigned int tail;
@@ -1372,10 +1373,12 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
 		 * see it and punt to workqueue. This is just for buffered
 		 * aio reads.
 		 */
-		if (res == -EAGAIN &&
-		    test_bit(KIOCB_F_FORCE_NONBLOCK, &iocb->ki_flags)) {
+		if (res == -EAGAIN && nowait) {
 			ctx->submit_eagain = 1;
 		} else {
+			if (nowait)
+				res2 = IOEV_RES2_CACHEHIT;
+
 			/*
 			 * If we can't get a cq entry, userspace overflowed the
 			 * submission (by quite a lot). Flag it as an overflow

diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index 26173de01fee..809c6f33f5e6 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -66,6 +66,12 @@ struct io_event {
 	__s64		res2;		/* secondary result */
 };
 
+/*
+ * aio CQ ring commandeers the otherwise unused ev->res2 to provide
+ * metadata about the IO.
+ */
+#define IOEV_RES2_CACHEHIT	(1 << 0)	/* IO did not hit media */
+
 /*
  * we always use a 64bit off_t when communicating
  * with userland. its up to libraries to do the

From patchwork Fri Dec 21 19:22:36 2018
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-block@vger.kernel.org
Cc: hch@lst.de, viro@zeniv.linux.org.uk, Jens Axboe
Subject: [PATCH 22/22] aio: add my copyright
Date: Fri, 21 Dec 2018 12:22:36 -0700
Message-Id: <20181221192236.12866-23-axboe@kernel.dk>
In-Reply-To: <20181221192236.12866-1-axboe@kernel.dk>
References: <20181221192236.12866-1-axboe@kernel.dk>

Signed-off-by: Jens Axboe
---
 fs/aio.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/aio.c b/fs/aio.c
index a433503a2dc3..000bfa0de1ed 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -6,6 +6,7 @@
  *
  *	Copyright 2000, 2001, 2002 Red Hat, Inc. All Rights Reserved.
  *	Copyright 2018 Christoph Hellwig.
+ *	Copyright 2018 Jens Axboe
  *
  *	See ../COPYING for licensing terms.
  */