From patchwork Thu Dec 13 17:56:20 2018
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 01/26] fs: add an iopoll method to struct file_operations
Date: Thu, 13 Dec 2018 10:56:20 -0700
Message-Id: <20181213175645.22181-2-axboe@kernel.dk>
In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk>
List-ID: linux-block@vger.kernel.org

From: Christoph Hellwig

This new method
is used to explicitly poll for I/O completion for an iocb. It must be
called for any iocb submitted asynchronously (that is, with a non-null
ki_complete) which has the IOCB_HIPRI flag set. The method is assisted
by a new ki_cookie field in struct kiocb to store the polling cookie.

TODO: we can probably union ki_cookie with the existing hint and I/O
priority fields to avoid struct kiocb growth.

Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 Documentation/filesystems/vfs.txt | 3 +++
 include/linux/fs.h                | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 5f71a252e2e0..d9dc5e4d82b9 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -857,6 +857,7 @@ struct file_operations {
 	ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
 	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
+	int (*iopoll)(struct kiocb *kiocb, bool spin);
 	int (*iterate) (struct file *, struct dir_context *);
 	int (*iterate_shared) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
@@ -902,6 +903,8 @@ otherwise noted.
   write_iter: possibly asynchronous write with iov_iter as source
 
+  iopoll: called when aio wants to poll for completions on HIPRI iocbs
+
   iterate: called when the VFS needs to read the directory contents
 
   iterate_shared: called when the VFS needs to read the directory contents
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a1ab233e6469..6a5f71f8ae06 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -310,6 +310,7 @@ struct kiocb {
 	int			ki_flags;
 	u16			ki_hint;
 	u16			ki_ioprio; /* See linux/ioprio.h */
+	unsigned int		ki_cookie; /* for ->iopoll */
 } __randomize_layout;
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1781,6 +1782,7 @@ struct file_operations {
 	ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
 	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
+	int (*iopoll)(struct kiocb *kiocb, bool spin);
 	int (*iterate) (struct file *, struct dir_context *);
 	int (*iterate_shared) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);

From patchwork Thu Dec 13 17:56:21 2018
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 02/26] block: add REQ_HIPRI_ASYNC
Date: Thu, 13 Dec 2018 10:56:21 -0700
Message-Id: <20181213175645.22181-3-axboe@kernel.dk>
In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk>

For the upcoming async polled IO, we can't sleep allocating requests.
If we do, then we introduce a deadlock where the submitter already has
async polled IO in-flight, but can't wait for it to complete since
polled requests must be actively found and reaped.
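The deadlock argument above hinges on the fact that polled completions are never pushed to the submitter; the submitter itself must find and reap them. A minimal userspace sketch of that model (a toy queue with hypothetical names, not kernel code) shows why request allocation must fail rather than sleep:

```c
#include <stdbool.h>

/* Toy model of a polled queue: completions only become visible when
 * the submitter polls, so a tag held by an in-flight request can only
 * be freed by the same thread that wants to allocate a new one. */
struct toy_queue {
	int inflight;	/* submitted but not yet reaped */
	int free_tags;	/* request allocation budget */
};

/* Nonblocking allocation: fail instead of sleeping when no tag is
 * free. Sleeping here would deadlock, because the sleeper is also the
 * only thread that can reap completions (and free tags) via toy_poll(). */
static bool toy_submit_nowait(struct toy_queue *q)
{
	if (q->free_tags == 0)
		return false;	/* caller must poll, reap, and retry */
	q->free_tags--;
	q->inflight++;
	return true;
}

/* Polling reaps one completion and returns its tag to the pool. */
static int toy_poll(struct toy_queue *q)
{
	if (q->inflight == 0)
		return 0;
	q->inflight--;
	q->free_tags++;
	return 1;
}
```

With a budget of one tag, a second submission fails until the submitter polls and reaps the first, which is exactly the retry loop a sleeping allocation would make impossible.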
Signed-off-by: Jens Axboe
---
 include/linux/blk_types.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 46c005d601ac..921d734d6b5d 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -347,6 +347,7 @@ enum req_flag_bits {
 #define REQ_NOWAIT		(1ULL << __REQ_NOWAIT)
 #define REQ_NOUNMAP		(1ULL << __REQ_NOUNMAP)
 #define REQ_HIPRI		(1ULL << __REQ_HIPRI)
+#define REQ_HIPRI_ASYNC		(REQ_HIPRI | REQ_NOWAIT)
 #define REQ_DRV			(1ULL << __REQ_DRV)
 #define REQ_SWAP		(1ULL << __REQ_SWAP)

From patchwork Thu Dec 13 17:56:22 2018
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com,
clm@fb.com, Jens Axboe
Subject: [PATCH 03/26] block: wire up block device iopoll method
Date: Thu, 13 Dec 2018 10:56:22 -0700
Message-Id: <20181213175645.22181-4-axboe@kernel.dk>
In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk>

From: Christoph Hellwig

Just call blk_poll on the iocb cookie; we can derive the block device
from the inode trivially.

Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/block_dev.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index e1886cc7048f..6de8d35f6e41 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -281,6 +281,14 @@ struct blkdev_dio {
 
 static struct bio_set blkdev_dio_pool;
 
+static int blkdev_iopoll(struct kiocb *kiocb, bool wait)
+{
+	struct block_device *bdev = I_BDEV(kiocb->ki_filp->f_mapping->host);
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	return blk_poll(q, READ_ONCE(kiocb->ki_cookie), wait);
+}
+
 static void blkdev_bio_end_io(struct bio *bio)
 {
 	struct blkdev_dio *dio = bio->bi_private;
@@ -398,6 +406,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 				bio->bi_opf |= REQ_HIPRI;
 
 			qc = submit_bio(bio);
+			WRITE_ONCE(iocb->ki_cookie, qc);
 			break;
 		}
@@ -2070,6 +2079,7 @@ const struct file_operations def_blk_fops = {
 	.llseek		= block_llseek,
 	.read_iter	= blkdev_read_iter,
 	.write_iter	= blkdev_write_iter,
+	.iopoll		= blkdev_iopoll,
 	.mmap		= generic_file_mmap,
 	.fsync		= blkdev_fsync,
 	.unlocked_ioctl	= block_ioctl,

From patchwork Thu Dec 13 17:56:23 2018
Received: from mail.wl.linuxfoundation.org
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 04/26] block: use REQ_HIPRI_ASYNC for non-sync polled IO
Date: Thu, 13 Dec 2018 10:56:23 -0700
Message-Id: <20181213175645.22181-5-axboe@kernel.dk>
In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk>

Tell the block layer if it's a sync or async polled request, so it can
do the right thing.
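The flag selection this patch wires into the submission path can be sketched in a few lines. The bit positions below are hypothetical stand-ins (the real values come from the __REQ_* enum in blk_types.h); the point is only the sync/async split: sync polled IO may sleep in allocation, so plain REQ_HIPRI suffices, while async polled IO also needs REQ_NOWAIT.

```c
/* Hypothetical stand-ins for the kernel's request flag bits. */
#define REQ_NOWAIT		(1ULL << 21)
#define REQ_HIPRI		(1ULL << 25)
/* Patch 02 defines the async variant as the union of the two. */
#define REQ_HIPRI_ASYNC		(REQ_HIPRI | REQ_NOWAIT)

#include <stdbool.h>

/* Mirrors the branch added to the direct-IO submission path: a sync
 * polled request keeps blocking allocation, an async one must not. */
static unsigned long long polled_opf(bool is_sync)
{
	return is_sync ? REQ_HIPRI : REQ_HIPRI_ASYNC;
}
```

Either way the HIPRI bit is set, so the completion side always knows the request is a polled one; only the allocation behavior differs.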
Signed-off-by: Jens Axboe
---
 fs/block_dev.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 6de8d35f6e41..b8f574615792 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -402,8 +402,12 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 		nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES);
 		if (!nr_pages) {
-			if (iocb->ki_flags & IOCB_HIPRI)
-				bio->bi_opf |= REQ_HIPRI;
+			if (iocb->ki_flags & IOCB_HIPRI) {
+				if (!is_sync)
+					bio->bi_opf |= REQ_HIPRI_ASYNC;
+				else
+					bio->bi_opf |= REQ_HIPRI;
+			}
 
 			qc = submit_bio(bio);
 			WRITE_ONCE(iocb->ki_cookie, qc);

From patchwork Thu Dec 13 17:56:24 2018
From: Jens Axboe
To: linux-block@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 05/26] iomap: wire up the iopoll method
Date: Thu, 13 Dec 2018 10:56:24 -0700
Message-Id: <20181213175645.22181-6-axboe@kernel.dk>
In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk>

From: Christoph Hellwig

Store the request queue the last bio was submitted to in the iocb
private data in addition to the cookie so that we find the right block
device. Also refactor the common direct I/O bio submission code into a
nice little helper.

Signed-off-by: Christoph Hellwig

Modified to use REQ_HIPRI_ASYNC for async polled IO.

Signed-off-by: Jens Axboe
---
 fs/gfs2/file.c        |  2 ++
 fs/iomap.c            | 47 +++++++++++++++++++++++++++++--------------
 fs/xfs/xfs_file.c     |  1 +
 include/linux/iomap.h |  1 +
 4 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 45a17b770d97..358157efc5b7 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1280,6 +1280,7 @@ const struct file_operations gfs2_file_fops = {
 	.llseek		= gfs2_llseek,
 	.read_iter	= gfs2_file_read_iter,
 	.write_iter	= gfs2_file_write_iter,
+	.iopoll		= iomap_dio_iopoll,
 	.unlocked_ioctl	= gfs2_ioctl,
 	.mmap		= gfs2_mmap,
 	.open		= gfs2_open,
@@ -1310,6 +1311,7 @@ const struct file_operations gfs2_file_fops_nolock = {
 	.llseek		= gfs2_llseek,
 	.read_iter	= gfs2_file_read_iter,
 	.write_iter	= gfs2_file_write_iter,
+	.iopoll		= iomap_dio_iopoll,
 	.unlocked_ioctl	= gfs2_ioctl,
 	.mmap		= gfs2_mmap,
 	.open		= gfs2_open,
diff --git a/fs/iomap.c b/fs/iomap.c
index 9a5bf1e8925b..f3039989de73 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1441,6 +1441,32 @@ struct iomap_dio {
 	};
 };
 
+int iomap_dio_iopoll(struct kiocb *kiocb, bool spin)
+{
+	struct request_queue *q = READ_ONCE(kiocb->private);
+
+	if (!q)
+		return 0;
+	return blk_poll(q, READ_ONCE(kiocb->ki_cookie), spin);
+}
+EXPORT_SYMBOL_GPL(iomap_dio_iopoll);
+
+static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
+		struct bio *bio)
+{
+	atomic_inc(&dio->ref);
+
+	if (dio->iocb->ki_flags & IOCB_HIPRI) {
+		if (!dio->wait_for_completion)
+			bio->bi_opf |= REQ_HIPRI_ASYNC;
+		else
+			bio->bi_opf |= REQ_HIPRI;
+	}
+
+	dio->submit.last_queue = bdev_get_queue(iomap->bdev);
+	dio->submit.cookie = submit_bio(bio);
+}
+
 static ssize_t iomap_dio_complete(struct iomap_dio *dio)
 {
 	struct kiocb *iocb = dio->iocb;
@@ -1553,7 +1579,7 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 	}
 }
 
-static blk_qc_t
+static void
 iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
 		unsigned len)
 {
@@ -1567,15 +1593,10 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;
 
-	if (dio->iocb->ki_flags & IOCB_HIPRI)
-		flags |= REQ_HIPRI;
-
 	get_page(page);
 	__bio_add_page(bio, page, len, 0);
 	bio_set_op_attrs(bio, REQ_OP_WRITE, flags);
-
-	atomic_inc(&dio->ref);
-	return submit_bio(bio);
+	iomap_dio_submit_bio(dio, iomap, bio);
 }
 
 static loff_t
@@ -1678,9 +1699,6 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 			bio_set_pages_dirty(bio);
 		}
 
-		if (dio->iocb->ki_flags & IOCB_HIPRI)
-			bio->bi_opf |= REQ_HIPRI;
-
 		iov_iter_advance(dio->submit.iter, n);
 
 		dio->size += n;
@@ -1688,11 +1706,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		copied += n;
 
 		nr_pages = iov_iter_npages(&iter, BIO_MAX_PAGES);
-
-		atomic_inc(&dio->ref);
-
-		dio->submit.last_queue = bdev_get_queue(iomap->bdev);
-		dio->submit.cookie = submit_bio(bio);
+		iomap_dio_submit_bio(dio, iomap, bio);
 	} while (nr_pages);
 
 	/*
@@ -1903,6 +1917,9 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (dio->flags & IOMAP_DIO_WRITE_FUA)
 		dio->flags &= ~IOMAP_DIO_NEED_SYNC;
 
+	WRITE_ONCE(iocb->ki_cookie, dio->submit.cookie);
+	WRITE_ONCE(iocb->private, dio->submit.last_queue);
+
 	if (!atomic_dec_and_test(&dio->ref)) {
 		if (!dio->wait_for_completion)
 			return -EIOCBQUEUED;
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e47425071e65..60c2da41f0fc 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1203,6 +1203,7 @@ const struct file_operations xfs_file_operations = {
 	.write_iter	= xfs_file_write_iter,
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.iopoll		= iomap_dio_iopoll,
 	.unlocked_ioctl	= xfs_file_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= xfs_file_compat_ioctl,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 9a4258154b25..0fefb5455bda 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -162,6 +162,7 @@ typedef int (iomap_dio_end_io_t)(struct kiocb *iocb, ssize_t ret,
 		unsigned flags);
 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, iomap_dio_end_io_t end_io);
+int iomap_dio_iopoll(struct kiocb *kiocb, bool spin);
 
 #ifdef CONFIG_SWAP
 struct file;

From patchwork Thu Dec 13 17:56:25 2018
X-Google-Smtp-Source: AFSGD/VTDlf+t34bYTzKfPcUGwzJRzGqaopMmzVJ930etZdMTXM7tXMD8AD0KTc0XDnCPaMlE5AVDA== X-Received: by 2002:a05:660c:b12:: with SMTP id f18mr301356itk.118.1544723822383; Thu, 13 Dec 2018 09:57:02 -0800 (PST) Received: from x1.localdomain ([216.160.245.98]) by smtp.gmail.com with ESMTPSA id k6sm1022261ios.69.2018.12.13.09.57.00 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 13 Dec 2018 09:57:01 -0800 (PST) From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 06/26] aio: use assigned completion handler Date: Thu, 13 Dec 2018 10:56:25 -0700 Message-Id: <20181213175645.22181-7-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk> References: <20181213175645.22181-1-axboe@kernel.dk> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP We know this is a read/write request, but in preparation for having different kinds of those, ensure that we call the assigned handler instead of assuming it's aio_complete_rq(). 
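The indirection this patch introduces can be illustrated outside the kernel. The following is a hypothetical user-space sketch (the `struct request`, `prep()`, and `rw_done()` names are illustrative, not kernel APIs): a request stores its own completion callback at setup time, the way `aio_prep_rw()` stores `ki_complete` on the kiocb, and the finish path dispatches through that pointer instead of hard-coding one handler.

```c
#include <assert.h>

/* Hypothetical user-space sketch: a request carries its own completion
 * callback, mirroring how aio_prep_rw() stores ki_complete on the kiocb. */
struct request {
	long result;
	void (*complete)(struct request *req, long res);
};

static void complete_rw(struct request *req, long res)
{
	req->result = res;	/* stand-in for aio_complete_rw() */
}

static void prep(struct request *req)
{
	req->result = 0;
	req->complete = complete_rw;	/* assigned once at setup time */
}

/* The point of the patch: finish through the assigned handler rather than
 * a hard-coded call, so other request kinds can plug in their own. */
static void rw_done(struct request *req, long ret)
{
	req->complete(req, ret);	/* req->ki_complete(req, ret, 0) analogue */
}
```

With the dispatch going through `req->complete`, adding a new request type only requires assigning a different handler at prep time; `rw_done()` itself never changes.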
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/aio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/aio.c b/fs/aio.c
index 05647d352bf3..cf0de61743e8 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1490,7 +1490,7 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret)
 		ret = -EINTR;
 		/*FALLTHRU*/
 	default:
-		aio_complete_rw(req, ret, 0);
+		req->ki_complete(req, ret, 0);
 	}
 }

From patchwork Thu Dec 13 17:56:26 2018
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 07/26] aio: separate out ring reservation from req allocation
Date: Thu, 13 Dec 2018 10:56:26 -0700
Message-Id: <20181213175645.22181-8-axboe@kernel.dk>
In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk>
References: <20181213175645.22181-1-axboe@kernel.dk>

From: Christoph Hellwig

This is in preparation for certain types of IO not needing a ring
reservation.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/aio.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index cf0de61743e8..eaceb40e6cf5 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -901,7 +901,7 @@ static void put_reqs_available(struct kioctx *ctx, unsigned nr)
 	local_irq_restore(flags);
 }

-static bool get_reqs_available(struct kioctx *ctx)
+static bool __get_reqs_available(struct kioctx *ctx)
 {
 	struct kioctx_cpu *kcpu;
 	bool ret = false;
@@ -993,6 +993,14 @@ static void user_refill_reqs_available(struct kioctx *ctx)
 	spin_unlock_irq(&ctx->completion_lock);
 }

+static bool get_reqs_available(struct kioctx *ctx)
+{
+	if (__get_reqs_available(ctx))
+		return true;
+	user_refill_reqs_available(ctx);
+	return __get_reqs_available(ctx);
+}
+
 /* aio_get_req
  *	Allocate a slot for an aio request.
  * Returns NULL if no requests are free.
@@ -1001,24 +1009,15 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 {
 	struct aio_kiocb *req;

-	if (!get_reqs_available(ctx)) {
-		user_refill_reqs_available(ctx);
-		if (!get_reqs_available(ctx))
-			return NULL;
-	}
-
 	req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL|__GFP_ZERO);
 	if (unlikely(!req))
-		goto out_put;
+		return NULL;

 	percpu_ref_get(&ctx->reqs);
 	INIT_LIST_HEAD(&req->ki_list);
 	refcount_set(&req->ki_refcnt, 0);
 	req->ki_ctx = ctx;
 	return req;
-out_put:
-	put_reqs_available(ctx, 1);
-	return NULL;
 }

 static struct kioctx *lookup_ioctx(unsigned long ctx_id)
@@ -1805,9 +1804,13 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 		return -EINVAL;
 	}

+	if (!get_reqs_available(ctx))
+		return -EAGAIN;
+
+	ret = -EAGAIN;
 	req = aio_get_req(ctx);
 	if (unlikely(!req))
-		return -EAGAIN;
+		goto out_put_reqs_available;

 	if (iocb.aio_flags & IOCB_FLAG_RESFD) {
 		/*
@@ -1870,11 +1873,12 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 		goto out_put_req;
 	return 0;
 out_put_req:
-	put_reqs_available(ctx, 1);
 	percpu_ref_put(&ctx->reqs);
 	if (req->ki_eventfd)
 		eventfd_ctx_put(req->ki_eventfd);
 	kmem_cache_free(kiocb_cachep, req);
+out_put_reqs_available:
+	put_reqs_available(ctx, 1);
 	return ret;
 }

From patchwork Thu Dec 13 17:56:27 2018
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 08/26] aio: don't zero entire aio_kiocb aio_get_req()
Date: Thu, 13 Dec 2018 10:56:27 -0700
Message-Id: <20181213175645.22181-9-axboe@kernel.dk>
In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk>
References: <20181213175645.22181-1-axboe@kernel.dk>

It's 192 bytes, fairly substantial. Most items don't need to be cleared,
especially not upfront. Clear the ones we do need to clear, and leave
the other ones for setup when the iocb is prepared and submitted.
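The pattern described above — allocate without zeroing, then initialize only the fields every path reads — can be sketched in plain user-space C. This is a hypothetical illustration, not kernel code: `struct kiocb_sketch` and `get_req()` are stand-ins for `struct aio_kiocb` and `aio_get_req()`, with `malloc()` standing in for `kmem_cache_alloc()` without `__GFP_ZERO`.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct aio_kiocb: some fields must be valid
 * immediately, others are only set later when the opcode is prepared. */
struct kiocb_sketch {
	void *ctx;
	void *eventfd;	/* must be NULL until an eventfd is attached */
	int refcnt;
	bool cancelled;	/* opcode-specific; initialized at prep time instead */
};

static struct kiocb_sketch *get_req(void *ctx)
{
	/* Plain malloc: memory is NOT zeroed, like dropping __GFP_ZERO. */
	struct kiocb_sketch *req = malloc(sizeof(*req));

	if (!req)
		return NULL;
	/* Clear only what every path reads before submission; leave
	 * opcode-specific fields (cancelled, etc.) for prep time. */
	req->ctx = ctx;
	req->refcnt = 0;
	req->eventfd = NULL;
	return req;
}
```

The trade-off is the one the commit message names: skipping the full memset saves work on every submission, at the cost of having to audit that each remaining field really is written before it is read.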
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/aio.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index eaceb40e6cf5..522c04864d82 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1009,14 +1009,15 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 {
 	struct aio_kiocb *req;

-	req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL|__GFP_ZERO);
+	req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL);
 	if (unlikely(!req))
 		return NULL;

 	percpu_ref_get(&ctx->reqs);
+	req->ki_ctx = ctx;
 	INIT_LIST_HEAD(&req->ki_list);
 	refcount_set(&req->ki_refcnt, 0);
-	req->ki_ctx = ctx;
+	req->ki_eventfd = NULL;
 	return req;
 }

@@ -1730,6 +1731,10 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
 	if (unlikely(!req->file))
 		return -EBADF;

+	req->head = NULL;
+	req->woken = false;
+	req->cancelled = false;
+
 	apt.pt._qproc = aio_poll_queue_proc;
 	apt.pt._key = req->events;
 	apt.iocb = aiocb;

From patchwork Thu Dec 13 17:56:28 2018
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 09/26] aio: only use blk plugs for > 2 depth submissions
Date: Thu, 13 Dec 2018 10:56:28 -0700
Message-Id: <20181213175645.22181-10-axboe@kernel.dk>
In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk>
References: <20181213175645.22181-1-axboe@kernel.dk>

Plugging is meant to optimize submission of a string of IOs; if we don't
have more than 2 being submitted, don't bother setting up a plug.

Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/aio.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 522c04864d82..ed6c3914477a 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -69,6 +69,12 @@ struct aio_ring {
 	struct io_event		io_events[0];
 }; /* 128 bytes + ring size */

+/*
+ * Plugging is meant to work with larger batches of IOs. If we don't
+ * have more than the below, then don't bother setting up a plug.
+ */
+#define AIO_PLUG_THRESHOLD	2
+
 #define AIO_RING_PAGES	8

 struct kioctx_table {
@@ -1919,7 +1925,8 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
 	if (nr > ctx->nr_events)
 		nr = ctx->nr_events;

-	blk_start_plug(&plug);
+	if (nr > AIO_PLUG_THRESHOLD)
+		blk_start_plug(&plug);
 	for (i = 0; i < nr; i++) {
 		struct iocb __user *user_iocb;
@@ -1932,7 +1939,8 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
 		if (ret)
 			break;
 	}
-	blk_finish_plug(&plug);
+	if (nr > AIO_PLUG_THRESHOLD)
+		blk_finish_plug(&plug);

 	percpu_ref_put(&ctx->users);
 	return i ? i : ret;
@@ -1959,7 +1967,8 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
 	if (nr > ctx->nr_events)
 		nr = ctx->nr_events;

-	blk_start_plug(&plug);
+	if (nr > AIO_PLUG_THRESHOLD)
+		blk_start_plug(&plug);
 	for (i = 0; i < nr; i++) {
 		compat_uptr_t user_iocb;
@@ -1972,7 +1981,8 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
 		if (ret)
 			break;
 	}
-	blk_finish_plug(&plug);
+	if (nr > AIO_PLUG_THRESHOLD)
+		blk_finish_plug(&plug);

 	percpu_ref_put(&ctx->users);
 	return i ? i : ret;

From patchwork Thu Dec 13 17:56:29 2018
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 10/26] aio: use iocb_put() instead of open coding it
Date: Thu, 13 Dec 2018 10:56:29 -0700
Message-Id: <20181213175645.22181-11-axboe@kernel.dk>
In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk>
References: <20181213175645.22181-1-axboe@kernel.dk>

Replace the percpu_ref_put() + kmem_cache_free() with a call to
iocb_put() instead.

Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/aio.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index ed6c3914477a..cf93b92bfb1e 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1884,10 +1884,9 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 		goto out_put_req;
 	return 0;
 out_put_req:
-	percpu_ref_put(&ctx->reqs);
 	if (req->ki_eventfd)
 		eventfd_ctx_put(req->ki_eventfd);
-	kmem_cache_free(kiocb_cachep, req);
+	iocb_put(req);
 out_put_reqs_available:
 	put_reqs_available(ctx, 1);
 	return ret;

From patchwork Thu Dec 13 17:56:30 2018
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729629AbeLMR5M (ORCPT ); Thu, 13 Dec 2018 12:57:12 -0500 Received: by mail-io1-f67.google.com with SMTP id n3so2279018iog.12 for ; Thu, 13 Dec 2018 09:57:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=zjXjGJpzxG4sQV1Ooff1P7NVDuSBRD7W2guw+LhTNUo=; b=R/Ro+4r3bt6JFAQyegpDHQIuTP36M1ydtyscVkO0lW0E/egSf06eSjfhV/jrVwXkkW X8SEbs5qtpfpyuTmM3sHtdPhjpjPioBS07+2JNN/ydVAY3Smh5PB8AyFBXrCw/BtJ6wH RZuwNmy8NLE2QKDZBpR5mEU8Dmu41A6nZHbvu79EMcY7g87egijs2u22RUhvjRni58ip GZsdE2DgjmgcHrBo800Dc2ALbR3OxFR2jdzYy/a+QzsJ5/Vll/eC8GvbaD39G2iJfDUx dYXmfC1iCPFhSC5Wk+m3uOV2DsbgrYet+RGMyfbC6BkcfC5l6VnPq2+kcLGrQj0445EH 8fig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=zjXjGJpzxG4sQV1Ooff1P7NVDuSBRD7W2guw+LhTNUo=; b=TcBAhqRSve6VtbN5Drd6BkWVnZkhaSM2eFpBfRWJ1Q4i50DXyOt7aXryhcVxrdzfKQ G0FlbtS8iUmLFY1hEiJpk/v+Oy+kxgSS0dB2K/2ROC85jEGobnOZqUW6REF4gzVFbnJg nAYIno37mEhZqryuVhX3LsqiiDUlvzsFoYjESt9Xg1r0a+e54lT7DQkRn+qbd98R4sPH xF+2ibBCHvgfDqRbGcDCUyoW/aplRFxLAxDvGvZb7UNmQg/L/YLSHCrypNdAHxSD3vsG 8vZO0XaWXyxU7aLhEWAqu8HGlvkGPUWkUMA1czOS/qkAbP+0426IRzHlyynPjpaqIixf Hn7g== X-Gm-Message-State: AA+aEWbODMpPESaEjwEG8JhoD2mdiREr7tcZf/4gHH9fsWdR1O790lzJ Tik5iCsErZMUoVEknhCdM22YtJJv5P5ZhQ== X-Google-Smtp-Source: AFSGD/X2w4OOVggWQeJqvAM6Ty0YlulMWz5+cNqA2oHWX3LAbl7r/4l5smybfhReWvihZBDxobl9/g== X-Received: by 2002:a6b:ce06:: with SMTP id p6mr19345101iob.189.1544723831229; Thu, 13 Dec 2018 09:57:11 -0800 (PST) Received: from x1.localdomain ([216.160.245.98]) by smtp.gmail.com with ESMTPSA id k6sm1022261ios.69.2018.12.13.09.57.09 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 13 Dec 2018 09:57:10 -0800 (PST) From: Jens Axboe To: linux-block@vger.kernel.org, 
linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 11/26] aio: split out iocb copy from io_submit_one() Date: Thu, 13 Dec 2018 10:56:30 -0700 Message-Id: <20181213175645.22181-12-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk> References: <20181213175645.22181-1-axboe@kernel.dk> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In preparation of handing in iocbs in a different fashion as well. Also make it clear that the iocb being passed in isn't modified, by marking it const throughout. Signed-off-by: Jens Axboe Reviewed-by: Christoph Hellwig --- fs/aio.c | 68 +++++++++++++++++++++++++++++++------------------------- 1 file changed, 38 insertions(+), 30 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index cf93b92bfb1e..06c8bcc72496 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1420,7 +1420,7 @@ static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) aio_complete(iocb, res, res2); } -static int aio_prep_rw(struct kiocb *req, struct iocb *iocb) +static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) { int ret; @@ -1461,7 +1461,7 @@ static int aio_prep_rw(struct kiocb *req, struct iocb *iocb) return ret; } -static int aio_setup_rw(int rw, struct iocb *iocb, struct iovec **iovec, +static int aio_setup_rw(int rw, const struct iocb *iocb, struct iovec **iovec, bool vectored, bool compat, struct iov_iter *iter) { void __user *buf = (void __user *)(uintptr_t)iocb->aio_buf; @@ -1500,8 +1500,8 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret) } } -static ssize_t aio_read(struct kiocb *req, struct iocb *iocb, bool vectored, - bool compat) +static ssize_t aio_read(struct kiocb *req, const struct iocb *iocb, + bool vectored, bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct iov_iter 
iter; @@ -1533,8 +1533,8 @@ static ssize_t aio_read(struct kiocb *req, struct iocb *iocb, bool vectored, return ret; } -static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored, - bool compat) +static ssize_t aio_write(struct kiocb *req, const struct iocb *iocb, + bool vectored, bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct iov_iter iter; @@ -1589,7 +1589,8 @@ static void aio_fsync_work(struct work_struct *work) aio_complete(container_of(req, struct aio_kiocb, fsync), ret, 0); } -static int aio_fsync(struct fsync_iocb *req, struct iocb *iocb, bool datasync) +static int aio_fsync(struct fsync_iocb *req, const struct iocb *iocb, + bool datasync) { if (unlikely(iocb->aio_buf || iocb->aio_offset || iocb->aio_nbytes || iocb->aio_rw_flags)) @@ -1717,7 +1718,7 @@ aio_poll_queue_proc(struct file *file, struct wait_queue_head *head, add_wait_queue(head, &pt->iocb->poll.wait); } -static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb) +static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) { struct kioctx *ctx = aiocb->ki_ctx; struct poll_iocb *req = &aiocb->poll; @@ -1789,27 +1790,23 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb) return 0; } -static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, - bool compat) +static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, + struct iocb __user *user_iocb, bool compat) { struct aio_kiocb *req; - struct iocb iocb; ssize_t ret; - if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb)))) - return -EFAULT; - /* enforce forwards compatibility on users */ - if (unlikely(iocb.aio_reserved2)) { + if (unlikely(iocb->aio_reserved2)) { pr_debug("EINVAL: reserve field set\n"); return -EINVAL; } /* prevent overflows */ if (unlikely( - (iocb.aio_buf != (unsigned long)iocb.aio_buf) || - (iocb.aio_nbytes != (size_t)iocb.aio_nbytes) || - ((ssize_t)iocb.aio_nbytes < 0) + (iocb->aio_buf != (unsigned 
long)iocb->aio_buf) || + (iocb->aio_nbytes != (size_t)iocb->aio_nbytes) || + ((ssize_t)iocb->aio_nbytes < 0) )) { pr_debug("EINVAL: overflow check\n"); return -EINVAL; @@ -1823,14 +1820,14 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, if (unlikely(!req)) goto out_put_reqs_available; - if (iocb.aio_flags & IOCB_FLAG_RESFD) { + if (iocb->aio_flags & IOCB_FLAG_RESFD) { /* * If the IOCB_FLAG_RESFD flag of aio_flags is set, get an * instance of the file* now. The file descriptor must be * an eventfd() fd, and will be signaled for each completed * event using the eventfd_signal() function. */ - req->ki_eventfd = eventfd_ctx_fdget((int) iocb.aio_resfd); + req->ki_eventfd = eventfd_ctx_fdget((int) iocb->aio_resfd); if (IS_ERR(req->ki_eventfd)) { ret = PTR_ERR(req->ki_eventfd); req->ki_eventfd = NULL; @@ -1845,32 +1842,32 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, } req->ki_user_iocb = user_iocb; - req->ki_user_data = iocb.aio_data; + req->ki_user_data = iocb->aio_data; - switch (iocb.aio_lio_opcode) { + switch (iocb->aio_lio_opcode) { case IOCB_CMD_PREAD: - ret = aio_read(&req->rw, &iocb, false, compat); + ret = aio_read(&req->rw, iocb, false, compat); break; case IOCB_CMD_PWRITE: - ret = aio_write(&req->rw, &iocb, false, compat); + ret = aio_write(&req->rw, iocb, false, compat); break; case IOCB_CMD_PREADV: - ret = aio_read(&req->rw, &iocb, true, compat); + ret = aio_read(&req->rw, iocb, true, compat); break; case IOCB_CMD_PWRITEV: - ret = aio_write(&req->rw, &iocb, true, compat); + ret = aio_write(&req->rw, iocb, true, compat); break; case IOCB_CMD_FSYNC: - ret = aio_fsync(&req->fsync, &iocb, false); + ret = aio_fsync(&req->fsync, iocb, false); break; case IOCB_CMD_FDSYNC: - ret = aio_fsync(&req->fsync, &iocb, true); + ret = aio_fsync(&req->fsync, iocb, true); break; case IOCB_CMD_POLL: - ret = aio_poll(req, &iocb); + ret = aio_poll(req, iocb); break; default: - pr_debug("invalid aio operation %d\n", 
iocb.aio_lio_opcode); + pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode); ret = -EINVAL; break; } @@ -1892,6 +1889,17 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, return ret; } +static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, + bool compat) +{ + struct iocb iocb; + + if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb)))) + return -EFAULT; + + return __io_submit_one(ctx, &iocb, user_iocb, compat); +} + /* sys_io_submit: * Queue the nr iocbs pointed to by iocbpp for processing. Returns * the number of iocbs queued. May return -EINVAL if the aio_context From patchwork Thu Dec 13 17:56:31 2018 X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10729343 From: Jens Axboe To: linux-block@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 12/26] aio: abstract out io_event filler helper Date: Thu, 13 Dec 2018 10:56:31 -0700 Message-Id: <20181213175645.22181-13-axboe@kernel.dk> In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk> References: <20181213175645.22181-1-axboe@kernel.dk> Signed-off-by: Jens Axboe Reviewed-by: Christoph Hellwig --- fs/aio.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 06c8bcc72496..173f1f79dc8f 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1063,6 +1063,15 @@ static inline void iocb_put(struct aio_kiocb *iocb) } } +static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb, + long res, long res2) +{ + ev->obj = (u64)(unsigned long)iocb->ki_user_iocb; + ev->data = iocb->ki_user_data; + ev->res = res; + ev->res2 = res2; +} + /* aio_complete * Called when the io request on the given iocb is complete.
*/ @@ -1090,10 +1099,7 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2) ev_page = kmap_atomic(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]); event = ev_page + pos % AIO_EVENTS_PER_PAGE; - event->obj = (u64)(unsigned long)iocb->ki_user_iocb; - event->data = iocb->ki_user_data; - event->res = res; - event->res2 = res2; + aio_fill_event(event, iocb, res, res2); kunmap_atomic(ev_page); flush_dcache_page(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]); From patchwork Thu Dec 13 17:56:32 2018 X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10729347 From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 13/26] aio: add io_setup2() system call Date: Thu, 13 Dec
2018 10:56:32 -0700 Message-Id: <20181213175645.22181-14-axboe@kernel.dk> In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk> References: <20181213175645.22181-1-axboe@kernel.dk> This is just like io_setup(), except that it adds a flags argument to let the caller control/define some of the io_context behavior. Beyond the flags, we add an iocb array and two user pointers for future use. Signed-off-by: Jens Axboe --- Documentation/sysctl/fs.txt | 8 +-- arch/x86/entry/syscalls/syscall_64.tbl | 1 + fs/aio.c | 76 +++++++++++++++++--------- include/linux/syscalls.h | 2 + include/uapi/asm-generic/unistd.h | 4 +- kernel/sys_ni.c | 1 + 6 files changed, 62 insertions(+), 30 deletions(-) diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt index 819caf8ca05f..5e484eb7a25f 100644 --- a/Documentation/sysctl/fs.txt +++ b/Documentation/sysctl/fs.txt @@ -47,10 +47,10 @@ Currently, these files are in /proc/sys/fs: aio-nr & aio-max-nr: aio-nr is the running total of the number of events specified on the -io_setup system call for all currently active aio contexts. If aio-nr -reaches aio-max-nr then io_setup will fail with EAGAIN. Note that -raising aio-max-nr does not result in the pre-allocation or re-sizing -of any kernel data structures. +io_setup/io_setup2 system call for all currently active aio contexts. +If aio-nr reaches aio-max-nr then io_setup will fail with EAGAIN. +Note that raising aio-max-nr does not result in the pre-allocation or +re-sizing of any kernel data structures.
============================================================== diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index f0b1709a5ffb..67c357225fb0 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -343,6 +343,7 @@ 332 common statx __x64_sys_statx 333 common io_pgetevents __x64_sys_io_pgetevents 334 common rseq __x64_sys_rseq +335 common io_setup2 __x64_sys_io_setup2 # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/fs/aio.c b/fs/aio.c index 173f1f79dc8f..20f907ba2890 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -100,6 +100,8 @@ struct kioctx { unsigned long user_id; + unsigned int flags; + struct __percpu kioctx_cpu *cpu; /* @@ -686,10 +688,8 @@ static void aio_nr_sub(unsigned nr) spin_unlock(&aio_nr_lock); } -/* ioctx_alloc - * Allocates and initializes an ioctx. Returns an ERR_PTR if it failed. - */ -static struct kioctx *ioctx_alloc(unsigned nr_events) +static struct kioctx *io_setup_flags(unsigned long ctxid, + unsigned int nr_events, unsigned int flags) { struct mm_struct *mm = current->mm; struct kioctx *ctx; @@ -701,6 +701,12 @@ static struct kioctx *ioctx_alloc(unsigned nr_events) */ unsigned int max_reqs = nr_events; + if (unlikely(ctxid || nr_events == 0)) { + pr_debug("EINVAL: ctx %lu nr_events %u\n", + ctxid, nr_events); + return ERR_PTR(-EINVAL); + } + /* * We keep track of the number of available ringbuffer slots, to prevent * overflow (reqs_available), and we also use percpu counters for this. 
@@ -726,6 +732,7 @@ static struct kioctx *ioctx_alloc(unsigned nr_events) if (!ctx) return ERR_PTR(-ENOMEM); + ctx->flags = flags; ctx->max_reqs = max_reqs; spin_lock_init(&ctx->ctx_lock); @@ -1281,6 +1288,41 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr, return ret; } +/* sys_io_setup2: + * Like sys_io_setup(), except that it takes a set of flags + * (IOCTX_FLAG_*), and some pointers to user structures: + * + * *user1 - reserved for future use + * + * *user2 - reserved for future use. + */ +SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags, void __user *, user1, + void __user *, user2, aio_context_t __user *, ctxp) +{ + struct kioctx *ioctx; + unsigned long ctx; + long ret; + + if (flags || user1 || user2) + return -EINVAL; + + ret = get_user(ctx, ctxp); + if (unlikely(ret)) + goto out; + + ioctx = io_setup_flags(ctx, nr_events, flags); + ret = PTR_ERR(ioctx); + if (IS_ERR(ioctx)) + goto out; + + ret = put_user(ioctx->user_id, ctxp); + if (ret) + kill_ioctx(current->mm, ioctx, NULL); + percpu_ref_put(&ioctx->users); +out: + return ret; +} + /* sys_io_setup: * Create an aio_context capable of receiving at least nr_events. 
* ctxp must not point to an aio_context that already exists, and @@ -1296,7 +1338,7 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr, */ SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctxp) { - struct kioctx *ioctx = NULL; + struct kioctx *ioctx; unsigned long ctx; long ret; @@ -1304,14 +1346,7 @@ SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctxp) if (unlikely(ret)) goto out; - ret = -EINVAL; - if (unlikely(ctx || nr_events == 0)) { - pr_debug("EINVAL: ctx %lu nr_events %u\n", - ctx, nr_events); - goto out; - } - - ioctx = ioctx_alloc(nr_events); + ioctx = io_setup_flags(ctx, nr_events, 0); ret = PTR_ERR(ioctx); if (!IS_ERR(ioctx)) { ret = put_user(ioctx->user_id, ctxp); @@ -1327,7 +1362,7 @@ SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctxp) #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE2(io_setup, unsigned, nr_events, u32 __user *, ctx32p) { - struct kioctx *ioctx = NULL; + struct kioctx *ioctx; unsigned long ctx; long ret; @@ -1335,23 +1370,14 @@ COMPAT_SYSCALL_DEFINE2(io_setup, unsigned, nr_events, u32 __user *, ctx32p) if (unlikely(ret)) goto out; - ret = -EINVAL; - if (unlikely(ctx || nr_events == 0)) { - pr_debug("EINVAL: ctx %lu nr_events %u\n", - ctx, nr_events); - goto out; - } - - ioctx = ioctx_alloc(nr_events); + ioctx = io_setup_flags(ctx, nr_events, 0); ret = PTR_ERR(ioctx); if (!IS_ERR(ioctx)) { - /* truncating is ok because it's a user address */ - ret = put_user((u32)ioctx->user_id, ctx32p); + ret = put_user(ioctx->user_id, ctx32p); if (ret) kill_ioctx(current->mm, ioctx, NULL); percpu_ref_put(&ioctx->users); } - out: return ret; } diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 2ac3d13a915b..67b7f03aa9fc 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -287,6 +287,8 @@ static inline void addr_limit_user_check(void) */ #ifndef CONFIG_ARCH_HAS_SYSCALL_WRAPPER asmlinkage long sys_io_setup(unsigned nr_reqs, 
aio_context_t __user *ctx); +asmlinkage long sys_io_setup2(unsigned, unsigned, void __user *, void __user *, + aio_context_t __user *); asmlinkage long sys_io_destroy(aio_context_t ctx); asmlinkage long sys_io_submit(aio_context_t, long, struct iocb __user * __user *); diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index c7f3321fbe43..1bbaa4c59f20 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -738,9 +738,11 @@ __SYSCALL(__NR_statx, sys_statx) __SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents) #define __NR_rseq 293 __SYSCALL(__NR_rseq, sys_rseq) +#define __NR_io_setup2 294 +__SYSCALL(__NR_io_setup2, sys_io_setup2) #undef __NR_syscalls -#define __NR_syscalls 294 +#define __NR_syscalls 295 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index df556175be50..17c8b4393669 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -37,6 +37,7 @@ asmlinkage long sys_ni_syscall(void) */ COND_SYSCALL(io_setup); +COND_SYSCALL(io_setup2); COND_SYSCALL_COMPAT(io_setup); COND_SYSCALL(io_destroy); COND_SYSCALL(io_submit); From patchwork Thu Dec 13 17:56:33 2018 X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10729353 From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 14/26] aio: support for IO polling Date: Thu, 13 Dec 2018 10:56:33 -0700 Message-Id: <20181213175645.22181-15-axboe@kernel.dk> In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk> References: <20181213175645.22181-1-axboe@kernel.dk> Add polled variants of PREAD/PREADV and PWRITE/PWRITEV. These act like their non-polled counterparts, except we expect to poll for their completion. The polling happens at io_getevents() time, and works just like non-polled IO. To set up an io_context for polled IO, the application must call io_setup2() with IOCTX_FLAG_IOPOLL as one of the flags. It is illegal to mix polled and non-polled IO on the same io_context. Polled IO doesn't support the user-mapped completion ring; events must be reaped through the io_getevents() system call. For non-IRQ-driven poll devices, there's no way to support completion reaping from userspace by just looking at the ring: the application itself is the one that pulls completion entries.
Signed-off-by: Jens Axboe Reviewed-by: Benny Halevy --- fs/aio.c | 419 +++++++++++++++++++++++++++++++---- include/uapi/linux/aio_abi.h | 4 + 2 files changed, 384 insertions(+), 39 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 20f907ba2890..0f48c288e5dd 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -146,6 +146,18 @@ struct kioctx { atomic_t reqs_available; } ____cacheline_aligned_in_smp; + /* iopoll submission state */ + struct { + spinlock_t poll_lock; + struct list_head poll_submitted; + } ____cacheline_aligned_in_smp; + + /* iopoll completion state */ + struct { + struct list_head poll_completing; + struct mutex getevents_lock; + } ____cacheline_aligned_in_smp; + struct { spinlock_t ctx_lock; struct list_head active_reqs; /* used for cancellation */ @@ -198,14 +210,27 @@ struct aio_kiocb { __u64 ki_user_data; /* user's data for completion */ struct list_head ki_list; /* the aio core uses this - * for cancellation */ + * for cancellation, or for + * polled IO */ + + unsigned long ki_flags; +#define KIOCB_F_POLL_COMPLETED 0 /* polled IO has completed */ +#define KIOCB_F_POLL_EAGAIN 1 /* polled submission got EAGAIN */ + refcount_t ki_refcnt; - /* - * If the aio_resfd field of the userspace iocb is not zero, - * this is the underlying eventfd context to deliver events to. - */ - struct eventfd_ctx *ki_eventfd; + union { + /* + * If the aio_resfd field of the userspace iocb is not zero, + * this is the underlying eventfd context to deliver events to. 
+ */ + struct eventfd_ctx *ki_eventfd; + + /* + * For polled IO, stash completion info here + */ + struct io_event ki_ev; + }; }; /*------ sysctl variables----*/ @@ -222,6 +247,8 @@ static struct vfsmount *aio_mnt; static const struct file_operations aio_ring_fops; static const struct address_space_operations aio_ctx_aops; +static void aio_iopoll_reap_events(struct kioctx *); + static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages) { struct file *file; @@ -459,11 +486,15 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) int i; struct file *file; - /* Compensate for the ring buffer's head/tail overlap entry */ - nr_events += 2; /* 1 is required, 2 for good luck */ - + /* + * Compensate for the ring buffer's head/tail overlap entry. + * IO polling doesn't require any io event entries + */ size = sizeof(struct aio_ring); - size += sizeof(struct io_event) * nr_events; + if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) { + nr_events += 2; /* 1 is required, 2 for good luck */ + size += sizeof(struct io_event) * nr_events; + } nr_pages = PFN_UP(size); if (nr_pages < 0) @@ -547,6 +578,14 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) return 0; } +/* + * Don't support cancel on anything that isn't old aio + */ +static bool aio_ctx_supports_cancel(struct kioctx *ctx) +{ + return (ctx->flags & IOCTX_FLAG_IOPOLL) == 0; +} + #define AIO_EVENTS_PER_PAGE (PAGE_SIZE / sizeof(struct io_event)) #define AIO_EVENTS_FIRST_PAGE ((PAGE_SIZE - sizeof(struct aio_ring)) / sizeof(struct io_event)) #define AIO_EVENTS_OFFSET (AIO_EVENTS_PER_PAGE - AIO_EVENTS_FIRST_PAGE) @@ -557,6 +596,8 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel) struct kioctx *ctx = req->ki_ctx; unsigned long flags; + if (WARN_ON_ONCE(!aio_ctx_supports_cancel(ctx))) + return; if (WARN_ON_ONCE(!list_empty(&req->ki_list))) return; @@ -745,6 +786,11 @@ static struct kioctx *io_setup_flags(unsigned long ctxid, INIT_LIST_HEAD(&ctx->active_reqs); 
+ spin_lock_init(&ctx->poll_lock); + INIT_LIST_HEAD(&ctx->poll_submitted); + INIT_LIST_HEAD(&ctx->poll_completing); + mutex_init(&ctx->getevents_lock); + if (percpu_ref_init(&ctx->users, free_ioctx_users, 0, GFP_KERNEL)) goto err; @@ -816,11 +862,15 @@ static int kill_ioctx(struct mm_struct *mm, struct kioctx *ctx, { struct kioctx_table *table; + mutex_lock(&ctx->getevents_lock); spin_lock(&mm->ioctx_lock); if (atomic_xchg(&ctx->dead, 1)) { spin_unlock(&mm->ioctx_lock); + mutex_unlock(&ctx->getevents_lock); return -EINVAL; } + aio_iopoll_reap_events(ctx); + mutex_unlock(&ctx->getevents_lock); table = rcu_dereference_raw(mm->ioctx_table); WARN_ON(ctx != rcu_access_pointer(table->table[ctx->id])); @@ -1029,6 +1079,7 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx) percpu_ref_get(&ctx->reqs); req->ki_ctx = ctx; INIT_LIST_HEAD(&req->ki_list); + req->ki_flags = 0; refcount_set(&req->ki_refcnt, 0); req->ki_eventfd = NULL; return req; @@ -1070,6 +1121,15 @@ static inline void iocb_put(struct aio_kiocb *iocb) } } +static void iocb_put_many(struct kioctx *ctx, void **iocbs, int *nr) +{ + if (*nr) { + percpu_ref_put_many(&ctx->reqs, *nr); + kmem_cache_free_bulk(kiocb_cachep, *nr, iocbs); + *nr = 0; + } +} + static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb, long res, long res2) { @@ -1259,6 +1319,185 @@ static bool aio_read_events(struct kioctx *ctx, long min_nr, long nr, return ret < 0 || *i >= min_nr; } +#define AIO_IOPOLL_BATCH 8 + +/* + * Process completed iocb iopoll entries, copying the result to userspace. + */ +static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, + unsigned int *nr_events, long max) +{ + void *iocbs[AIO_IOPOLL_BATCH]; + struct aio_kiocb *iocb, *n; + int to_free = 0, ret = 0; + + /* Shouldn't happen... 
*/ + if (*nr_events >= max) + return 0; + + list_for_each_entry_safe(iocb, n, &ctx->poll_completing, ki_list) { + if (*nr_events == max) + break; + if (!test_bit(KIOCB_F_POLL_COMPLETED, &iocb->ki_flags)) + continue; + if (to_free == AIO_IOPOLL_BATCH) + iocb_put_many(ctx, iocbs, &to_free); + + list_del(&iocb->ki_list); + iocbs[to_free++] = iocb; + + fput(iocb->rw.ki_filp); + + if (evs && copy_to_user(evs + *nr_events, &iocb->ki_ev, + sizeof(iocb->ki_ev))) { + ret = -EFAULT; + break; + } + (*nr_events)++; + } + + if (to_free) + iocb_put_many(ctx, iocbs, &to_free); + + return ret; +} + +/* + * Poll for a mininum of 'min' events, and a maximum of 'max'. Note that if + * min == 0 we consider that a non-spinning poll check - we'll still enter + * the driver poll loop, but only as a non-spinning completion check. + */ +static int aio_iopoll_getevents(struct kioctx *ctx, + struct io_event __user *event, + unsigned int *nr_events, long min, long max) +{ + struct aio_kiocb *iocb; + int to_poll, polled, ret; + + /* + * Check if we already have done events that satisfy what we need + */ + if (!list_empty(&ctx->poll_completing)) { + ret = aio_iopoll_reap(ctx, event, nr_events, max); + if (ret < 0) + return ret; + if ((min && *nr_events >= min) || *nr_events >= max) + return 0; + } + + /* + * Take in a new working set from the submitted list, if possible. + */ + if (!list_empty_careful(&ctx->poll_submitted)) { + spin_lock(&ctx->poll_lock); + list_splice_init(&ctx->poll_submitted, &ctx->poll_completing); + spin_unlock(&ctx->poll_lock); + } + + if (list_empty(&ctx->poll_completing)) + return 0; + + /* + * Check again now that we have a new batch. 
+ */ + ret = aio_iopoll_reap(ctx, event, nr_events, max); + if (ret < 0) + return ret; + if ((min && *nr_events >= min) || *nr_events >= max) + return 0; + + /* + * Find up to 'max' worth of events to poll for, including the + * events we already successfully polled + */ + polled = to_poll = 0; + list_for_each_entry(iocb, &ctx->poll_completing, ki_list) { + /* + * Poll for needed events with spin == true, anything after + * that we just check if we have more, up to max. + */ + bool spin = !polled || *nr_events < min; + struct kiocb *kiocb = &iocb->rw; + + if (test_bit(KIOCB_F_POLL_COMPLETED, &iocb->ki_flags)) + break; + if (++to_poll + *nr_events > max) + break; + + ret = kiocb->ki_filp->f_op->iopoll(kiocb, spin); + if (ret < 0) + return ret; + + polled += ret; + if (polled + *nr_events >= max) + break; + } + + ret = aio_iopoll_reap(ctx, event, nr_events, max); + if (ret < 0) + return ret; + if (*nr_events >= min) + return 0; + return to_poll; +} + +/* + * We can't just wait for polled events to come to us, we have to actively + * find and complete them. 
+ */ +static void aio_iopoll_reap_events(struct kioctx *ctx) +{ + if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) + return; + + while (!list_empty_careful(&ctx->poll_submitted) || + !list_empty(&ctx->poll_completing)) { + unsigned int nr_events = 0; + + aio_iopoll_getevents(ctx, NULL, &nr_events, 1, UINT_MAX); + } +} + +static int __aio_iopoll_check(struct kioctx *ctx, struct io_event __user *event, + unsigned int *nr_events, long min_nr, long max_nr) +{ + int ret = 0; + + while (!*nr_events || !need_resched()) { + int tmin = 0; + + if (*nr_events < min_nr) + tmin = min_nr - *nr_events; + + ret = aio_iopoll_getevents(ctx, event, nr_events, tmin, max_nr); + if (ret <= 0) + break; + ret = 0; + } + + return ret; +} + +static int aio_iopoll_check(struct kioctx *ctx, long min_nr, long nr, + struct io_event __user *event) +{ + unsigned int nr_events = 0; + int ret; + + /* Only allow one thread polling at a time */ + if (!mutex_trylock(&ctx->getevents_lock)) + return -EBUSY; + if (unlikely(atomic_read(&ctx->dead))) { + ret = -EINVAL; + goto err; + } + + ret = __aio_iopoll_check(ctx, event, &nr_events, min_nr, nr); +err: + mutex_unlock(&ctx->getevents_lock); + return nr_events ? 
nr_events : ret; +} + static long read_events(struct kioctx *ctx, long min_nr, long nr, struct io_event __user *event, ktime_t until) @@ -1303,7 +1542,9 @@ SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags, void __user *, user1, unsigned long ctx; long ret; - if (flags || user1 || user2) + if (user1 || user2) + return -EINVAL; + if (flags & ~IOCTX_FLAG_IOPOLL) return -EINVAL; ret = get_user(ctx, ctxp); @@ -1429,13 +1670,8 @@ static void aio_remove_iocb(struct aio_kiocb *iocb) spin_unlock_irqrestore(&ctx->ctx_lock, flags); } -static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) +static void kiocb_end_write(struct kiocb *kiocb) { - struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw); - - if (!list_empty_careful(&iocb->ki_list)) - aio_remove_iocb(iocb); - if (kiocb->ki_flags & IOCB_WRITE) { struct inode *inode = file_inode(kiocb->ki_filp); @@ -1447,19 +1683,48 @@ static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE); file_end_write(kiocb->ki_filp); } +} + +static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) +{ + struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw); + + if (!list_empty_careful(&iocb->ki_list)) + aio_remove_iocb(iocb); + + kiocb_end_write(kiocb); fput(kiocb->ki_filp); aio_complete(iocb, res, res2); } -static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) +static void aio_complete_rw_poll(struct kiocb *kiocb, long res, long res2) { + struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw); + + kiocb_end_write(kiocb); + + /* + * Handle EAGAIN from resource limits with polled IO inline, don't + * pass the event back to userspace. 
+ */ + if (unlikely(res == -EAGAIN)) + set_bit(KIOCB_F_POLL_EAGAIN, &iocb->ki_flags); + else { + aio_fill_event(&iocb->ki_ev, iocb, res, res2); + set_bit(KIOCB_F_POLL_COMPLETED, &iocb->ki_flags); + } +} + +static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb) +{ + struct kioctx *ctx = kiocb->ki_ctx; + struct kiocb *req = &kiocb->rw; int ret; req->ki_filp = fget(iocb->aio_fildes); if (unlikely(!req->ki_filp)) return -EBADF; - req->ki_complete = aio_complete_rw; req->ki_pos = iocb->aio_offset; req->ki_flags = iocb_flags(req->ki_filp); if (iocb->aio_flags & IOCB_FLAG_RESFD) @@ -1485,9 +1750,35 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) if (unlikely(ret)) goto out_fput; - req->ki_flags &= ~IOCB_HIPRI; /* no one is going to poll for this I/O */ - return 0; + if (iocb->aio_flags & IOCB_FLAG_HIPRI) { + /* shares space in the union, and is rather pointless.. */ + ret = -EINVAL; + if (iocb->aio_flags & IOCB_FLAG_RESFD) + goto out_fput; + + /* can't submit polled IO to a non-polled ctx */ + if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) + goto out_fput; + + ret = -EOPNOTSUPP; + if (!(req->ki_flags & IOCB_DIRECT) || + !req->ki_filp->f_op->iopoll) + goto out_fput; + + req->ki_flags |= IOCB_HIPRI; + req->ki_complete = aio_complete_rw_poll; + } else { + /* can't submit non-polled IO to a polled ctx */ + ret = -EINVAL; + if (ctx->flags & IOCTX_FLAG_IOPOLL) + goto out_fput; + /* no one is going to poll for this I/O */ + req->ki_flags &= ~IOCB_HIPRI; + req->ki_complete = aio_complete_rw; + } + + return 0; out_fput: fput(req->ki_filp); return ret; @@ -1532,15 +1823,40 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret) } } -static ssize_t aio_read(struct kiocb *req, const struct iocb *iocb, +/* + * After the iocb has been issued, it's safe to be found on the poll list. 
+ * Adding the kiocb to the list AFTER submission ensures that we don't + * find it from a io_getevents() thread before the issuer is done accessing + * the kiocb cookie. + */ +static void aio_iopoll_iocb_issued(struct aio_kiocb *kiocb) +{ + /* + * For fast devices, IO may have already completed. If it has, add + * it to the front so we find it first. We can't add to the poll_done + * list as that's unlocked from the completion side. + */ + const int front = test_bit(KIOCB_F_POLL_COMPLETED, &kiocb->ki_flags); + struct kioctx *ctx = kiocb->ki_ctx; + + spin_lock(&ctx->poll_lock); + if (front) + list_add(&kiocb->ki_list, &ctx->poll_submitted); + else + list_add_tail(&kiocb->ki_list, &ctx->poll_submitted); + spin_unlock(&ctx->poll_lock); +} + +static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, bool vectored, bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; + struct kiocb *req = &kiocb->rw; struct iov_iter iter; struct file *file; ssize_t ret; - ret = aio_prep_rw(req, iocb); + ret = aio_prep_rw(kiocb, iocb); if (ret) return ret; file = req->ki_filp; @@ -1565,15 +1881,16 @@ static ssize_t aio_read(struct kiocb *req, const struct iocb *iocb, return ret; } -static ssize_t aio_write(struct kiocb *req, const struct iocb *iocb, +static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb, bool vectored, bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; + struct kiocb *req = &kiocb->rw; struct iov_iter iter; struct file *file; ssize_t ret; - ret = aio_prep_rw(req, iocb); + ret = aio_prep_rw(kiocb, iocb); if (ret) return ret; file = req->ki_filp; @@ -1844,7 +2161,8 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, return -EINVAL; } - if (!get_reqs_available(ctx)) + /* Poll IO doesn't need ring reservations */ + if (!(ctx->flags & IOCTX_FLAG_IOPOLL) && !get_reqs_available(ctx)) return -EAGAIN; ret = -EAGAIN; @@ -1867,35 +2185,44 @@ static int __io_submit_one(struct 
kioctx *ctx, const struct iocb *iocb, } } - ret = put_user(KIOCB_KEY, &user_iocb->aio_key); - if (unlikely(ret)) { - pr_debug("EFAULT: aio_key\n"); - goto out_put_req; + if (aio_ctx_supports_cancel(ctx)) { + ret = put_user(KIOCB_KEY, &user_iocb->aio_key); + if (unlikely(ret)) { + pr_debug("EFAULT: aio_key\n"); + goto out_put_req; + } } req->ki_user_iocb = user_iocb; req->ki_user_data = iocb->aio_data; + ret = -EINVAL; switch (iocb->aio_lio_opcode) { case IOCB_CMD_PREAD: - ret = aio_read(&req->rw, iocb, false, compat); + ret = aio_read(req, iocb, false, compat); break; case IOCB_CMD_PWRITE: - ret = aio_write(&req->rw, iocb, false, compat); + ret = aio_write(req, iocb, false, compat); break; case IOCB_CMD_PREADV: - ret = aio_read(&req->rw, iocb, true, compat); + ret = aio_read(req, iocb, true, compat); break; case IOCB_CMD_PWRITEV: - ret = aio_write(&req->rw, iocb, true, compat); + ret = aio_write(req, iocb, true, compat); break; case IOCB_CMD_FSYNC: + if (ctx->flags & IOCTX_FLAG_IOPOLL) + break; ret = aio_fsync(&req->fsync, iocb, false); break; case IOCB_CMD_FDSYNC: + if (ctx->flags & IOCTX_FLAG_IOPOLL) + break; ret = aio_fsync(&req->fsync, iocb, true); break; case IOCB_CMD_POLL: + if (ctx->flags & IOCTX_FLAG_IOPOLL) + break; ret = aio_poll(req, iocb); break; default: @@ -1911,13 +2238,21 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, */ if (ret) goto out_put_req; + if (ctx->flags & IOCTX_FLAG_IOPOLL) { + if (test_bit(KIOCB_F_POLL_EAGAIN, &req->ki_flags)) { + ret = -EAGAIN; + goto out_put_req; + } + aio_iopoll_iocb_issued(req); + } return 0; out_put_req: if (req->ki_eventfd) eventfd_ctx_put(req->ki_eventfd); iocb_put(req); out_put_reqs_available: - put_reqs_available(ctx, 1); + if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) + put_reqs_available(ctx, 1); return ret; } @@ -2073,6 +2408,9 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb, if (unlikely(!ctx)) return -EINVAL; + if (!aio_ctx_supports_cancel(ctx)) + goto 
err; + spin_lock_irq(&ctx->ctx_lock); kiocb = lookup_kiocb(ctx, iocb); if (kiocb) { @@ -2089,9 +2427,8 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb, */ ret = -EINPROGRESS; } - +err: percpu_ref_put(&ctx->users); - return ret; } @@ -2106,8 +2443,12 @@ static long do_io_getevents(aio_context_t ctx_id, long ret = -EINVAL; if (likely(ioctx)) { - if (likely(min_nr <= nr && min_nr >= 0)) - ret = read_events(ioctx, min_nr, nr, events, until); + if (likely(min_nr <= nr && min_nr >= 0)) { + if (ioctx->flags & IOCTX_FLAG_IOPOLL) + ret = aio_iopoll_check(ioctx, min_nr, nr, events); + else + ret = read_events(ioctx, min_nr, nr, events, until); + } percpu_ref_put(&ioctx->users); } diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h index 8387e0af0f76..a6829bae9ada 100644 --- a/include/uapi/linux/aio_abi.h +++ b/include/uapi/linux/aio_abi.h @@ -52,9 +52,11 @@ enum { * is valid. * IOCB_FLAG_IOPRIO - Set if the "aio_reqprio" member of the "struct iocb" * is valid. + * IOCB_FLAG_HIPRI - Use IO completion polling */ #define IOCB_FLAG_RESFD (1 << 0) #define IOCB_FLAG_IOPRIO (1 << 1) +#define IOCB_FLAG_HIPRI (1 << 2) /* read() from /dev/aio returns these structures. 
*/ struct io_event { @@ -106,6 +108,8 @@ struct iocb { __u32 aio_resfd; }; /* 64 bytes */ +#define IOCTX_FLAG_IOPOLL (1 << 0) /* io_context is polled */ + #undef IFBIG #undef IFLITTLE

From patchwork Thu Dec 13 17:56:34 2018
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 15/26] aio: add submission side request cache
Date: Thu, 13 Dec 2018 10:56:34 -0700
Message-Id: <20181213175645.22181-16-axboe@kernel.dk>
We have to add each submitted polled request to the io_context poll_submitted list, which means we have to grab the poll_lock. We already use the block plug to batch submissions if we're doing a batch of IO submissions; extend that to cover the poll requests internally as well.

Signed-off-by: Jens Axboe --- fs/aio.c | 136 +++++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 113 insertions(+), 23 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 0f48c288e5dd..abc0e4dc2aed 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -233,6 +233,21 @@ struct aio_kiocb { }; }; +struct aio_submit_state { + struct kioctx *ctx; + + struct blk_plug plug; +#ifdef CONFIG_BLOCK + struct blk_plug_cb plug_cb; +#endif + + /* + * Polled iocbs that have been submitted, but not added to the ctx yet + */ + struct list_head req_list; + unsigned int req_count; +}; + /*------ sysctl variables----*/ static DEFINE_SPINLOCK(aio_nr_lock); unsigned long aio_nr; /* current system wide number of aio requests */ @@ -247,6 +262,15 @@ static struct vfsmount *aio_mnt; static const struct file_operations aio_ring_fops; static const struct address_space_operations aio_ctx_aops; +/* + * We rely on block level unplugs to flush pending requests, if we schedule + */ +#ifdef CONFIG_BLOCK +static const bool aio_use_state_req_list = true; +#else +static const bool aio_use_state_req_list = false; +#endif + static void aio_iopoll_reap_events(struct kioctx *); static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages) @@ -1823,13 +1847,28 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret) } } +/* + * Called either at the end of IO submission, or through a plug callback + * because we're going to schedule. Moves out local batch of requests to + * the ctx poll list, so they can be found for polling + reaping.
+ */ +static void aio_flush_state_reqs(struct kioctx *ctx, + struct aio_submit_state *state) +{ + spin_lock(&ctx->poll_lock); + list_splice_tail_init(&state->req_list, &ctx->poll_submitted); + spin_unlock(&ctx->poll_lock); + state->req_count = 0; +} + /* * After the iocb has been issued, it's safe to be found on the poll list. * Adding the kiocb to the list AFTER submission ensures that we don't * find it from a io_getevents() thread before the issuer is done accessing * the kiocb cookie. */ -static void aio_iopoll_iocb_issued(struct aio_kiocb *kiocb) +static void aio_iopoll_iocb_issued(struct aio_submit_state *state, + struct aio_kiocb *kiocb) { /* * For fast devices, IO may have already completed. If it has, add @@ -1839,12 +1878,21 @@ static void aio_iopoll_iocb_issued(struct aio_kiocb *kiocb) const int front = test_bit(KIOCB_F_POLL_COMPLETED, &kiocb->ki_flags); struct kioctx *ctx = kiocb->ki_ctx; - spin_lock(&ctx->poll_lock); - if (front) - list_add(&kiocb->ki_list, &ctx->poll_submitted); - else - list_add_tail(&kiocb->ki_list, &ctx->poll_submitted); - spin_unlock(&ctx->poll_lock); + if (!state || !aio_use_state_req_list) { + spin_lock(&ctx->poll_lock); + if (front) + list_add(&kiocb->ki_list, &ctx->poll_submitted); + else + list_add_tail(&kiocb->ki_list, &ctx->poll_submitted); + spin_unlock(&ctx->poll_lock); + } else { + if (front) + list_add(&kiocb->ki_list, &state->req_list); + else + list_add_tail(&kiocb->ki_list, &state->req_list); + if (++state->req_count >= AIO_IOPOLL_BATCH) + aio_flush_state_reqs(ctx, state); + } } static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, @@ -2140,7 +2188,8 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) } static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, - struct iocb __user *user_iocb, bool compat) + struct iocb __user *user_iocb, + struct aio_submit_state *state, bool compat) { struct aio_kiocb *req; ssize_t ret; @@ -2243,7 +2292,7 @@ static int 
__io_submit_one(struct kioctx *ctx, const struct iocb *iocb, ret = -EAGAIN; goto out_put_req; } - aio_iopoll_iocb_issued(req); + aio_iopoll_iocb_issued(state, req); } return 0; out_put_req: @@ -2257,14 +2306,51 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, } static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, - bool compat) + struct aio_submit_state *state, bool compat) { struct iocb iocb; if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb)))) return -EFAULT; - return __io_submit_one(ctx, &iocb, user_iocb, compat); + return __io_submit_one(ctx, &iocb, user_iocb, state, compat); +} + +#ifdef CONFIG_BLOCK +static void aio_state_unplug(struct blk_plug_cb *cb, bool from_schedule) +{ + struct aio_submit_state *state; + + state = container_of(cb, struct aio_submit_state, plug_cb); + if (!list_empty(&state->req_list)) + aio_flush_state_reqs(state->ctx, state); +} +#endif + +/* + * Batched submission is done, ensure local IO is flushed out. + */ +static void aio_submit_state_end(struct aio_submit_state *state) +{ + blk_finish_plug(&state->plug); + if (!list_empty(&state->req_list)) + aio_flush_state_reqs(state->ctx, state); +} + +/* + * Start submission side cache. 
+ */ +static void aio_submit_state_start(struct aio_submit_state *state, + struct kioctx *ctx) +{ + state->ctx = ctx; + INIT_LIST_HEAD(&state->req_list); + state->req_count = 0; +#ifdef CONFIG_BLOCK + state->plug_cb.callback = aio_state_unplug; + blk_start_plug(&state->plug); + list_add(&state->plug_cb.list, &state->plug.cb_list); +#endif } /* sys_io_submit: @@ -2282,10 +2368,10 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, struct iocb __user * __user *, iocbpp) { + struct aio_submit_state state, *statep = NULL; struct kioctx *ctx; long ret = 0; int i = 0; - struct blk_plug plug; if (unlikely(nr < 0)) return -EINVAL; @@ -2299,8 +2385,10 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, if (nr > ctx->nr_events) nr = ctx->nr_events; - if (nr > AIO_PLUG_THRESHOLD) - blk_start_plug(&plug); + if (nr > AIO_PLUG_THRESHOLD) { + aio_submit_state_start(&state, ctx); + statep = &state; + } for (i = 0; i < nr; i++) { struct iocb __user *user_iocb; @@ -2309,12 +2397,12 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, break; } - ret = io_submit_one(ctx, user_iocb, false); + ret = io_submit_one(ctx, user_iocb, statep, false); if (ret) break; } - if (nr > AIO_PLUG_THRESHOLD) - blk_finish_plug(&plug); + if (statep) + aio_submit_state_end(statep); percpu_ref_put(&ctx->users); return i ? 
i : ret; @@ -2324,10 +2412,10 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id, int, nr, compat_uptr_t __user *, iocbpp) { + struct aio_submit_state state, *statep = NULL; struct kioctx *ctx; long ret = 0; int i = 0; - struct blk_plug plug; if (unlikely(nr < 0)) return -EINVAL; @@ -2341,8 +2429,10 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id, if (nr > ctx->nr_events) nr = ctx->nr_events; - if (nr > AIO_PLUG_THRESHOLD) - blk_start_plug(&plug); + if (nr > AIO_PLUG_THRESHOLD) { + aio_submit_state_start(&state, ctx); + statep = &state; + } for (i = 0; i < nr; i++) { compat_uptr_t user_iocb; @@ -2351,12 +2441,12 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id, break; } - ret = io_submit_one(ctx, compat_ptr(user_iocb), true); + ret = io_submit_one(ctx, compat_ptr(user_iocb), statep, true); if (ret) break; } - if (nr > AIO_PLUG_THRESHOLD) - blk_finish_plug(&plug); + if (statep) + aio_submit_state_end(statep); percpu_ref_put(&ctx->users); return i ? 
i : ret;

From patchwork Thu Dec 13 17:56:35 2018
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 16/26] fs: add fget_many() and fput_many()
Date: Thu, 13 Dec 2018 10:56:35 -0700
Message-Id: <20181213175645.22181-17-axboe@kernel.dk>

Some use cases repeatedly get and put references to the
same file, but the only exposed interface is doing these one at a time. As each of these entails an atomic inc or dec on a shared structure, that cost can add up. Add fget_many(), which works just like fget(), except it takes an argument for how many references to get on the file. Ditto fput_many(), which can drop an arbitrary number of references to a file. Signed-off-by: Jens Axboe --- fs/file.c | 15 ++++++++++----- fs/file_table.c | 10 ++++++++-- include/linux/file.h | 2 ++ include/linux/fs.h | 3 ++- 4 files changed, 22 insertions(+), 8 deletions(-) diff --git a/fs/file.c b/fs/file.c index 7ffd6e9d103d..ad9870edfd51 100644 --- a/fs/file.c +++ b/fs/file.c @@ -676,7 +676,7 @@ void do_close_on_exec(struct files_struct *files) spin_unlock(&files->file_lock); } -static struct file *__fget(unsigned int fd, fmode_t mask) +static struct file *__fget(unsigned int fd, fmode_t mask, unsigned int refs) { struct files_struct *files = current->files; struct file *file; @@ -691,7 +691,7 @@ static struct file *__fget(unsigned int fd, fmode_t mask) */ if (file->f_mode & mask) file = NULL; - else if (!get_file_rcu(file)) + else if (!get_file_rcu_many(file, refs)) goto loop; } rcu_read_unlock(); @@ -699,15 +699,20 @@ static struct file *__fget(unsigned int fd, fmode_t mask) return file; } +struct file *fget_many(unsigned int fd, unsigned int refs) +{ + return __fget(fd, FMODE_PATH, refs); +} + struct file *fget(unsigned int fd) { - return __fget(fd, FMODE_PATH); + return fget_many(fd, 1); } EXPORT_SYMBOL(fget); struct file *fget_raw(unsigned int fd) { - return __fget(fd, 0); + return __fget(fd, 0, 1); } EXPORT_SYMBOL(fget_raw); @@ -738,7 +743,7 @@ static unsigned long __fget_light(unsigned int fd, fmode_t mask) return 0; return (unsigned long)file; } else { - file = __fget(fd, mask); + file = __fget(fd, mask, 1); if (!file) return 0; return FDPUT_FPUT | (unsigned long)file; diff --git a/fs/file_table.c b/fs/file_table.c index e49af4caf15d..6a3964df33e4 100644 ---
a/fs/file_table.c +++ b/fs/file_table.c @@ -326,9 +326,9 @@ void flush_delayed_fput(void) static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput); -void fput(struct file *file) +void fput_many(struct file *file, unsigned int refs) { - if (atomic_long_dec_and_test(&file->f_count)) { + if (atomic_long_sub_and_test(refs, &file->f_count)) { struct task_struct *task = current; if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) { @@ -347,6 +347,12 @@ void fput(struct file *file) } } +void fput(struct file *file) +{ + fput_many(file, 1); +} + + /* * synchronous analog of fput(); for kernel threads that might be needed * in some umount() (and thus can't use flush_delayed_fput() without diff --git a/include/linux/file.h b/include/linux/file.h index 6b2fb032416c..3fcddff56bc4 100644 --- a/include/linux/file.h +++ b/include/linux/file.h @@ -13,6 +13,7 @@ struct file; extern void fput(struct file *); +extern void fput_many(struct file *, unsigned int); struct file_operations; struct vfsmount; @@ -44,6 +45,7 @@ static inline void fdput(struct fd fd) } extern struct file *fget(unsigned int fd); +extern struct file *fget_many(unsigned int fd, unsigned int refs); extern struct file *fget_raw(unsigned int fd); extern unsigned long __fdget(unsigned int fd); extern unsigned long __fdget_raw(unsigned int fd); diff --git a/include/linux/fs.h b/include/linux/fs.h index 6a5f71f8ae06..dc54a65c401a 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -952,7 +952,8 @@ static inline struct file *get_file(struct file *f) atomic_long_inc(&f->f_count); return f; } -#define get_file_rcu(x) atomic_long_inc_not_zero(&(x)->f_count) +#define get_file_rcu_many(x, cnt) atomic_long_add_unless(&(x)->f_count, (cnt), 0) +#define get_file_rcu(x) get_file_rcu_many((x), 1) #define fput_atomic(x) atomic_long_add_unless(&(x)->f_count, -1, 1) #define file_count(x) atomic_long_read(&(x)->f_count) From patchwork Thu Dec 13 17:56:36 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 
1.0
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 17/26] aio: use fget/fput_many() for file references
Date: Thu, 13 Dec 2018 10:56:36 -0700
Message-Id: <20181213175645.22181-18-axboe@kernel.dk>

On the submission side, add file reference batching to the aio_submit_state.
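The refcount arithmetic behind fget_many()/fput_many() is easy to model in userspace. The sketch below is a hypothetical stand-in, not kernel API: it uses C11 atomics in place of the kernel's atomic_long_t, and `struct kfile`, `file_get_many()`, and `file_put_many()` are invented names mirroring `struct file`, `get_file_rcu_many()`, and `fput_many()`.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for struct file's f_count. */
struct kfile {
	atomic_long f_count;
};

/*
 * Model of get_file_rcu_many(): add 'refs' references in a single atomic
 * operation, unless the count has already dropped to zero (the file is
 * being freed). Returns nonzero on success, like atomic_long_add_unless().
 */
static int file_get_many(struct kfile *f, long refs)
{
	long old = atomic_load(&f->f_count);

	do {
		if (old == 0)
			return 0;	/* lost the race with the final put */
	} while (!atomic_compare_exchange_weak(&f->f_count, &old, old + refs));
	return 1;
}

/*
 * Model of fput_many(): drop 'refs' references at once; returns nonzero
 * when the last reference went away (where the kernel would free the file).
 */
static int file_put_many(struct kfile *f, long refs)
{
	return atomic_fetch_sub(&f->f_count, refs) == refs;
}
```

With this shape, submitting eight iocbs against one fd costs one get of eight references up front, and the unused remainder is dropped with a single put when the submit state is flushed, instead of one atomic pair per iocb.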
We get as many references as the number of iocbs we are submitting, and drop unused ones if we end up switching files. The assumption here is that we're usually only dealing with one fd, and if there are multiple, hopefully they are at least somewhat ordered. Could trivially be extended to cover multiple fds, if needed. On the completion side we do the same thing, except this is trivially done just locally in aio_iopoll_reap(). Signed-off-by: Jens Axboe --- fs/aio.c | 110 +++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 94 insertions(+), 16 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index abc0e4dc2aed..29336c6e8b40 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -246,6 +246,15 @@ struct aio_submit_state { */ struct list_head req_list; unsigned int req_count; + + /* + * File reference cache + */ + struct file *file; + unsigned int fd; + unsigned int has_refs; + unsigned int used_refs; + unsigned int ios_left; }; /*------ sysctl variables----*/ @@ -1353,7 +1362,8 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, { void *iocbs[AIO_IOPOLL_BATCH]; struct aio_kiocb *iocb, *n; - int to_free = 0, ret = 0; + int file_count, to_free = 0, ret = 0; + struct file *file = NULL; /* Shouldn't happen... */ if (*nr_events >= max) @@ -1370,7 +1380,20 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, list_del(&iocb->ki_list); iocbs[to_free++] = iocb; - fput(iocb->rw.ki_filp); + /* + * Batched puts of the same file, to avoid dirtying the + * file usage count multiple times, if avoidable.
+ */ + if (!file) { + file = iocb->rw.ki_filp; + file_count = 1; + } else if (file == iocb->rw.ki_filp) { + file_count++; + } else { + fput_many(file, file_count); + file = iocb->rw.ki_filp; + file_count = 1; + } if (evs && copy_to_user(evs + *nr_events, &iocb->ki_ev, sizeof(iocb->ki_ev))) { @@ -1380,6 +1403,9 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, (*nr_events)++; } + if (file) + fput_many(file, file_count); + if (to_free) iocb_put_many(ctx, iocbs, &to_free); @@ -1740,13 +1766,60 @@ static void aio_complete_rw_poll(struct kiocb *kiocb, long res, long res2) } } -static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb) +static void aio_file_put(struct aio_submit_state *state, struct file *file) +{ + if (!state) { + fput(file); + } else if (state->file) { + int diff = state->has_refs - state->used_refs; + + if (diff) + fput_many(state->file, diff); + state->file = NULL; + } +} + +/* + * Get as many references to a file as we have IOs left in this submission, + * assuming most submissions are for one file, or at least that each file + * has more than one submission. 
+ */ +static struct file *aio_file_get(struct aio_submit_state *state, int fd) +{ + if (!state) + return fget(fd); + + if (!state->file) { +get_file: + state->file = fget_many(fd, state->ios_left); + if (!state->file) + return NULL; + + state->fd = fd; + state->has_refs = state->ios_left; + state->used_refs = 1; + state->ios_left--; + return state->file; + } + + if (state->fd == fd) { + state->used_refs++; + state->ios_left--; + return state->file; + } + + aio_file_put(state, NULL); + goto get_file; +} + +static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb, + struct aio_submit_state *state) { struct kioctx *ctx = kiocb->ki_ctx; struct kiocb *req = &kiocb->rw; int ret; - req->ki_filp = fget(iocb->aio_fildes); + req->ki_filp = aio_file_get(state, iocb->aio_fildes); if (unlikely(!req->ki_filp)) return -EBADF; req->ki_pos = iocb->aio_offset; @@ -1804,7 +1877,7 @@ static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb) return 0; out_fput: - fput(req->ki_filp); + aio_file_put(state, req->ki_filp); return ret; } @@ -1896,7 +1969,8 @@ static void aio_iopoll_iocb_issued(struct aio_submit_state *state, } static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, - bool vectored, bool compat) + struct aio_submit_state *state, bool vectored, + bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct kiocb *req = &kiocb->rw; @@ -1904,7 +1978,7 @@ static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, struct file *file; ssize_t ret; - ret = aio_prep_rw(kiocb, iocb); + ret = aio_prep_rw(kiocb, iocb, state); if (ret) return ret; file = req->ki_filp; @@ -1930,7 +2004,8 @@ static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, } static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb, - bool vectored, bool compat) + struct aio_submit_state *state, bool vectored, + bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct kiocb 
*req = &kiocb->rw; @@ -1938,7 +2013,7 @@ static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb, struct file *file; ssize_t ret; - ret = aio_prep_rw(kiocb, iocb); + ret = aio_prep_rw(kiocb, iocb, state); if (ret) return ret; file = req->ki_filp; @@ -2248,16 +2323,16 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, ret = -EINVAL; switch (iocb->aio_lio_opcode) { case IOCB_CMD_PREAD: - ret = aio_read(req, iocb, false, compat); + ret = aio_read(req, iocb, state, false, compat); break; case IOCB_CMD_PWRITE: - ret = aio_write(req, iocb, false, compat); + ret = aio_write(req, iocb, state, false, compat); break; case IOCB_CMD_PREADV: - ret = aio_read(req, iocb, true, compat); + ret = aio_read(req, iocb, state, true, compat); break; case IOCB_CMD_PWRITEV: - ret = aio_write(req, iocb, true, compat); + ret = aio_write(req, iocb, state, true, compat); break; case IOCB_CMD_FSYNC: if (ctx->flags & IOCTX_FLAG_IOPOLL) @@ -2335,17 +2410,20 @@ static void aio_submit_state_end(struct aio_submit_state *state) blk_finish_plug(&state->plug); if (!list_empty(&state->req_list)) aio_flush_state_reqs(state->ctx, state); + aio_file_put(state, NULL); } /* * Start submission side cache. 
*/ static void aio_submit_state_start(struct aio_submit_state *state, - struct kioctx *ctx) + struct kioctx *ctx, int max_ios) { state->ctx = ctx; INIT_LIST_HEAD(&state->req_list); state->req_count = 0; + state->file = NULL; + state->ios_left = max_ios; #ifdef CONFIG_BLOCK state->plug_cb.callback = aio_state_unplug; blk_start_plug(&state->plug); @@ -2386,7 +2464,7 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, nr = ctx->nr_events; if (nr > AIO_PLUG_THRESHOLD) { - aio_submit_state_start(&state, ctx); + aio_submit_state_start(&state, ctx, nr); statep = &state; } for (i = 0; i < nr; i++) { @@ -2430,7 +2508,7 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id, nr = ctx->nr_events; if (nr > AIO_PLUG_THRESHOLD) { - aio_submit_state_start(&state, ctx); + aio_submit_state_start(&state, ctx, nr); statep = &state; } for (i = 0; i < nr; i++) {

From patchwork Thu Dec 13 17:56:37 2018
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 18/26] aio: split iocb init from allocation
Date: Thu, 13 Dec 2018 10:56:37 -0700
Message-Id: <20181213175645.22181-19-axboe@kernel.dk>
In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk>
References: <20181213175645.22181-1-axboe@kernel.dk>

In preparation for having pre-allocated requests that we then just need to initialize before use.

Signed-off-by: Jens Axboe
---
 fs/aio.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 29336c6e8b40..b6dfc4e9af47 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1097,6 +1097,16 @@ static bool get_reqs_available(struct kioctx *ctx) return __get_reqs_available(ctx); } +static void aio_iocb_init(struct kioctx *ctx, struct aio_kiocb *req) +{ + percpu_ref_get(&ctx->reqs); + req->ki_ctx = ctx; + INIT_LIST_HEAD(&req->ki_list); + req->ki_flags = 0; + refcount_set(&req->ki_refcnt, 0); + req->ki_eventfd = NULL; +} + /* aio_get_req * Allocate a slot for an aio request. * Returns NULL if no requests are free.
@@ -1109,12 +1119,7 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx) if (unlikely(!req)) return NULL; - percpu_ref_get(&ctx->reqs); - req->ki_ctx = ctx; - INIT_LIST_HEAD(&req->ki_list); - req->ki_flags = 0; - refcount_set(&req->ki_refcnt, 0); - req->ki_eventfd = NULL; + aio_iocb_init(ctx, req); return req; }

From patchwork Thu Dec 13 17:56:38 2018
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 19/26] aio: batch aio_kiocb allocation
Date: Thu, 13 Dec 2018 10:56:38 -0700
Message-Id: <20181213175645.22181-20-axboe@kernel.dk>
In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk>
References: <20181213175645.22181-1-axboe@kernel.dk>

Similarly to how we use the state->ios_left to know how many references to get to a file, we can use it to allocate the aio_kiocb's we need in bulk.

Signed-off-by: Jens Axboe
---
 fs/aio.c | 46 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index b6dfc4e9af47..7a3986c6cc1b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -233,6 +233,8 @@ struct aio_kiocb { }; }; +#define AIO_IOPOLL_BATCH 8 + struct aio_submit_state { struct kioctx *ctx; @@ -247,6 +249,13 @@ struct aio_submit_state { struct list_head req_list; unsigned int req_count; + /* + * aio_kiocb alloc cache + */ + void *iocbs[AIO_IOPOLL_BATCH]; + unsigned int free_iocbs; + unsigned int cur_iocb; + /* * File reference cache */ @@ -1111,15 +1120,34 @@ static void aio_iocb_init(struct kioctx *ctx, struct aio_kiocb *req) * Allocate a slot for an aio request. * Returns NULL if no requests are free.
*/ -static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx) +static struct aio_kiocb *aio_get_req(struct kioctx *ctx, + struct aio_submit_state *state) { struct aio_kiocb *req; - req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL); - if (unlikely(!req)) - return NULL; + if (!state) + req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL); + else if (!state->free_iocbs) { + size_t size; + int ret; + + size = min_t(size_t, state->ios_left, ARRAY_SIZE(state->iocbs)); + ret = kmem_cache_alloc_bulk(kiocb_cachep, GFP_KERNEL, size, + state->iocbs); + if (ret <= 0) + return ERR_PTR(-ENOMEM); + state->free_iocbs = ret - 1; + state->cur_iocb = 1; + req = state->iocbs[0]; + } else { + req = state->iocbs[state->cur_iocb]; + state->free_iocbs--; + state->cur_iocb++; + } + + if (req) + aio_iocb_init(ctx, req); - aio_iocb_init(ctx, req); return req; } @@ -1357,8 +1385,6 @@ static bool aio_read_events(struct kioctx *ctx, long min_nr, long nr, return ret < 0 || *i >= min_nr; } -#define AIO_IOPOLL_BATCH 8 - /* * Process completed iocb iopoll entries, copying the result to userspace. 
*/ @@ -2295,7 +2321,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, return -EAGAIN; ret = -EAGAIN; - req = aio_get_req(ctx); + req = aio_get_req(ctx, state); if (unlikely(!req)) goto out_put_reqs_available; @@ -2416,6 +2442,9 @@ static void aio_submit_state_end(struct aio_submit_state *state) if (!list_empty(&state->req_list)) aio_flush_state_reqs(state->ctx, state); aio_file_put(state, NULL); + if (state->free_iocbs) + kmem_cache_free_bulk(kiocb_cachep, state->free_iocbs, + &state->iocbs[state->cur_iocb]); } /* @@ -2427,6 +2456,7 @@ static void aio_submit_state_start(struct aio_submit_state *state, state->ctx = ctx; INIT_LIST_HEAD(&state->req_list); state->req_count = 0; + state->free_iocbs = 0; state->file = NULL; state->ios_left = max_ios; #ifdef CONFIG_BLOCK

From patchwork Thu Dec 13 17:56:39 2018
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 20/26] aio: split old ring complete out from aio_complete()
Date: Thu, 13 Dec 2018 10:56:39 -0700
Message-Id: <20181213175645.22181-21-axboe@kernel.dk>
In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk>
References: <20181213175645.22181-1-axboe@kernel.dk>

Signed-off-by: Jens Axboe
---
 fs/aio.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 7a3986c6cc1b..fda9b1c53f3d 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1205,12 +1205,9 @@ static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb, ev->res2 = res2; } -/* aio_complete - * Called when the io request on the given iocb is complete. - */ -static void aio_complete(struct aio_kiocb *iocb, long res, long res2) +static void aio_ring_complete(struct kioctx *ctx, struct aio_kiocb *iocb, + long res, long res2) { - struct kioctx *ctx = iocb->ki_ctx; struct aio_ring *ring; struct io_event *ev_page, *event; unsigned tail, pos, head; @@ -1260,6 +1257,16 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2) spin_unlock_irqrestore(&ctx->completion_lock, flags); pr_debug("added to ring %p at [%u]\n", iocb, tail); +} + +/* aio_complete + * Called when the io request on the given iocb is complete.
+ */ +static void aio_complete(struct aio_kiocb *iocb, long res, long res2) +{ + struct kioctx *ctx = iocb->ki_ctx; + + aio_ring_complete(ctx, iocb, res, res2); /* * Check if the user asked us to deliver the result through an

From patchwork Thu Dec 13 17:56:40 2018
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 21/26] aio: add support for submission/completion rings
Date: Thu, 13 Dec 2018 10:56:40 -0700
Message-Id: <20181213175645.22181-22-axboe@kernel.dk>
In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk>
References: <20181213175645.22181-1-axboe@kernel.dk>

Experimental support for submitting and completing IO through rings shared between the application and kernel. The submission rings are struct iocb, like we would submit through io_submit(), and the completion rings are struct io_event, like we would pass in (and copy back) from io_getevents().

A new system call is added for this, io_ring_enter(). This system call submits IO that is queued in the SQ ring, and/or completes IO and stores the results in the CQ ring. This could be augmented with a kernel thread that does the submission and polling, then the application would never have to enter the kernel to do IO.

Sample application: http://git.kernel.dk/cgit/fio/plain/t/aio-ring.c

Signed-off-by: Jens Axboe
---
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 fs/aio.c | 484 +++++++++++++++++++++++--
 include/linux/syscalls.h | 4 +-
 include/uapi/linux/aio_abi.h | 29 ++
 kernel/sys_ni.c | 1 +
 5 files changed, 493 insertions(+), 26 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 67c357225fb0..55a26700a637 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -344,6 +344,7 @@ 333 common io_pgetevents __x64_sys_io_pgetevents 334 common rseq __x64_sys_rseq 335 common io_setup2 __x64_sys_io_setup2 +336 common io_ring_enter __x64_sys_io_ring_enter # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/fs/aio.c b/fs/aio.c index fda9b1c53f3d..a5ec349670bc 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -92,6 +92,18 @@ struct ctx_rq_wait { atomic_t count; }; +struct aio_mapped_range { + struct page **pages; + long nr_pages; +}; + +struct aio_iocb_ring { + struct aio_mapped_range ring_range; /* maps user SQ ring */ + struct aio_sq_ring *ring; + + struct aio_mapped_range iocb_range; /* maps user iocbs */ +};
+ struct kioctx { struct percpu_ref users; atomic_t dead; @@ -127,6 +139,11 @@ struct kioctx { struct page **ring_pages; long nr_pages; + /* if used, completion and submission rings */ + struct aio_iocb_ring sq_ring; + struct aio_mapped_range cq_ring; + int cq_ring_overflow; + struct rcu_work free_rwork; /* see free_ioctx() */ /* @@ -280,6 +297,13 @@ static struct vfsmount *aio_mnt; static const struct file_operations aio_ring_fops; static const struct address_space_operations aio_ctx_aops; +static const unsigned int array_page_shift = + ilog2(PAGE_SIZE / sizeof(u32)); +static const unsigned int iocb_page_shift = + ilog2(PAGE_SIZE / sizeof(struct iocb)); +static const unsigned int event_page_shift = + ilog2(PAGE_SIZE / sizeof(struct io_event)); + /* * We rely on block level unplugs to flush pending requests, if we schedule */ @@ -289,6 +313,7 @@ static const bool aio_use_state_req_list = true; static const bool aio_use_state_req_list = false; #endif +static void aio_scqring_unmap(struct kioctx *); static void aio_iopoll_reap_events(struct kioctx *); static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages) @@ -519,6 +544,12 @@ static const struct address_space_operations aio_ctx_aops = { #endif }; +/* Polled IO or SQ/CQ rings don't use the old ring */ +static bool aio_ctx_old_ring(struct kioctx *ctx) +{ + return !(ctx->flags & (IOCTX_FLAG_IOPOLL | IOCTX_FLAG_SCQRING)); +} + static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) { struct aio_ring *ring; @@ -533,7 +564,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) * IO polling doesn't require any io event entries */ size = sizeof(struct aio_ring); - if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) { + if (aio_ctx_old_ring(ctx)) { nr_events += 2; /* 1 is required, 2 for good luck */ size += sizeof(struct io_event) * nr_events; } @@ -625,7 +656,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) */ static bool aio_ctx_supports_cancel(struct kioctx 
*ctx) { - return (ctx->flags & IOCTX_FLAG_IOPOLL) == 0; + return (ctx->flags & (IOCTX_FLAG_IOPOLL | IOCTX_FLAG_SCQRING)) == 0; } #define AIO_EVENTS_PER_PAGE (PAGE_SIZE / sizeof(struct io_event)) @@ -661,6 +692,7 @@ static void free_ioctx(struct work_struct *work) free_rwork); pr_debug("freeing %p\n", ctx); + aio_scqring_unmap(ctx); aio_free_ring(ctx); free_percpu(ctx->cpu); percpu_ref_exit(&ctx->reqs); @@ -1205,6 +1237,39 @@ static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb, ev->res2 = res2; } +static void aio_commit_cqring(struct kioctx *ctx, unsigned next_tail) +{ + struct aio_cq_ring *ring = page_address(ctx->cq_ring.pages[0]); + + if (next_tail != ring->tail) { + ring->tail = next_tail; + smp_wmb(); + } +} + +static struct io_event *aio_peek_cqring(struct kioctx *ctx, unsigned *ntail) +{ + struct aio_cq_ring *ring; + struct io_event *ev; + unsigned tail; + + ring = page_address(ctx->cq_ring.pages[0]); + + smp_rmb(); + tail = READ_ONCE(ring->tail); + *ntail = tail + 1; + if (*ntail == ring->nr_events) + *ntail = 0; + if (*ntail == READ_ONCE(ring->head)) + return NULL; + + /* io_event array starts offset one into the mapped range */ + tail++; + ev = page_address(ctx->cq_ring.pages[tail >> event_page_shift]); + tail &= ((1 << event_page_shift) - 1); + return ev + tail; +} + static void aio_ring_complete(struct kioctx *ctx, struct aio_kiocb *iocb, long res, long res2) { @@ -1266,7 +1331,36 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2) { struct kioctx *ctx = iocb->ki_ctx; - aio_ring_complete(ctx, iocb, res, res2); + if (ctx->flags & IOCTX_FLAG_SCQRING) { + unsigned long flags; + struct io_event *ev; + unsigned int tail; + + /* + * If we can't get a cq entry, userspace overflowed the + * submission (by quite a lot). Flag it as an overflow + * condition, and next io_ring_enter(2) call will return + * -EOVERFLOW. 
+ */ + spin_lock_irqsave(&ctx->completion_lock, flags); + ev = aio_peek_cqring(ctx, &tail); + if (ev) { + aio_fill_event(ev, iocb, res, res2); + aio_commit_cqring(ctx, tail); + } else + ctx->cq_ring_overflow = 1; + spin_unlock_irqrestore(&ctx->completion_lock, flags); + } else { + aio_ring_complete(ctx, iocb, res, res2); + + /* + * We have to order our ring_info tail store above and test + * of the wait list below outside the wait lock. This is + * like in wake_up_bit() where clearing a bit has to be + * ordered with the unlocked test. + */ + smp_mb(); + } /* * Check if the user asked us to deliver the result through an @@ -1278,14 +1372,6 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2) eventfd_ctx_put(iocb->ki_eventfd); } - /* - * We have to order our ring_info tail store above and test - * of the wait list below outside the wait lock. This is - * like in wake_up_bit() where clearing a bit has to be - * ordered with the unlocked test. - */ - smp_mb(); - if (waitqueue_active(&ctx->wait)) wake_up(&ctx->wait); iocb_put(iocb); @@ -1408,6 +1494,9 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, return 0; list_for_each_entry_safe(iocb, n, &ctx->poll_completing, ki_list) { + struct io_event *ev = NULL; + unsigned int next_tail; + if (*nr_events == max) break; if (!test_bit(KIOCB_F_POLL_COMPLETED, &iocb->ki_flags)) @@ -1415,6 +1504,14 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, if (to_free == AIO_IOPOLL_BATCH) iocb_put_many(ctx, iocbs, &to_free); + /* Will only happen if the application over-commits */ + ret = -EAGAIN; + if (ctx->flags & IOCTX_FLAG_SCQRING) { + ev = aio_peek_cqring(ctx, &next_tail); + if (!ev) + break; + } + list_del(&iocb->ki_list); iocbs[to_free++] = iocb; @@ -1433,8 +1530,11 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, file_count = 1; } - if (evs && copy_to_user(evs + *nr_events, &iocb->ki_ev, - sizeof(iocb->ki_ev))) { + if 
(ev) { + memcpy(ev, &iocb->ki_ev, sizeof(*ev)); + aio_commit_cqring(ctx, next_tail); + } else if (evs && copy_to_user(evs + *nr_events, &iocb->ki_ev, + sizeof(iocb->ki_ev))) { ret = -EFAULT; break; } @@ -1615,24 +1715,139 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr, return ret; } +static void aio_unmap_range(struct aio_mapped_range *range) +{ + int i; + + if (!range->nr_pages) + return; + + for (i = 0; i < range->nr_pages; i++) + put_page(range->pages[i]); + + kfree(range->pages); + range->pages = NULL; + range->nr_pages = 0; +} + +static int aio_map_range(struct aio_mapped_range *range, void __user *uaddr, + size_t size, int gup_flags) +{ + int nr_pages, ret; + + if ((unsigned long) uaddr & ~PAGE_MASK) + return -EINVAL; + + nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; + + range->pages = kzalloc(nr_pages * sizeof(struct page *), GFP_KERNEL); + if (!range->pages) + return -ENOMEM; + + down_write(¤t->mm->mmap_sem); + ret = get_user_pages((unsigned long) uaddr, nr_pages, gup_flags, + range->pages, NULL); + up_write(¤t->mm->mmap_sem); + + if (ret < nr_pages) { + kfree(range->pages); + return -ENOMEM; + } + + range->nr_pages = nr_pages; + return 0; +} + +static void aio_scqring_unmap(struct kioctx *ctx) +{ + aio_unmap_range(&ctx->sq_ring.ring_range); + aio_unmap_range(&ctx->sq_ring.iocb_range); + aio_unmap_range(&ctx->cq_ring); +} + +static int aio_scqring_map(struct kioctx *ctx, + struct aio_sq_ring __user *sq_ring, + struct aio_cq_ring __user *cq_ring) +{ + int ret, sq_ring_size, cq_ring_size; + struct aio_cq_ring *kcq_ring; + void __user *uptr; + size_t size; + + /* Two is the minimum size we can support. */ + if (ctx->max_reqs < 2) + return -EINVAL; + + /* + * The CQ ring size is QD + 1, so we don't have to track full condition + * for head == tail. The SQ ring we make twice that in size, to make + * room for having more inflight than the QD. 
+ */ + sq_ring_size = ctx->max_reqs; + cq_ring_size = 2 * ctx->max_reqs; + + /* Map SQ ring and iocbs */ + size = sizeof(struct aio_sq_ring) + sq_ring_size * sizeof(u32); + ret = aio_map_range(&ctx->sq_ring.ring_range, sq_ring, size, FOLL_WRITE); + if (ret) + return ret; + + ctx->sq_ring.ring = page_address(ctx->sq_ring.ring_range.pages[0]); + if (ctx->sq_ring.ring->nr_events < sq_ring_size) { + ret = -EFAULT; + goto err; + } + ctx->sq_ring.ring->nr_events = sq_ring_size; + ctx->sq_ring.ring->head = ctx->sq_ring.ring->tail = 0; + + size = sizeof(struct iocb) * sq_ring_size; + uptr = (void __user *) (unsigned long) ctx->sq_ring.ring->iocbs; + ret = aio_map_range(&ctx->sq_ring.iocb_range, uptr, size, 0); + if (ret) + goto err; + + /* Map CQ ring and io_events */ + size = sizeof(struct aio_cq_ring) + + cq_ring_size * sizeof(struct io_event); + ret = aio_map_range(&ctx->cq_ring, cq_ring, size, FOLL_WRITE); + if (ret) + goto err; + + kcq_ring = page_address(ctx->cq_ring.pages[0]); + if (kcq_ring->nr_events < cq_ring_size) { + ret = -EFAULT; + goto err; + } + kcq_ring->nr_events = cq_ring_size; + kcq_ring->head = kcq_ring->tail = 0; + +err: + if (ret) { + aio_unmap_range(&ctx->sq_ring.ring_range); + aio_unmap_range(&ctx->sq_ring.iocb_range); + aio_unmap_range(&ctx->cq_ring); + } + return ret; +} + /* sys_io_setup2: * Like sys_io_setup(), except that it takes a set of flags * (IOCTX_FLAG_*), and some pointers to user structures: * - * *user1 - reserved for future use + * *sq_ring - pointer to the userspace SQ ring, if used. * - * *user2 - reserved for future use. + * *cq_ring - pointer to the userspace CQ ring, if used. 
*/ -SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags, void __user *, user1, - void __user *, user2, aio_context_t __user *, ctxp) +SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags, + struct aio_sq_ring __user *, sq_ring, + struct aio_cq_ring __user *, cq_ring, + aio_context_t __user *, ctxp) { struct kioctx *ioctx; unsigned long ctx; long ret; - if (user1 || user2) - return -EINVAL; - if (flags & ~IOCTX_FLAG_IOPOLL) + if (flags & ~(IOCTX_FLAG_IOPOLL | IOCTX_FLAG_SCQRING)) return -EINVAL; ret = get_user(ctx, ctxp); @@ -1644,9 +1859,17 @@ SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags, void __user *, user1, if (IS_ERR(ioctx)) goto out; + if (flags & IOCTX_FLAG_SCQRING) { + ret = aio_scqring_map(ioctx, sq_ring, cq_ring); + if (ret) + goto err; + } + ret = put_user(ioctx->user_id, ctxp); - if (ret) + if (ret) { +err: kill_ioctx(current->mm, ioctx, NULL); + } percpu_ref_put(&ioctx->users); out: return ret; @@ -2323,8 +2546,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, return -EINVAL; } - /* Poll IO doesn't need ring reservations */ - if (!(ctx->flags & IOCTX_FLAG_IOPOLL) && !get_reqs_available(ctx)) + if (aio_ctx_old_ring(ctx) && !get_reqs_available(ctx)) return -EAGAIN; ret = -EAGAIN; @@ -2413,7 +2635,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, eventfd_ctx_put(req->ki_eventfd); iocb_put(req); out_put_reqs_available: - if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) + if (aio_ctx_old_ring(ctx)) put_reqs_available(ctx, 1); return ret; } @@ -2473,6 +2695,211 @@ static void aio_submit_state_start(struct aio_submit_state *state, #endif } +static const struct iocb *aio_iocb_from_index(struct kioctx *ctx, unsigned idx) +{ + struct aio_mapped_range *range = &ctx->sq_ring.iocb_range; + const struct iocb *iocb; + + iocb = page_address(range->pages[idx >> iocb_page_shift]); + idx &= ((1 << iocb_page_shift) - 1); + return iocb + idx; +} + +static void aio_commit_sqring(struct kioctx *ctx, unsigned next_head) 
+{ + struct aio_sq_ring *ring = ctx->sq_ring.ring; + + if (ring->head != next_head) { + ring->head = next_head; + smp_wmb(); + } +} + +static const struct iocb *aio_peek_sqring(struct kioctx *ctx, unsigned *nhead) +{ + struct aio_mapped_range *range = &ctx->sq_ring.ring_range; + struct aio_sq_ring *ring = ctx->sq_ring.ring; + unsigned head, index; + u32 *array; + + smp_rmb(); + head = READ_ONCE(ring->head); + if (head == READ_ONCE(ring->tail)) + return NULL; + + *nhead = head + 1; + if (*nhead == ring->nr_events) + *nhead = 0; + + /* + * No guarantee the array is in the first page, so we can't just + * index ring->array. Find the map and offset from the head. + */ + head += offsetof(struct aio_sq_ring, array) >> 2; + array = page_address(range->pages[head >> array_page_shift]); + head &= ((1 << array_page_shift) - 1); + index = array[head]; + + if (index < ring->nr_events) + return aio_iocb_from_index(ctx, index); + + /* drop invalid entries */ + aio_commit_sqring(ctx, *nhead); + return NULL; +} + +static int aio_ring_submit(struct kioctx *ctx, unsigned int to_submit) +{ + struct aio_submit_state state, *statep = NULL; + int i, ret = 0, submit = 0; + + if (to_submit > AIO_PLUG_THRESHOLD) { + aio_submit_state_start(&state, ctx, to_submit); + statep = &state; + } + + for (i = 0; i < to_submit; i++) { + const struct iocb *iocb; + unsigned int next_head; + + iocb = aio_peek_sqring(ctx, &next_head); + if (!iocb) + break; + + ret = __io_submit_one(ctx, iocb, NULL, statep, false); + if (ret) + break; + + submit++; + aio_commit_sqring(ctx, next_head); + } + + if (statep) + aio_submit_state_end(statep); + + return submit ? submit : ret; +} + +/* + * Wait until events become available, if we don't already have some. The + * application must reap them itself, as they reside on the shared cq ring. 
+ */ +static int aio_cqring_wait(struct kioctx *ctx, int min_events) +{ + struct aio_cq_ring *ring = page_address(ctx->cq_ring.pages[0]); + DEFINE_WAIT(wait); + int ret = 0; + + smp_rmb(); + if (ring->head != ring->tail) + return 0; + if (!min_events) + return 0; + + do { + prepare_to_wait(&ctx->wait, &wait, TASK_INTERRUPTIBLE); + + ret = 0; + smp_rmb(); + if (ring->head != ring->tail) + break; + + schedule(); + + ret = -EINVAL; + if (atomic_read(&ctx->dead)) + break; + ret = -EINTR; + if (signal_pending(current)) + break; + } while (1); + + finish_wait(&ctx->wait, &wait); + return ret; +} + +static int __io_ring_enter(struct kioctx *ctx, unsigned int to_submit, + unsigned int min_complete, unsigned int flags) +{ + int ret = 0; + + if (flags & IORING_FLAG_SUBMIT) { + ret = aio_ring_submit(ctx, to_submit); + if (ret < 0) + return ret; + } + if (flags & IORING_FLAG_GETEVENTS) { + unsigned int nr_events = 0; + int get_ret; + + if (!ret && to_submit) + min_complete = 0; + + if (ctx->flags & IOCTX_FLAG_IOPOLL) + get_ret = __aio_iopoll_check(ctx, NULL, &nr_events, + min_complete, -1U); + else + get_ret = aio_cqring_wait(ctx, min_complete); + + if (get_ret < 0 && !ret) + ret = get_ret; + } + + return ret; +} + +/* sys_io_ring_enter: + * Alternative way to both submit and complete IO, instead of using + * io_submit(2) and io_getevents(2). Requires the use of the SQ/CQ + * ring interface, hence the io_context must be setup with + * io_setup2() and IOCTX_FLAG_SCQRING must be specified (and the + * sq_ring/cq_ring passed in). + * + * Returns the number of IOs submitted, if IORING_FLAG_SUBMIT + * is used, otherwise returns 0 for IORING_FLAG_GETEVENTS success, + * but not the number of events, as those will have to be found + * by the application by reading the CQ ring anyway. + * + * Apart from that, the error returns are much like io_submit() + * and io_getevents(), since a lot of the same error conditions + * are shared. 
+ */ +SYSCALL_DEFINE4(io_ring_enter, aio_context_t, ctx_id, u32, to_submit, + u32, min_complete, u32, flags) +{ + struct kioctx *ctx; + long ret; + + ctx = lookup_ioctx(ctx_id); + if (!ctx) { + pr_debug("EINVAL: invalid context id\n"); + return -EINVAL; + } + + ret = -EBUSY; + if (!mutex_trylock(&ctx->getevents_lock)) + goto err; + + ret = -EOVERFLOW; + if (ctx->cq_ring_overflow) { + ctx->cq_ring_overflow = 0; + goto err_unlock; + } + + ret = -EINVAL; + if (unlikely(atomic_read(&ctx->dead))) + goto err_unlock; + + if (ctx->flags & IOCTX_FLAG_SCQRING) + ret = __io_ring_enter(ctx, to_submit, min_complete, flags); + +err_unlock: + mutex_unlock(&ctx->getevents_lock); +err: + percpu_ref_put(&ctx->users); + return ret; +} + /* sys_io_submit: * Queue the nr iocbs pointed to by iocbpp for processing. Returns * the number of iocbs queued. May return -EINVAL if the aio_context @@ -2502,6 +2929,10 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, return -EINVAL; } + /* SCQRING must use io_ring_enter() */ + if (ctx->flags & IOCTX_FLAG_SCQRING) + return -EINVAL; + if (nr > ctx->nr_events) nr = ctx->nr_events; @@ -2653,7 +3084,10 @@ static long do_io_getevents(aio_context_t ctx_id, long ret = -EINVAL; if (likely(ioctx)) { - if (likely(min_nr <= nr && min_nr >= 0)) { + /* SCQRING must use io_ring_enter() */ + if (ioctx->flags & IOCTX_FLAG_SCQRING) + ret = -EINVAL; + else if (min_nr <= nr && min_nr >= 0) { if (ioctx->flags & IOCTX_FLAG_IOPOLL) ret = aio_iopoll_check(ioctx, min_nr, nr, events); else diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 67b7f03aa9fc..e7bd364410c3 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -287,8 +287,10 @@ static inline void addr_limit_user_check(void) */ #ifndef CONFIG_ARCH_HAS_SYSCALL_WRAPPER asmlinkage long sys_io_setup(unsigned nr_reqs, aio_context_t __user *ctx); -asmlinkage long sys_io_setup2(unsigned, unsigned, void __user *, void __user *, +asmlinkage long sys_io_setup2(unsigned, 
unsigned, struct aio_sq_ring __user *, + struct aio_cq_ring __user *, aio_context_t __user *); +asmlinkage long sys_io_ring_enter(aio_context_t, unsigned, unsigned, unsigned); asmlinkage long sys_io_destroy(aio_context_t ctx); asmlinkage long sys_io_submit(aio_context_t, long, struct iocb __user * __user *); diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h index a6829bae9ada..5d3ada40ce15 100644 --- a/include/uapi/linux/aio_abi.h +++ b/include/uapi/linux/aio_abi.h @@ -109,6 +109,35 @@ struct iocb { }; /* 64 bytes */ #define IOCTX_FLAG_IOPOLL (1 << 0) /* io_context is polled */ +#define IOCTX_FLAG_SCQRING (1 << 1) /* Use SQ/CQ rings */ + +struct aio_sq_ring { + union { + struct { + u32 head; /* kernel consumer head */ + u32 tail; /* app producer tail */ + u32 nr_events; /* max events in ring */ + u64 iocbs; /* setup pointer to app iocbs */ + }; + u32 pad[16]; + }; + u32 array[0]; /* actual ring, index to iocbs */ +}; + +struct aio_cq_ring { + union { + struct { + u32 head; /* app consumer head */ + u32 tail; /* kernel producer tail */ + u32 nr_events; /* max events in ring */ + }; + struct io_event pad; + }; + struct io_event events[0]; /* ring, array of io_events */ +}; + +#define IORING_FLAG_SUBMIT (1 << 0) +#define IORING_FLAG_GETEVENTS (1 << 1) #undef IFBIG #undef IFLITTLE diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 17c8b4393669..a32b7ea93838 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -38,6 +38,7 @@ asmlinkage long sys_ni_syscall(void) COND_SYSCALL(io_setup); COND_SYSCALL(io_setup2); +COND_SYSCALL(io_ring_enter); COND_SYSCALL_COMPAT(io_setup); COND_SYSCALL(io_destroy); COND_SYSCALL(io_submit); From patchwork Thu Dec 13 17:56:41 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10729383
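[Editor's note] The aio_sq_ring layout above is a single-producer ring: the application fills an iocb, publishes its index in array[] at the current tail, then advances tail; the kernel consumes from head. The sketch below is a minimal userspace model of that producer step — the names, the stub iocb, and the one-slot-free "full" convention are illustrative, not the patch's ABI, and the real ring additionally requires a write barrier (smp_wmb()) before the tail store is visible to the kernel.

```c
#include <assert.h>
#include <string.h>

#define SQ_RING_SIZE 8

struct iocb_stub { int opcode; unsigned long long buf; }; /* stand-in for struct iocb */

struct sq_model {
	unsigned head;			/* advanced by the consumer (kernel) */
	unsigned tail;			/* advanced by the producer (app) */
	unsigned nr_events;
	unsigned array[SQ_RING_SIZE];	/* ring of indices into iocbs[] */
	struct iocb_stub iocbs[SQ_RING_SIZE];
};

/*
 * Producer step: fill an iocb slot, publish its index at tail, advance
 * tail with wraparound. One slot is left unused so that head == tail
 * always means "empty". In the real ABI the tail store must be ordered
 * after the iocb/array stores with a write barrier.
 */
static int sq_model_submit(struct sq_model *sq, int opcode, unsigned long long buf)
{
	unsigned tail = sq->tail;
	unsigned next = tail + 1 == sq->nr_events ? 0 : tail + 1;

	if (next == sq->head)		/* ring full */
		return -1;

	sq->iocbs[tail].opcode = opcode;
	sq->iocbs[tail].buf = buf;
	sq->array[tail] = tail;		/* publish the index of the filled iocb */
	sq->tail = next;
	return 0;
}
```

With this convention an 8-entry ring holds at most 7 in-flight entries; the patch instead sizes the CQ ring larger than the submission side so the kernel-produced ring cannot fill under normal use.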
From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 22/26] block: add BIO_HOLD_PAGES flag Date: Thu, 13 Dec 2018 10:56:41 -0700 Message-Id: <20181213175645.22181-23-axboe@kernel.dk> In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk> References: <20181213175645.22181-1-axboe@kernel.dk> For user mapped IO, we do get_user_pages() upfront, and then do a put_page() on each page at end_io time to release the page reference. In preparation for having permanently mapped pages, add a BIO_HOLD_PAGES flag that tells us not to release the pages; the caller will do that.
Signed-off-by: Jens Axboe --- block/bio.c | 6 ++++-- include/linux/blk_types.h | 1 + 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/block/bio.c b/block/bio.c index 036e3f0cc736..03dde1c03ae6 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1636,7 +1636,8 @@ static void bio_dirty_fn(struct work_struct *work) next = bio->bi_private; bio_set_pages_dirty(bio); - bio_release_pages(bio); + if (!bio_flagged(bio, BIO_HOLD_PAGES)) + bio_release_pages(bio); bio_put(bio); } } @@ -1652,7 +1653,8 @@ void bio_check_pages_dirty(struct bio *bio) goto defer; } - bio_release_pages(bio); + if (!bio_flagged(bio, BIO_HOLD_PAGES)) + bio_release_pages(bio); bio_put(bio); return; defer: diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 921d734d6b5d..356a4c89b0d9 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -228,6 +228,7 @@ struct bio { #define BIO_TRACE_COMPLETION 10 /* bio_endio() should trace the final completion * of this bio. */ #define BIO_QUEUE_ENTERED 11 /* can use blk_queue_enter_live() */ +#define BIO_HOLD_PAGES 12 /* don't put O_DIRECT pages */ /* See BVEC_POOL_OFFSET below before adding new flags */ From patchwork Thu Dec 13 17:56:42 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10729389
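[Editor's note] The BIO_HOLD_PAGES change above transfers ownership of the page references: when the flag is set, bio completion skips bio_release_pages() and the submitter stays responsible for dropping them. A small standalone sketch of that conditional-cleanup pattern — the stub types and flag name are illustrative, not the kernel's:

```c
#include <assert.h>

#define HOLD_PAGES_FLAG (1u << 0)	/* models BIO_HOLD_PAGES */

struct page_stub { int refcount; };

struct bio_model {
	unsigned flags;
	struct page_stub *pages[4];
	int nr_pages;
};

/*
 * Completion-side cleanup, mirroring the bio_check_pages_dirty() hunk:
 * drop the per-page references only when the submitter did not ask to
 * hold them; with the flag set, the caller keeps ownership.
 */
static void bio_model_endio(struct bio_model *bio)
{
	if (!(bio->flags & HOLD_PAGES_FLAG)) {
		for (int i = 0; i < bio->nr_pages; i++)
			bio->pages[i]->refcount--;
	}
	bio->nr_pages = 0;	/* the bio itself completes either way */
}
```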
From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 23/26] block: implement bio helper to add iter bvec pages to bio Date: Thu, 13 Dec 2018 10:56:42 -0700 Message-Id: <20181213175645.22181-24-axboe@kernel.dk> In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk> References: <20181213175645.22181-1-axboe@kernel.dk> For an ITER_BVEC, we can just iterate the iov and add the pages to the bio directly. Signed-off-by: Jens Axboe --- block/bio.c | 27 +++++++++++++++++++++++++++ include/linux/bio.h | 1 + 2 files changed, 28 insertions(+) diff --git a/block/bio.c b/block/bio.c index 03dde1c03ae6..1da1391e8b1d 100644 --- a/block/bio.c +++ b/block/bio.c @@ -904,6 +904,33 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) } EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages); +/** + * bio_iov_bvec_add_pages - add pages from an ITER_BVEC to a bio + * @bio: bio to add pages to + * @iter: iov iterator describing the region to be added + * + * Iterate pages in the @iter and add them to the bio. We flag the + * @bio with BIO_HOLD_PAGES, telling IO completion not to free them.
+ */ +int bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter) +{ + unsigned short orig_vcnt = bio->bi_vcnt; + const struct bio_vec *bv; + + do { + size_t size; + + bv = iter->bvec + iter->iov_offset; + size = bio_add_page(bio, bv->bv_page, bv->bv_len, bv->bv_offset); + if (size != bv->bv_len) + break; + iov_iter_advance(iter, size); + } while (iov_iter_count(iter) && !bio_full(bio)); + + bio_set_flag(bio, BIO_HOLD_PAGES); + return bio->bi_vcnt > orig_vcnt ? 0 : -EINVAL; +} + static void submit_bio_wait_endio(struct bio *bio) { complete(bio->bi_private); diff --git a/include/linux/bio.h b/include/linux/bio.h index 7380b094dcca..ca25ea890192 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -434,6 +434,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page, void __bio_add_page(struct bio *bio, struct page *page, unsigned int len, unsigned int off); int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter); +int bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter); struct rq_map_data; extern struct bio *bio_map_user_iov(struct request_queue *, struct iov_iter *, gfp_t); From patchwork Thu Dec 13 17:56:43 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10729391
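[Editor's note] bio_iov_bvec_add_pages() above appends segments straight from the bvec array until the bio fills or a partial add occurs, and succeeds if it added at least one segment. A simplified model of that loop — stub types and a fixed capacity stand in for bio_add_page()/bio_full(), so this is a sketch of the control flow, not the kernel API:

```c
#include <assert.h>

#define MAX_SEGS 4	/* stand-in for the bio being "full" */

struct bvec_stub { unsigned len; unsigned off; };

struct bio_segs {
	struct bvec_stub vecs[MAX_SEGS];
	int vcnt;
};

/*
 * Model of the bio_iov_bvec_add_pages() loop: append segments from the
 * bvec array until the bio is full, and report success only if at least
 * one segment was added (-EINVAL in the patch, -1 here).
 */
static int add_bvec_pages(struct bio_segs *bio, const struct bvec_stub *bv, int nr)
{
	int orig_vcnt = bio->vcnt;

	for (int i = 0; i < nr && bio->vcnt < MAX_SEGS; i++)
		bio->vecs[bio->vcnt++] = bv[i];

	return bio->vcnt > orig_vcnt ? 0 : -1;
}
```

A partial add is still a success, matching the patch: the caller resubmits the remainder of the iterator in a later bio.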
From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 24/26] fs: add support for mapping an ITER_BVEC for O_DIRECT Date: Thu, 13 Dec 2018 10:56:43 -0700 Message-Id: <20181213175645.22181-25-axboe@kernel.dk> In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk> References: <20181213175645.22181-1-axboe@kernel.dk> This adds support for sync/async O_DIRECT to make a bvec type iter for bdev access, as well as iomap.
Signed-off-by: Jens Axboe --- fs/block_dev.c | 16 ++++++++++++---- fs/iomap.c | 10 +++++++--- 2 files changed, 19 insertions(+), 7 deletions(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index b8f574615792..236c6abe649d 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -219,7 +219,10 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter, bio.bi_end_io = blkdev_bio_end_io_simple; bio.bi_ioprio = iocb->ki_ioprio; - ret = bio_iov_iter_get_pages(&bio, iter); + if (iov_iter_is_bvec(iter)) + ret = bio_iov_bvec_add_pages(&bio, iter); + else + ret = bio_iov_iter_get_pages(&bio, iter); if (unlikely(ret)) goto out; ret = bio.bi_iter.bi_size; @@ -326,8 +329,9 @@ static void blkdev_bio_end_io(struct bio *bio) struct bio_vec *bvec; int i; - bio_for_each_segment_all(bvec, bio, i) - put_page(bvec->bv_page); + if (!bio_flagged(bio, BIO_HOLD_PAGES)) + bio_for_each_segment_all(bvec, bio, i) + put_page(bvec->bv_page); bio_put(bio); } } @@ -381,7 +385,11 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages) bio->bi_end_io = blkdev_bio_end_io; bio->bi_ioprio = iocb->ki_ioprio; - ret = bio_iov_iter_get_pages(bio, iter); + if (iov_iter_is_bvec(iter)) + ret = bio_iov_bvec_add_pages(bio, iter); + else + ret = bio_iov_iter_get_pages(bio, iter); + if (unlikely(ret)) { bio->bi_status = BLK_STS_IOERR; bio_endio(bio); diff --git a/fs/iomap.c b/fs/iomap.c index f3039989de73..2bb309a320a3 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -1573,8 +1573,9 @@ static void iomap_dio_bio_end_io(struct bio *bio) struct bio_vec *bvec; int i; - bio_for_each_segment_all(bvec, bio, i) - put_page(bvec->bv_page); + if (!bio_flagged(bio, BIO_HOLD_PAGES)) + bio_for_each_segment_all(bvec, bio, i) + put_page(bvec->bv_page); bio_put(bio); } } @@ -1673,7 +1674,10 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length, bio->bi_private = dio; bio->bi_end_io = iomap_dio_bio_end_io; - ret = bio_iov_iter_get_pages(bio, &iter); + if (iov_iter_is_bvec(&iter)) + ret = 
bio_iov_bvec_add_pages(bio, &iter); + else + ret = bio_iov_iter_get_pages(bio, &iter); if (unlikely(ret)) { /* * We have to stop part way through an IO. We must fall From patchwork Thu Dec 13 17:56:44 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10729399
b=qdDp1c230eTa990/GP4Ch43Wd1XdEQKuGOuCAkhXQ0ufvvXnXMAe3RZkQa3ixeSloT ibh+MvC9l6ezvem5amBkNTYhq57Bdr12oDtgXyl2eY4xg5o96Je/WbBhpAj7StRQCs9S tI/qjMCAjjsCfC1rx1NHh095VTgwvosJqxOFKL82UyB38GbGef4LtU//ZL9ZALwAXybE gjIpWOrsPXD248YwWKAjJK3zJr1Jwo62HVD+WBCR/SwmVhEsmle5ArxevyNrPSriOPRR qij5EF/GKlcTygDbKYw4aST0UMR49Cw31MjEtn/xyCv+qBzPxC5iVgkdLjX98zgSsCRa D2Nw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=nnsStoxMjXagyaQxrfF8MKczBgqcviWRVnTOz/hAPko=; b=S/+usCtRvBVoltYImfgJyEfAQdWEz+WE4GQ/kHdVO2QONJVTn6080ypzVlGhH4dJLq E9O5g7kuPg/WW08VVL+iimkP7hZjtI/V7Kx7gLyFg1FG3syrJcRrOCC1ZAI+C8BJFvye aZ7pRS2DwQ7b/7XChGToZ5CyW4TzbbhXwMvvEUttitcQOgibNFIp8o39tEvFGV2W07Um Ypv5RfgUB5D9/saFIWONK3+a9GVqoumn/9AXFZP+dTPdywmnnpyja7D8o9k+RBgtxODw sAuxURleLCuVZDlbDncqyOtORKHkmdqJw02G0kOeOvgdp5ClhTkHRAXMc+u2gr1ztssL 64Yg== X-Gm-Message-State: AA+aEWZrH4UCvZBE/YWs+Tx+4L505Dzt11tbwKhbxIIx7i3OUmuBTO/4 c55vGw+iXSlf4Ym0u6n4gdDCwwt/k3SgeQ== X-Google-Smtp-Source: AFSGD/WYu1yHk4SXd+GMlkFnoQ6e63b8mWO66poWcnYPcOSDRBNTb6GUMHuc8Qx1/gTHvovipj0etA== X-Received: by 2002:a6b:fc03:: with SMTP id r3mr785971ioh.129.1544723856684; Thu, 13 Dec 2018 09:57:36 -0800 (PST) Received: from x1.localdomain ([216.160.245.98]) by smtp.gmail.com with ESMTPSA id k6sm1022261ios.69.2018.12.13.09.57.35 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 13 Dec 2018 09:57:35 -0800 (PST) From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 25/26] aio: add support for pre-mapped user IO buffers Date: Thu, 13 Dec 2018 10:56:44 -0700 Message-Id: <20181213175645.22181-26-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk> References: <20181213175645.22181-1-axboe@kernel.dk> Sender: linux-block-owner@vger.kernel.org Precedence: bulk 
If we have fixed user buffers, we can map them into the kernel when we set up the io_context. That avoids the need to do get_user_pages() for each and every IO. To utilize this feature, the application must set both IOCTX_FLAG_USERIOCB, to provide iocbs in userspace, and then IOCTX_FLAG_FIXEDBUFS. The latter tells aio that the mapped iocbs already contain valid destination addresses and sizes. These buffers can then be mapped into the kernel for the lifetime of the io_context, as opposed to just the duration of each single IO. This only works with non-vectored read/write commands for now, not with PREADV/PWRITEV. A limit of 4M is imposed as the largest buffer we currently support. There's nothing preventing us from going larger, but we need some cap, and 4M seemed like it would definitely be big enough. RLIMIT_MEMLOCK is used to cap the total amount of memory pinned. Signed-off-by: Jens Axboe --- fs/aio.c | 199 +++++++++++++++++++++++++++++++---- include/uapi/linux/aio_abi.h | 1 + 2 files changed, 182 insertions(+), 18 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index a5ec349670bc..66203fee6296 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -42,6 +42,8 @@ #include #include #include +#include +#include #include #include @@ -104,6 +106,11 @@ struct aio_iocb_ring { struct aio_mapped_range iocb_range; /* maps user iocbs */ }; +struct aio_mapped_ubuf { + struct bio_vec *bvec; + unsigned int nr_bvecs; +}; + struct kioctx { struct percpu_ref users; atomic_t dead; @@ -139,6 +146,9 @@ struct kioctx { struct page **ring_pages; long nr_pages; + /* if used, fixed mapped user buffers */ + struct aio_mapped_ubuf *user_bufs; + /* if used, completion and submission rings */ struct aio_iocb_ring sq_ring; struct aio_mapped_range cq_ring; @@ -313,8 +323,10 @@ static const bool aio_use_state_req_list = true; static const bool aio_use_state_req_list = false; #endif +static void
aio_iocb_buffer_unmap(struct kioctx *); static void aio_scqring_unmap(struct kioctx *); static void aio_iopoll_reap_events(struct kioctx *); +static const struct iocb *aio_iocb_from_index(struct kioctx *ctx, unsigned idx); static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages) { @@ -693,6 +705,7 @@ static void free_ioctx(struct work_struct *work) pr_debug("freeing %p\n", ctx); aio_scqring_unmap(ctx); + aio_iocb_buffer_unmap(ctx); aio_free_ring(ctx); free_percpu(ctx->cpu); percpu_ref_exit(&ctx->reqs); @@ -1830,6 +1843,122 @@ static int aio_scqring_map(struct kioctx *ctx, return ret; } +static void aio_iocb_buffer_unmap(struct kioctx *ctx) +{ + int i, j; + + if (!ctx->user_bufs) + return; + + for (i = 0; i < ctx->max_reqs; i++) { + struct aio_mapped_ubuf *amu = &ctx->user_bufs[i]; + + for (j = 0; j < amu->nr_bvecs; j++) + put_page(amu->bvec[j].bv_page); + + kfree(amu->bvec); + amu->nr_bvecs = 0; + } + + kfree(ctx->user_bufs); + ctx->user_bufs = NULL; +} + +static int aio_iocb_buffer_map(struct kioctx *ctx) +{ + unsigned long total_pages, page_limit; + struct page **pages = NULL; + int i, j, got_pages = 0; + const struct iocb *iocb; + int ret = -EINVAL; + + ctx->user_bufs = kzalloc(ctx->max_reqs * sizeof(struct aio_mapped_ubuf), + GFP_KERNEL); + if (!ctx->user_bufs) + return -ENOMEM; + + /* Don't allow more pages than we can safely lock */ + total_pages = 0; + page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; + + for (i = 0; i < ctx->max_reqs; i++) { + struct aio_mapped_ubuf *amu = &ctx->user_bufs[i]; + unsigned long off, start, end, ubuf; + int pret, nr_pages; + size_t size; + + iocb = aio_iocb_from_index(ctx, i); + + /* + * Don't impose further limits on the size and buffer + * constraints here, we'll -EINVAL later when IO is + * submitted if they are wrong. 
+ */ + ret = -EFAULT; + if (!iocb->aio_buf) + goto err; + + /* arbitrary limit, but we need something */ + if (iocb->aio_nbytes > SZ_4M) + goto err; + + ubuf = iocb->aio_buf; + end = (ubuf + iocb->aio_nbytes + PAGE_SIZE - 1) >> PAGE_SHIFT; + start = ubuf >> PAGE_SHIFT; + nr_pages = end - start; + + ret = -ENOMEM; + if (total_pages + nr_pages > page_limit) + goto err; + + if (!pages || nr_pages > got_pages) { + kfree(pages); + pages = kmalloc(nr_pages * sizeof(struct page *), + GFP_KERNEL); + if (!pages) + goto err; + got_pages = nr_pages; + } + + amu->bvec = kmalloc(nr_pages * sizeof(struct bio_vec), + GFP_KERNEL); + if (!amu->bvec) + goto err; + + down_write(¤t->mm->mmap_sem); + pret = get_user_pages((unsigned long) iocb->aio_buf, nr_pages, + 1, pages, NULL); + up_write(¤t->mm->mmap_sem); + + if (pret < nr_pages) { + if (pret < 0) + ret = pret; + goto err; + } + + off = ubuf & ~PAGE_MASK; + size = iocb->aio_nbytes; + for (j = 0; j < nr_pages; j++) { + size_t vec_len; + + vec_len = min_t(size_t, size, PAGE_SIZE - off); + amu->bvec[j].bv_page = pages[j]; + amu->bvec[j].bv_len = vec_len; + amu->bvec[j].bv_offset = off; + off = 0; + size -= vec_len; + } + amu->nr_bvecs = nr_pages; + total_pages += nr_pages; + } + kfree(pages); + return 0; +err: + kfree(pages); + aio_iocb_buffer_unmap(ctx); + return ret; +} + /* sys_io_setup2: * Like sys_io_setup(), except that it takes a set of flags * (IOCTX_FLAG_*), and some pointers to user structures: @@ -1847,7 +1976,8 @@ SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags, unsigned long ctx; long ret; - if (flags & ~(IOCTX_FLAG_IOPOLL | IOCTX_FLAG_SCQRING)) + if (flags & ~(IOCTX_FLAG_IOPOLL | IOCTX_FLAG_SCQRING | + IOCTX_FLAG_FIXEDBUFS)) return -EINVAL; ret = get_user(ctx, ctxp); @@ -1863,6 +1993,15 @@ SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags, ret = aio_scqring_map(ioctx, sq_ring, cq_ring); if (ret) goto err; + if (flags & IOCTX_FLAG_FIXEDBUFS) { + ret = aio_iocb_buffer_map(ioctx); + if (ret) + goto err; + } + } 
else if (flags & IOCTX_FLAG_FIXEDBUFS) { + /* can only support fixed bufs with SQ/CQ ring */ + ret = -EINVAL; + goto err; } ret = put_user(ioctx->user_id, ctxp); @@ -2142,23 +2281,42 @@ static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb, return ret; } -static int aio_setup_rw(int rw, const struct iocb *iocb, struct iovec **iovec, - bool vectored, bool compat, struct iov_iter *iter) +static int aio_setup_rw(int rw, struct aio_kiocb *kiocb, + const struct iocb *iocb, struct iovec **iovec, bool vectored, + bool compat, bool kaddr, struct iov_iter *iter) { - void __user *buf = (void __user *)(uintptr_t)iocb->aio_buf; + void __user *ubuf = (void __user *)(uintptr_t)iocb->aio_buf; size_t len = iocb->aio_nbytes; if (!vectored) { - ssize_t ret = import_single_range(rw, buf, len, *iovec, iter); + ssize_t ret; + + if (!kaddr) { + ret = import_single_range(rw, ubuf, len, *iovec, iter); + } else { + struct kioctx *ctx = kiocb->ki_ctx; + struct aio_mapped_ubuf *amu; + int index; + + /* __io_submit_one() already validated the index */ + index = array_index_nospec((long)kiocb->ki_user_iocb, + ctx->max_reqs); + amu = &ctx->user_bufs[index]; + iov_iter_bvec(iter, rw, amu->bvec, amu->nr_bvecs, len); + ret = 0; + + } *iovec = NULL; return ret; } + if (kaddr) + return -EINVAL; #ifdef CONFIG_COMPAT if (compat) - return compat_import_iovec(rw, buf, len, UIO_FASTIOV, iovec, + return compat_import_iovec(rw, ubuf, len, UIO_FASTIOV, iovec, iter); #endif - return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter); + return import_iovec(rw, ubuf, len, UIO_FASTIOV, iovec, iter); } static inline void aio_rw_done(struct kiocb *req, ssize_t ret) @@ -2231,7 +2389,7 @@ static void aio_iopoll_iocb_issued(struct aio_submit_state *state, static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, struct aio_submit_state *state, bool vectored, - bool compat) + bool compat, bool kaddr) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct kiocb *req = 
&kiocb->rw; @@ -2251,9 +2409,11 @@ static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, if (unlikely(!file->f_op->read_iter)) goto out_fput; - ret = aio_setup_rw(READ, iocb, &iovec, vectored, compat, &iter); + ret = aio_setup_rw(READ, kiocb, iocb, &iovec, vectored, compat, kaddr, + &iter); if (ret) goto out_fput; + ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter)); if (!ret) aio_rw_done(req, call_read_iter(file, req, &iter)); @@ -2266,7 +2426,7 @@ static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb, struct aio_submit_state *state, bool vectored, - bool compat) + bool compat, bool kaddr) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct kiocb *req = &kiocb->rw; @@ -2286,7 +2446,8 @@ static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb, if (unlikely(!file->f_op->write_iter)) goto out_fput; - ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, compat, &iter); + ret = aio_setup_rw(WRITE, kiocb, iocb, &iovec, vectored, compat, kaddr, + &iter); if (ret) goto out_fput; ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter)); @@ -2525,7 +2686,8 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, struct iocb __user *user_iocb, - struct aio_submit_state *state, bool compat) + struct aio_submit_state *state, bool compat, + bool kaddr) { struct aio_kiocb *req; ssize_t ret; @@ -2583,16 +2745,16 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, ret = -EINVAL; switch (iocb->aio_lio_opcode) { case IOCB_CMD_PREAD: - ret = aio_read(req, iocb, state, false, compat); + ret = aio_read(req, iocb, state, false, compat, kaddr); break; case IOCB_CMD_PWRITE: - ret = aio_write(req, iocb, state, false, compat); + ret = aio_write(req, iocb, state, false, compat, kaddr); break; case 
IOCB_CMD_PREADV: - ret = aio_read(req, iocb, state, true, compat); + ret = aio_read(req, iocb, state, true, compat, kaddr); break; case IOCB_CMD_PWRITEV: - ret = aio_write(req, iocb, state, true, compat); + ret = aio_write(req, iocb, state, true, compat, kaddr); break; case IOCB_CMD_FSYNC: if (ctx->flags & IOCTX_FLAG_IOPOLL) @@ -2648,7 +2810,7 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb)))) return -EFAULT; - return __io_submit_one(ctx, &iocb, user_iocb, state, compat); + return __io_submit_one(ctx, &iocb, user_iocb, state, compat, false); } #ifdef CONFIG_BLOCK @@ -2750,6 +2912,7 @@ static const struct iocb *aio_peek_sqring(struct kioctx *ctx, unsigned *nhead) static int aio_ring_submit(struct kioctx *ctx, unsigned int to_submit) { + bool kaddr = (ctx->flags & IOCTX_FLAG_FIXEDBUFS) != 0; struct aio_submit_state state, *statep = NULL; int i, ret = 0, submit = 0; @@ -2766,7 +2929,7 @@ static int aio_ring_submit(struct kioctx *ctx, unsigned int to_submit) if (!iocb) break; - ret = __io_submit_one(ctx, iocb, NULL, statep, false); + ret = __io_submit_one(ctx, iocb, NULL, statep, false, kaddr); if (ret) break; diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h index 5d3ada40ce15..39d783175872 100644 --- a/include/uapi/linux/aio_abi.h +++ b/include/uapi/linux/aio_abi.h @@ -110,6 +110,7 @@ struct iocb { #define IOCTX_FLAG_IOPOLL (1 << 0) /* io_context is polled */ #define IOCTX_FLAG_SCQRING (1 << 1) /* Use SQ/CQ rings */ +#define IOCTX_FLAG_FIXEDBUFS (1 << 2) /* IO buffers are fixed */ struct aio_sq_ring { union {

From patchwork Thu Dec 13 17:56:45 2018
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 26/26] aio: support kernel side submission for aio with SCQRING
Date: Thu, 13 Dec 2018 10:56:45 -0700
Message-Id: <20181213175645.22181-27-axboe@kernel.dk>
In-Reply-To: <20181213175645.22181-1-axboe@kernel.dk>

Add support for backing the io_context with either a thread or a workqueue, and letting those handle the submission for us. This can be used to reduce overhead for submission, or to always make submission async. The latter is particularly useful for buffered aio, which is now fully async with this feature. For polled IO, we could have the kernel side thread hammer on the SQ ring and submit when it finds IO.
This would mean that an application would NEVER have to enter the kernel to do IO! That isn't added yet, but it would be trivial to add. If an application sets IOCTX_FLAG_SQTHREAD, the io_context gets a single thread backing. If used with buffered IO, this will limit the device queue depth to 1, but it will be async; IOs will simply be serialized. Or an application can set IOCTX_FLAG_SQWQ, in which case the io_context gets a workqueue backing. The concurrency level is the minimum of twice the number of available CPUs and the queue depth specified for the context. For this mode, we attempt to do buffered reads inline, in case they are cached. So we should only punt to a workqueue if we would have to block to get our data. Tested with polling, no polling, fixedbufs, no fixedbufs, buffered, O_DIRECT. See this sample application for how to use it: http://git.kernel.dk/cgit/fio/plain/t/aio-ring.c Signed-off-by: Jens Axboe --- fs/aio.c | 421 ++++++++++++++++++++++++++++++++--- include/uapi/linux/aio_abi.h | 3 + 2 files changed, 397 insertions(+), 27 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 66203fee6296..0d96a9ac2044 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include #include @@ -44,6 +45,7 @@ #include #include #include +#include #include #include @@ -111,6 +113,14 @@ struct aio_mapped_ubuf { unsigned int nr_bvecs; }; +struct aio_sq_offload { + struct task_struct *thread; /* if using a thread */ + struct workqueue_struct *wq; /* wq offload */ + struct mm_struct *mm; + struct files_struct *files; + wait_queue_head_t wait; +}; + struct kioctx { struct percpu_ref users; atomic_t dead; @@ -153,6 +163,10 @@ struct kioctx { struct aio_iocb_ring sq_ring; struct aio_mapped_range cq_ring; int cq_ring_overflow; + int submit_eagain; + + /* sq ring submitter thread, if used */ + struct aio_sq_offload sq_offload; struct rcu_work free_rwork; /* see free_ioctx() */ @@ -243,6 +257,7 @@ struct aio_kiocb { unsigned long ki_flags;
#define KIOCB_F_POLL_COMPLETED 0 /* polled IO has completed */ #define KIOCB_F_POLL_EAGAIN 1 /* polled submission got EAGAIN */ +#define KIOCB_F_FORCE_NONBLOCK 2 /* inline submission attempt */ refcount_t ki_refcnt; @@ -1350,19 +1365,31 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2) unsigned int tail; /* - * If we can't get a cq entry, userspace overflowed the - * submission (by quite a lot). Flag it as an overflow - * condition, and next io_ring_enter(2) call will return - * -EOVERFLOW. + * Catch EAGAIN early if we've forced a nonblock attempt, as + * we don't want to pass that back down to userspace through + * the CQ ring. Just mark the ctx as such, so the caller will + * see it and punt to workqueue. This is just for buffered + * aio reads. */ - spin_lock_irqsave(&ctx->completion_lock, flags); - ev = aio_peek_cqring(ctx, &tail); - if (ev) { - aio_fill_event(ev, iocb, res, res2); - aio_commit_cqring(ctx, tail); - } else - ctx->cq_ring_overflow = 1; - spin_unlock_irqrestore(&ctx->completion_lock, flags); + if (res == -EAGAIN && + test_bit(KIOCB_F_FORCE_NONBLOCK, &iocb->ki_flags)) { + ctx->submit_eagain = 1; + } else { + /* + * If we can't get a cq entry, userspace overflowed the + * submission (by quite a lot). Flag it as an overflow + * condition, and next io_ring_enter(2) call will return + * -EOVERFLOW. 
+ */ + spin_lock_irqsave(&ctx->completion_lock, flags); + ev = aio_peek_cqring(ctx, &tail); + if (ev) { + aio_fill_event(ev, iocb, res, res2); + aio_commit_cqring(ctx, tail); + } else + ctx->cq_ring_overflow = 1; + spin_unlock_irqrestore(&ctx->completion_lock, flags); + } } else { aio_ring_complete(ctx, iocb, res, res2); @@ -1728,6 +1755,64 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr, return ret; } +static int aio_sq_thread(void *); + +static int aio_sq_thread_start(struct kioctx *ctx) +{ + struct aio_sq_ring *ring = ctx->sq_ring.ring; + struct aio_sq_offload *aso = &ctx->sq_offload; + int ret; + + memset(aso, 0, sizeof(*aso)); + init_waitqueue_head(&aso->wait); + + if (!(ctx->flags & IOCTX_FLAG_FIXEDBUFS)) + aso->mm = current->mm; + + ret = -EBADF; + aso->files = get_files_struct(current); + if (!aso->files) + goto err; + + if (ctx->flags & IOCTX_FLAG_SQTHREAD) { + char name[32]; + + snprintf(name, sizeof(name), "aio-sq-%lu/%d", ctx->user_id, + ring->sq_thread_cpu); + aso->thread = kthread_create_on_cpu(aio_sq_thread, ctx, + ring->sq_thread_cpu, name); + if (IS_ERR(aso->thread)) { + ret = PTR_ERR(aso->thread); + aso->thread = NULL; + goto err; + } + wake_up_process(aso->thread); + } else if (ctx->flags & IOCTX_FLAG_SQWQ) { + int concurrency; + + /* Do QD, or 2 * CPUS, whatever is smallest */ + concurrency = min(ring->nr_events - 1, 2 * num_online_cpus()); + aso->wq = alloc_workqueue("aio-sq-%lu", + WQ_UNBOUND | WQ_FREEZABLE | WQ_SYSFS, + concurrency, + ctx->user_id); + if (!aso->wq) { + ret = -ENOMEM; + goto err; + } + } + + return 0; +err: + if (aso->files) { + put_files_struct(aso->files); + aso->files = NULL; + } + if (aso->mm) + aso->mm = NULL; + return ret; +} + static void aio_unmap_range(struct aio_mapped_range *range) { int i; @@ -1773,6 +1858,20 @@ static int aio_map_range(struct aio_mapped_range *range, void __user *uaddr, static void aio_scqring_unmap(struct kioctx *ctx) { + struct aio_sq_offload *aso = &ctx->sq_offload; + + if 
(aso->thread) { + kthread_park(aso->thread); + kthread_stop(aso->thread); + aso->thread = NULL; + } else if (aso->wq) { + destroy_workqueue(aso->wq); + aso->wq = NULL; + } + if (aso->files) { + put_files_struct(aso->files); + aso->files = NULL; + } aio_unmap_range(&ctx->sq_ring.ring_range); aio_unmap_range(&ctx->sq_ring.iocb_range); aio_unmap_range(&ctx->cq_ring); @@ -1834,6 +1933,9 @@ static int aio_scqring_map(struct kioctx *ctx, kcq_ring->nr_events = cq_ring_size; kcq_ring->head = kcq_ring->tail = 0; + if (ctx->flags & (IOCTX_FLAG_SQTHREAD | IOCTX_FLAG_SQWQ)) + ret = aio_sq_thread_start(ctx); + err: if (ret) { aio_unmap_range(&ctx->sq_ring.ring_range); @@ -1977,7 +2079,8 @@ SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags, long ret; if (flags & ~(IOCTX_FLAG_IOPOLL | IOCTX_FLAG_SCQRING | - IOCTX_FLAG_FIXEDBUFS)) + IOCTX_FLAG_FIXEDBUFS | IOCTX_FLAG_SQTHREAD | + IOCTX_FLAG_SQWQ)) return -EINVAL; ret = get_user(ctx, ctxp); @@ -1998,8 +2101,9 @@ SYSCALL_DEFINE5(io_setup2, u32, nr_events, u32, flags, if (ret) goto err; } - } else if (flags & IOCTX_FLAG_FIXEDBUFS) { - /* can only support fixed bufs with SQ/CQ ring */ + } else if (flags & (IOCTX_FLAG_FIXEDBUFS | IOCTX_FLAG_SQTHREAD | + IOCTX_FLAG_SQWQ)) { + /* These features only supported with SCQRING */ ret = -EINVAL; goto err; } @@ -2213,7 +2317,7 @@ static struct file *aio_file_get(struct aio_submit_state *state, int fd) } static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb, - struct aio_submit_state *state) + struct aio_submit_state *state, bool force_nonblock) { struct kioctx *ctx = kiocb->ki_ctx; struct kiocb *req = &kiocb->rw; @@ -2246,6 +2350,10 @@ static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb, ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags); if (unlikely(ret)) goto out_fput; + if (force_nonblock) { + req->ki_flags |= IOCB_NOWAIT; + set_bit(KIOCB_F_FORCE_NONBLOCK, &kiocb->ki_flags); + } if (iocb->aio_flags & IOCB_FLAG_HIPRI) { /* shares space in the union, 
and is rather pointless.. */ @@ -2389,7 +2497,7 @@ static void aio_iopoll_iocb_issued(struct aio_submit_state *state, static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, struct aio_submit_state *state, bool vectored, - bool compat, bool kaddr) + bool compat, bool kaddr, bool force_nonblock) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct kiocb *req = &kiocb->rw; @@ -2397,7 +2505,7 @@ static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, struct file *file; ssize_t ret; - ret = aio_prep_rw(kiocb, iocb, state); + ret = aio_prep_rw(kiocb, iocb, state, force_nonblock); if (ret) return ret; file = req->ki_filp; @@ -2434,7 +2542,7 @@ static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb, struct file *file; ssize_t ret; - ret = aio_prep_rw(kiocb, iocb, state); + ret = aio_prep_rw(kiocb, iocb, state, false); if (ret) return ret; file = req->ki_filp; @@ -2687,7 +2795,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, struct iocb __user *user_iocb, struct aio_submit_state *state, bool compat, - bool kaddr) + bool kaddr, bool force_nonblock) { struct aio_kiocb *req; ssize_t ret; @@ -2745,13 +2853,15 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, ret = -EINVAL; switch (iocb->aio_lio_opcode) { case IOCB_CMD_PREAD: - ret = aio_read(req, iocb, state, false, compat, kaddr); + ret = aio_read(req, iocb, state, false, compat, kaddr, + force_nonblock); break; case IOCB_CMD_PWRITE: ret = aio_write(req, iocb, state, false, compat, kaddr); break; case IOCB_CMD_PREADV: - ret = aio_read(req, iocb, state, true, compat, kaddr); + ret = aio_read(req, iocb, state, true, compat, kaddr, + force_nonblock); break; case IOCB_CMD_PWRITEV: ret = aio_write(req, iocb, state, true, compat, kaddr); @@ -2810,7 +2920,8 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, if 
(unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb)))) return -EFAULT; - return __io_submit_one(ctx, &iocb, user_iocb, state, compat, false); + return __io_submit_one(ctx, &iocb, user_iocb, state, compat, false, + false); } #ifdef CONFIG_BLOCK @@ -2929,7 +3040,8 @@ static int aio_ring_submit(struct kioctx *ctx, unsigned int to_submit) if (!iocb) break; - ret = __io_submit_one(ctx, iocb, NULL, statep, false, kaddr); + ret = __io_submit_one(ctx, iocb, NULL, statep, false, kaddr, + false); if (ret) break; @@ -2981,15 +3093,270 @@ static int aio_cqring_wait(struct kioctx *ctx, int min_events) return ret; } +static void aio_fill_cq_error(struct kioctx *ctx, const struct iocb *iocb, + long ret) +{ + struct io_event *ev; + unsigned tail; + + /* + * Only really need the lock for non-polled IO, but this is an error + * so not worth checking. Just lock it so we know kernel access to + * the CQ ring is serialized. + */ + spin_lock_irq(&ctx->completion_lock); + ev = aio_peek_cqring(ctx, &tail); + ev->obj = iocb->aio_data; + ev->data = 0; + ev->res = ret; + ev->res2 = 0; + aio_commit_cqring(ctx, tail); + spin_unlock_irq(&ctx->completion_lock); + + /* + * for thread offload, app could already be sleeping in io_ring_enter() + * before we get to flag the error. wake them up, if needed. 
+ */ + if (ctx->flags & (IOCTX_FLAG_SQTHREAD | IOCTX_FLAG_SQWQ)) + if (waitqueue_active(&ctx->wait)) + wake_up(&ctx->wait); +} + +struct aio_io_work { + struct work_struct work; + struct kioctx *ctx; + struct iocb iocb; +}; + +/* + * sq thread only supports O_DIRECT or FIXEDBUFS IO + */ +static int aio_sq_thread(void *data) +{ + const struct iocb *iocbs[AIO_IOPOLL_BATCH]; + struct aio_submit_state state; + struct kioctx *ctx = data; + struct aio_sq_offload *aso = &ctx->sq_offload; + struct mm_struct *cur_mm = NULL; + struct files_struct *old_files; + mm_segment_t old_fs; + DEFINE_WAIT(wait); + + old_files = current->files; + current->files = aso->files; + + old_fs = get_fs(); + set_fs(USER_DS); + + while (!kthread_should_stop()) { + struct aio_submit_state *statep = NULL; + const struct iocb *iocb; + bool mm_fault = false; + unsigned int nhead; + int ret, i, j; + + iocb = aio_peek_sqring(ctx, &nhead); + if (!iocb) { + prepare_to_wait(&aso->wait, &wait, TASK_INTERRUPTIBLE); + iocb = aio_peek_sqring(ctx, &nhead); + if (!iocb) { + /* + * Drop cur_mm before scheduler. We can't hold + * it for long periods, and it would also + * introduce a deadlock with kill_ioctx(). 
+ */ + if (cur_mm) { + unuse_mm(cur_mm); + mmput(cur_mm); + cur_mm = NULL; + } + if (kthread_should_park()) + kthread_parkme(); + if (kthread_should_stop()) { + finish_wait(&aso->wait, &wait); + break; + } + if (signal_pending(current)) + flush_signals(current); + schedule(); + } + finish_wait(&aso->wait, &wait); + if (!iocb) + continue; + } + + /* If ->mm is set, we're not doing FIXEDBUFS */ + if (aso->mm && !cur_mm) { + mm_fault = !mmget_not_zero(aso->mm); + if (!mm_fault) { + use_mm(aso->mm); + cur_mm = aso->mm; + } + } + + i = 0; + do { + if (i == ARRAY_SIZE(iocbs)) + break; + iocbs[i++] = iocb; + aio_commit_sqring(ctx, nhead); + } while ((iocb = aio_peek_sqring(ctx, &nhead)) != NULL); + + if (i > AIO_PLUG_THRESHOLD) { + aio_submit_state_start(&state, ctx, i); + statep = &state; + } + + for (j = 0; j < i; j++) { + if (unlikely(mm_fault)) + ret = -EFAULT; + else + ret = __io_submit_one(ctx, iocbs[j], NULL, + statep, false, !cur_mm, + false); + if (!ret) + continue; + + aio_fill_cq_error(ctx, iocbs[j], ret); + } + + if (statep) + aio_submit_state_end(&state); + } + current->files = old_files; + set_fs(old_fs); + if (cur_mm) { + unuse_mm(cur_mm); + mmput(cur_mm); + } + return 0; +} + +static void aio_sq_wq_submit_work(struct work_struct *work) +{ + struct aio_io_work *aiw = container_of(work, struct aio_io_work, work); + struct kioctx *ctx = aiw->ctx; + struct aio_sq_offload *aso = &ctx->sq_offload; + mm_segment_t old_fs = get_fs(); + struct files_struct *old_files; + int ret; + + old_files = current->files; + current->files = aso->files; + + if (aso->mm) { + if (!mmget_not_zero(aso->mm)) { + ret = -EFAULT; + goto err; + } + use_mm(aso->mm); + } + + set_fs(USER_DS); + + ret = __io_submit_one(ctx, &aiw->iocb, NULL, NULL, false, !aso->mm, + false); + + set_fs(old_fs); + if (aso->mm) { + unuse_mm(aso->mm); + mmput(aso->mm); + } + +err: + if (ret) + aio_fill_cq_error(ctx, &aiw->iocb, ret); + current->files = old_files; + kfree(aiw); +} + +/* + * If this is a read, try 
a cached inline read first. If the IO is in the + * page cache, we can satisfy it without blocking and without having to + * punt to a threaded execution. This is much faster, particularly for + * lower queue depth IO, and it's always a lot more efficient. + */ +static int aio_sq_try_inline(struct kioctx *ctx, struct aio_io_work *aiw) +{ + struct aio_sq_offload *aso = &ctx->sq_offload; + int ret; + + if (aiw->iocb.aio_lio_opcode != IOCB_CMD_PREAD && + aiw->iocb.aio_lio_opcode != IOCB_CMD_PREADV) + return -EAGAIN; + + ret = __io_submit_one(ctx, &aiw->iocb, NULL, NULL, false, !aso->mm, + true); + + if (ret == -EAGAIN || ctx->submit_eagain) { + ctx->submit_eagain = 0; + return -EAGAIN; + } + + /* + * We're done - even if this was an error, return 0. The error will + * be in the CQ ring for the application. + */ + kfree(aiw); + return 0; +} + +static int aio_sq_wq_submit(struct kioctx *ctx, unsigned int to_submit) +{ + struct aio_io_work *work; + const struct iocb *iocb; + unsigned nhead; + int ret, queued; + + ret = queued = 0; + while ((iocb = aio_peek_sqring(ctx, &nhead)) != NULL) { + work = kmalloc(sizeof(*work), GFP_KERNEL); + if (!work) { + ret = -ENOMEM; + break; + } + memcpy(&work->iocb, iocb, sizeof(*iocb)); + aio_commit_sqring(ctx, nhead); + ret = aio_sq_try_inline(ctx, work); + if (ret == -EAGAIN) { + INIT_WORK(&work->work, aio_sq_wq_submit_work); + work->ctx = ctx; + queue_work(ctx->sq_offload.wq, &work->work); + ret = 0; + } + queued++; + if (queued == to_submit) + break; + } + + return queued ? 
queued : ret; +} + static int __io_ring_enter(struct kioctx *ctx, unsigned int to_submit, unsigned int min_complete, unsigned int flags) { int ret = 0; if (flags & IORING_FLAG_SUBMIT) { - ret = aio_ring_submit(ctx, to_submit); - if (ret < 0) - return ret; + if (!to_submit) + return 0; + + /* + * Three options here: + * 1) We have an sq thread, just wake it up to do submissions + * 2) We have an sq wq, queue a work item for each iocb + * 3) Submit directly + */ + if (ctx->flags & IOCTX_FLAG_SQTHREAD) { + wake_up(&ctx->sq_offload.wait); + ret = to_submit; + } else if (ctx->flags & IOCTX_FLAG_SQWQ) { + ret = aio_sq_wq_submit(ctx, to_submit); + } else { + ret = aio_ring_submit(ctx, to_submit); + if (ret < 0) + return ret; + } } if (flags & IORING_FLAG_GETEVENTS) { unsigned int nr_events = 0; diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h index 39d783175872..b09b1976e038 100644 --- a/include/uapi/linux/aio_abi.h +++ b/include/uapi/linux/aio_abi.h @@ -111,6 +111,8 @@ struct iocb { #define IOCTX_FLAG_IOPOLL (1 << 0) /* io_context is polled */ #define IOCTX_FLAG_SCQRING (1 << 1) /* Use SQ/CQ rings */ #define IOCTX_FLAG_FIXEDBUFS (1 << 2) /* IO buffers are fixed */ +#define IOCTX_FLAG_SQTHREAD (1 << 3) /* Use SQ thread */ +#define IOCTX_FLAG_SQWQ (1 << 4) /* Use SQ workqueue */ struct aio_sq_ring { union { @@ -118,6 +120,7 @@ struct aio_sq_ring { u32 head; /* kernel consumer head */ u32 tail; /* app producer tail */ u32 nr_events; /* max events in ring */ + u16 sq_thread_cpu; u64 iocbs; /* setup pointer to app iocbs */ }; u32 pad[16];