From patchwork Tue Dec 11 00:15:23 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722787
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 01/27] fs: add an iopoll method to struct file_operations
Date: Mon, 10 Dec 2018 17:15:23 -0700
Message-Id: <20181211001549.30085-2-axboe@kernel.dk>
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>
References: <20181211001549.30085-1-axboe@kernel.dk>
X-Mailing-List: linux-block@vger.kernel.org

From: Christoph Hellwig

This new method is used to explicitly poll for I/O completion for an
iocb. It must be called for any iocb submitted asynchronously (that is,
with a non-null ki_complete) which has the IOCB_HIPRI flag set.

The method is assisted by a new ki_cookie field in struct kiocb to store
the polling cookie.

TODO: we can probably union ki_cookie with the existing hint and I/O
priority fields to avoid struct kiocb growth.
Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 Documentation/filesystems/vfs.txt | 3 +++
 include/linux/fs.h                | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 5f71a252e2e0..d9dc5e4d82b9 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -857,6 +857,7 @@ struct file_operations {
 	ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
 	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
+	int (*iopoll)(struct kiocb *kiocb, bool spin);
 	int (*iterate) (struct file *, struct dir_context *);
 	int (*iterate_shared) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
@@ -902,6 +903,8 @@ otherwise noted.

   write_iter: possibly asynchronous write with iov_iter as source

+  iopoll: called when aio wants to poll for completions on HIPRI iocbs
+
   iterate: called when the VFS needs to read the directory contents

   iterate_shared: called when the VFS needs to read the directory contents

diff --git a/include/linux/fs.h b/include/linux/fs.h
index a1ab233e6469..6a5f71f8ae06 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -310,6 +310,7 @@ struct kiocb {
 	int			ki_flags;
 	u16			ki_hint;
 	u16			ki_ioprio; /* See linux/ioprio.h */
+	unsigned int		ki_cookie; /* for ->iopoll */
 } __randomize_layout;

 static inline bool is_sync_kiocb(struct kiocb *kiocb)
@@ -1781,6 +1782,7 @@ struct file_operations {
 	ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
 	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
+	int (*iopoll)(struct kiocb *kiocb, bool spin);
 	int (*iterate) (struct file *, struct dir_context *);
 	int (*iterate_shared) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
From patchwork Tue Dec 11 00:15:24 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722791
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 02/27] block: add REQ_HIPRI_ASYNC
Date: Mon, 10 Dec 2018 17:15:24 -0700
Message-Id: <20181211001549.30085-3-axboe@kernel.dk>

For the upcoming async polled IO, we can't sleep allocating requests.
If we do, then we introduce a deadlock where the submitter already has
async polled IO in-flight, but can't wait for them to complete since
polled requests must be actively found and reaped.
Signed-off-by: Jens Axboe
---
 include/linux/blk_types.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 46c005d601ac..921d734d6b5d 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -347,6 +347,7 @@ enum req_flag_bits {
 #define REQ_NOWAIT		(1ULL << __REQ_NOWAIT)
 #define REQ_NOUNMAP		(1ULL << __REQ_NOUNMAP)
 #define REQ_HIPRI		(1ULL << __REQ_HIPRI)
+#define REQ_HIPRI_ASYNC	(REQ_HIPRI | REQ_NOWAIT)
 #define REQ_DRV			(1ULL << __REQ_DRV)
 #define REQ_SWAP		(1ULL << __REQ_SWAP)
From patchwork Tue Dec 11 00:15:25 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722795
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 03/27] block: wire up block device iopoll method
Date: Mon, 10 Dec 2018 17:15:25 -0700
Message-Id: <20181211001549.30085-4-axboe@kernel.dk>

From: Christoph Hellwig

Just call blk_poll on the iocb cookie; we can derive the block device
from the inode trivially.

Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/block_dev.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index e1886cc7048f..6de8d35f6e41 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -281,6 +281,14 @@ struct blkdev_dio {

 static struct bio_set blkdev_dio_pool;

+static int blkdev_iopoll(struct kiocb *kiocb, bool wait)
+{
+	struct block_device *bdev = I_BDEV(kiocb->ki_filp->f_mapping->host);
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	return blk_poll(q, READ_ONCE(kiocb->ki_cookie), wait);
+}
+
 static void blkdev_bio_end_io(struct bio *bio)
 {
 	struct blkdev_dio *dio = bio->bi_private;
@@ -398,6 +406,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 				bio->bi_opf |= REQ_HIPRI;

 			qc = submit_bio(bio);
+			WRITE_ONCE(iocb->ki_cookie, qc);
 			break;
 		}
@@ -2070,6 +2079,7 @@ const struct file_operations def_blk_fops = {
 	.llseek		= block_llseek,
 	.read_iter	= blkdev_read_iter,
 	.write_iter	= blkdev_write_iter,
+	.iopoll		= blkdev_iopoll,
 	.mmap		= generic_file_mmap,
 	.fsync		= blkdev_fsync,
 	.unlocked_ioctl	= block_ioctl,
From patchwork Tue Dec 11 00:15:26 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722799
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 04/27] block: use REQ_HIPRI_ASYNC for non-sync polled IO
Date: Mon, 10 Dec 2018 17:15:26 -0700
Message-Id: <20181211001549.30085-5-axboe@kernel.dk>

Tell the block layer if it's a sync or async polled request, so it can
do the right thing.

Signed-off-by: Jens Axboe
---
 fs/block_dev.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 6de8d35f6e41..b8f574615792 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -402,8 +402,12 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 		nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES);
 		if (!nr_pages) {
-			if (iocb->ki_flags & IOCB_HIPRI)
-				bio->bi_opf |= REQ_HIPRI;
+			if (iocb->ki_flags & IOCB_HIPRI) {
+				if (!is_sync)
+					bio->bi_opf |= REQ_HIPRI_ASYNC;
+				else
+					bio->bi_opf |= REQ_HIPRI;
+			}

 			qc = submit_bio(bio);
 			WRITE_ONCE(iocb->ki_cookie, qc);
From patchwork Tue Dec 11 00:15:27 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722803
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 05/27] iomap: wire up the iopoll method
Date: Mon, 10 Dec 2018 17:15:27 -0700
Message-Id: <20181211001549.30085-6-axboe@kernel.dk>

From: Christoph Hellwig

Store the request queue the last bio was submitted to in the iocb
private data in addition to the cookie so that we find the right block
device. Also refactor the common direct I/O bio submission code into a
nice little helper.

Signed-off-by: Christoph Hellwig

Modified to use REQ_HIPRI_ASYNC for async polled IO.
Signed-off-by: Jens Axboe
---
 fs/gfs2/file.c        |  2 ++
 fs/iomap.c            | 47 +++++++++++++++++++++++++++++--------------
 fs/xfs/xfs_file.c     |  1 +
 include/linux/iomap.h |  1 +
 4 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 45a17b770d97..358157efc5b7 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1280,6 +1280,7 @@ const struct file_operations gfs2_file_fops = {
 	.llseek		= gfs2_llseek,
 	.read_iter	= gfs2_file_read_iter,
 	.write_iter	= gfs2_file_write_iter,
+	.iopoll		= iomap_dio_iopoll,
 	.unlocked_ioctl	= gfs2_ioctl,
 	.mmap		= gfs2_mmap,
 	.open		= gfs2_open,
@@ -1310,6 +1311,7 @@ const struct file_operations gfs2_file_fops_nolock = {
 	.llseek		= gfs2_llseek,
 	.read_iter	= gfs2_file_read_iter,
 	.write_iter	= gfs2_file_write_iter,
+	.iopoll		= iomap_dio_iopoll,
 	.unlocked_ioctl	= gfs2_ioctl,
 	.mmap		= gfs2_mmap,
 	.open		= gfs2_open,

diff --git a/fs/iomap.c b/fs/iomap.c
index 9a5bf1e8925b..f3039989de73 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1441,6 +1441,32 @@ struct iomap_dio {
 	};
 };

+int iomap_dio_iopoll(struct kiocb *kiocb, bool spin)
+{
+	struct request_queue *q = READ_ONCE(kiocb->private);
+
+	if (!q)
+		return 0;
+	return blk_poll(q, READ_ONCE(kiocb->ki_cookie), spin);
+}
+EXPORT_SYMBOL_GPL(iomap_dio_iopoll);
+
+static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
+		struct bio *bio)
+{
+	atomic_inc(&dio->ref);
+
+	if (dio->iocb->ki_flags & IOCB_HIPRI) {
+		if (!dio->wait_for_completion)
+			bio->bi_opf |= REQ_HIPRI_ASYNC;
+		else
+			bio->bi_opf |= REQ_HIPRI;
+	}
+
+	dio->submit.last_queue = bdev_get_queue(iomap->bdev);
+	dio->submit.cookie = submit_bio(bio);
+}
+
 static ssize_t iomap_dio_complete(struct iomap_dio *dio)
 {
 	struct kiocb *iocb = dio->iocb;
@@ -1553,7 +1579,7 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 	}
 }

-static blk_qc_t
+static void
 iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
 		unsigned len)
 {
@@ -1567,15 +1593,10 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;

-	if (dio->iocb->ki_flags & IOCB_HIPRI)
-		flags |= REQ_HIPRI;
-
 	get_page(page);
 	__bio_add_page(bio, page, len, 0);
 	bio_set_op_attrs(bio, REQ_OP_WRITE, flags);
-
-	atomic_inc(&dio->ref);
-	return submit_bio(bio);
+	iomap_dio_submit_bio(dio, iomap, bio);
 }

 static loff_t
@@ -1678,9 +1699,6 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 			bio_set_pages_dirty(bio);
 		}

-		if (dio->iocb->ki_flags & IOCB_HIPRI)
-			bio->bi_opf |= REQ_HIPRI;
-
 		iov_iter_advance(dio->submit.iter, n);

 		dio->size += n;
@@ -1688,11 +1706,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		copied += n;

 		nr_pages = iov_iter_npages(&iter, BIO_MAX_PAGES);
-
-		atomic_inc(&dio->ref);
-
-		dio->submit.last_queue = bdev_get_queue(iomap->bdev);
-		dio->submit.cookie = submit_bio(bio);
+		iomap_dio_submit_bio(dio, iomap, bio);
 	} while (nr_pages);

 	/*
@@ -1903,6 +1917,9 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (dio->flags & IOMAP_DIO_WRITE_FUA)
 		dio->flags &= ~IOMAP_DIO_NEED_SYNC;

+	WRITE_ONCE(iocb->ki_cookie, dio->submit.cookie);
+	WRITE_ONCE(iocb->private, dio->submit.last_queue);
+
 	if (!atomic_dec_and_test(&dio->ref)) {
 		if (!dio->wait_for_completion)
 			return -EIOCBQUEUED;

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e47425071e65..60c2da41f0fc 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1203,6 +1203,7 @@ const struct file_operations xfs_file_operations = {
 	.write_iter	= xfs_file_write_iter,
 	.splice_read	= generic_file_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.iopoll		= iomap_dio_iopoll,
 	.unlocked_ioctl	= xfs_file_ioctl,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= xfs_file_compat_ioctl,

diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 9a4258154b25..0fefb5455bda 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -162,6 +162,7 @@ typedef int (iomap_dio_end_io_t)(struct kiocb *iocb, ssize_t ret,
 		unsigned flags);
 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, iomap_dio_end_io_t end_io);
+int iomap_dio_iopoll(struct kiocb *kiocb, bool spin);

 #ifdef CONFIG_SWAP
 struct file;
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 06/27] aio: use assigned completion handler
Date: Mon, 10 Dec 2018 17:15:28 -0700
Message-Id: <20181211001549.30085-7-axboe@kernel.dk>
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>
References: <20181211001549.30085-1-axboe@kernel.dk>

We know this is a read/write request, but in preparation for having
different kinds of those, ensure that we call the assigned handler
instead of assuming it's aio_complete_rw().

Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/aio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/aio.c b/fs/aio.c
index 05647d352bf3..cf0de61743e8 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1490,7 +1490,7 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret)
 		ret = -EINTR;
 		/*FALLTHRU*/
 	default:
-		aio_complete_rw(req, ret, 0);
+		req->ki_complete(req, ret, 0);
 	}
 }

From patchwork Tue Dec 11 00:15:29 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722811
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 07/27] aio: separate out ring reservation from req allocation
Date: Mon, 10 Dec 2018 17:15:29 -0700
Message-Id: <20181211001549.30085-8-axboe@kernel.dk>
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>
References: <20181211001549.30085-1-axboe@kernel.dk>

From: Christoph Hellwig

This is in preparation for certain types of IO not needing a ring
reservation.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/aio.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index cf0de61743e8..eaceb40e6cf5 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -901,7 +901,7 @@ static void put_reqs_available(struct kioctx *ctx, unsigned nr)
 	local_irq_restore(flags);
 }
 
-static bool get_reqs_available(struct kioctx *ctx)
+static bool __get_reqs_available(struct kioctx *ctx)
 {
 	struct kioctx_cpu *kcpu;
 	bool ret = false;
@@ -993,6 +993,14 @@ static void user_refill_reqs_available(struct kioctx *ctx)
 	spin_unlock_irq(&ctx->completion_lock);
 }
 
+static bool get_reqs_available(struct kioctx *ctx)
+{
+	if (__get_reqs_available(ctx))
+		return true;
+	user_refill_reqs_available(ctx);
+	return __get_reqs_available(ctx);
+}
+
 /* aio_get_req
  *	Allocate a slot for an aio request.
  * Returns NULL if no requests are free.
@@ -1001,24 +1009,15 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 {
 	struct aio_kiocb *req;
 
-	if (!get_reqs_available(ctx)) {
-		user_refill_reqs_available(ctx);
-		if (!get_reqs_available(ctx))
-			return NULL;
-	}
-
 	req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL|__GFP_ZERO);
 	if (unlikely(!req))
-		goto out_put;
+		return NULL;
 
 	percpu_ref_get(&ctx->reqs);
 	INIT_LIST_HEAD(&req->ki_list);
 	refcount_set(&req->ki_refcnt, 0);
 	req->ki_ctx = ctx;
 	return req;
-out_put:
-	put_reqs_available(ctx, 1);
-	return NULL;
 }
 
 static struct kioctx *lookup_ioctx(unsigned long ctx_id)
@@ -1805,9 +1804,13 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 		return -EINVAL;
 	}
 
+	if (!get_reqs_available(ctx))
+		return -EAGAIN;
+
+	ret = -EAGAIN;
 	req = aio_get_req(ctx);
 	if (unlikely(!req))
-		return -EAGAIN;
+		goto out_put_reqs_available;
 
 	if (iocb.aio_flags & IOCB_FLAG_RESFD) {
 		/*
@@ -1870,11 +1873,12 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 		goto out_put_req;
 	return 0;
 out_put_req:
-	put_reqs_available(ctx, 1);
 	percpu_ref_put(&ctx->reqs);
 	if (req->ki_eventfd)
 		eventfd_ctx_put(req->ki_eventfd);
 	kmem_cache_free(kiocb_cachep, req);
+out_put_reqs_available:
+	put_reqs_available(ctx, 1);
 	return ret;
 }

From patchwork Tue Dec 11 00:15:30 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722815
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 08/27] aio: don't zero entire aio_kiocb aio_get_req()
Date: Mon, 10 Dec 2018 17:15:30 -0700
Message-Id: <20181211001549.30085-9-axboe@kernel.dk>
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>
References: <20181211001549.30085-1-axboe@kernel.dk>

The aio_kiocb is 192 bytes, which is fairly substantial. Most fields
don't need to be cleared, especially not upfront. Clear the ones we do
need to clear, and leave the other ones for setup when the iocb is
prepared and submitted.
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/aio.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index eaceb40e6cf5..522c04864d82 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1009,14 +1009,15 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 {
 	struct aio_kiocb *req;
 
-	req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL|__GFP_ZERO);
+	req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL);
 	if (unlikely(!req))
 		return NULL;
 
 	percpu_ref_get(&ctx->reqs);
+	req->ki_ctx = ctx;
 	INIT_LIST_HEAD(&req->ki_list);
 	refcount_set(&req->ki_refcnt, 0);
-	req->ki_ctx = ctx;
+	req->ki_eventfd = NULL;
 	return req;
 }
 
@@ -1730,6 +1731,10 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
 	if (unlikely(!req->file))
 		return -EBADF;
 
+	req->head = NULL;
+	req->woken = false;
+	req->cancelled = false;
+
 	apt.pt._qproc = aio_poll_queue_proc;
 	apt.pt._key = req->events;
 	apt.iocb = aiocb;

From patchwork Tue Dec 11 00:15:31 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722819
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 09/27] aio: only use blk plugs for > 2 depth submissions
Date: Mon, 10 Dec 2018 17:15:31 -0700
Message-Id: <20181211001549.30085-10-axboe@kernel.dk>
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>
References: <20181211001549.30085-1-axboe@kernel.dk>

Plugging is meant to optimize the submission of a string of IOs. If we
don't have more than 2 being submitted, don't bother setting up a plug.

Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/aio.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 522c04864d82..ed6c3914477a 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -69,6 +69,12 @@ struct aio_ring {
 	struct io_event		io_events[0];
 }; /* 128 bytes + ring size */
 
+/*
+ * Plugging is meant to work with larger batches of IOs. If we don't
+ * have more than the below, then don't bother setting up a plug.
+ */
+#define AIO_PLUG_THRESHOLD	2
+
 #define AIO_RING_PAGES	8
 
 struct kioctx_table {
@@ -1919,7 +1925,8 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
 	if (nr > ctx->nr_events)
 		nr = ctx->nr_events;
 
-	blk_start_plug(&plug);
+	if (nr > AIO_PLUG_THRESHOLD)
+		blk_start_plug(&plug);
 	for (i = 0; i < nr; i++) {
 		struct iocb __user *user_iocb;
 
@@ -1932,7 +1939,8 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
 		if (ret)
 			break;
 	}
-	blk_finish_plug(&plug);
+	if (nr > AIO_PLUG_THRESHOLD)
+		blk_finish_plug(&plug);
 
 	percpu_ref_put(&ctx->users);
 	return i ? i : ret;
@@ -1959,7 +1967,8 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
 	if (nr > ctx->nr_events)
 		nr = ctx->nr_events;
 
-	blk_start_plug(&plug);
+	if (nr > AIO_PLUG_THRESHOLD)
+		blk_start_plug(&plug);
 	for (i = 0; i < nr; i++) {
 		compat_uptr_t user_iocb;
 
@@ -1972,7 +1981,8 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
 		if (ret)
 			break;
 	}
-	blk_finish_plug(&plug);
+	if (nr > AIO_PLUG_THRESHOLD)
+		blk_finish_plug(&plug);
 
 	percpu_ref_put(&ctx->users);
 	return i ? i : ret;

From patchwork Tue Dec 11 00:15:32 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722823
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 10/27] aio: use iocb_put() instead of open coding it
Date: Mon, 10 Dec 2018 17:15:32 -0700
Message-Id: <20181211001549.30085-11-axboe@kernel.dk>
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>
References: <20181211001549.30085-1-axboe@kernel.dk>

Replace the percpu_ref_put() + kmem_cache_free() with a call to
iocb_put() instead.

Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/aio.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index ed6c3914477a..cf93b92bfb1e 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1884,10 +1884,9 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 		goto out_put_req;
 	return 0;
 out_put_req:
-	percpu_ref_put(&ctx->reqs);
 	if (req->ki_eventfd)
 		eventfd_ctx_put(req->ki_eventfd);
-	kmem_cache_free(kiocb_cachep, req);
+	iocb_put(req);
 out_put_reqs_available:
 	put_reqs_available(ctx, 1);
 	return ret;

From patchwork Tue Dec 11 00:15:33 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722827
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 11/27] aio: split out iocb copy from io_submit_one()
Date: Mon, 10 Dec 2018 17:15:33 -0700
Message-Id: <20181211001549.30085-12-axboe@kernel.dk>
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>
References: <20181211001549.30085-1-axboe@kernel.dk>

In preparation for handing in iocbs in a different fashion as well. Also
make it clear that the iocb being passed in isn't modified, by marking
it const throughout.
Signed-off-by: Jens Axboe --- fs/aio.c | 68 +++++++++++++++++++++++++++++++------------------------- 1 file changed, 38 insertions(+), 30 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index cf93b92bfb1e..06c8bcc72496 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1420,7 +1420,7 @@ static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) aio_complete(iocb, res, res2); } -static int aio_prep_rw(struct kiocb *req, struct iocb *iocb) +static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) { int ret; @@ -1461,7 +1461,7 @@ static int aio_prep_rw(struct kiocb *req, struct iocb *iocb) return ret; } -static int aio_setup_rw(int rw, struct iocb *iocb, struct iovec **iovec, +static int aio_setup_rw(int rw, const struct iocb *iocb, struct iovec **iovec, bool vectored, bool compat, struct iov_iter *iter) { void __user *buf = (void __user *)(uintptr_t)iocb->aio_buf; @@ -1500,8 +1500,8 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret) } } -static ssize_t aio_read(struct kiocb *req, struct iocb *iocb, bool vectored, - bool compat) +static ssize_t aio_read(struct kiocb *req, const struct iocb *iocb, + bool vectored, bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct iov_iter iter; @@ -1533,8 +1533,8 @@ static ssize_t aio_read(struct kiocb *req, struct iocb *iocb, bool vectored, return ret; } -static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored, - bool compat) +static ssize_t aio_write(struct kiocb *req, const struct iocb *iocb, + bool vectored, bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct iov_iter iter; @@ -1589,7 +1589,8 @@ static void aio_fsync_work(struct work_struct *work) aio_complete(container_of(req, struct aio_kiocb, fsync), ret, 0); } -static int aio_fsync(struct fsync_iocb *req, struct iocb *iocb, bool datasync) +static int aio_fsync(struct fsync_iocb *req, const struct iocb *iocb, + bool datasync) { if (unlikely(iocb->aio_buf || 
iocb->aio_offset || iocb->aio_nbytes || iocb->aio_rw_flags)) @@ -1717,7 +1718,7 @@ aio_poll_queue_proc(struct file *file, struct wait_queue_head *head, add_wait_queue(head, &pt->iocb->poll.wait); } -static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb) +static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) { struct kioctx *ctx = aiocb->ki_ctx; struct poll_iocb *req = &aiocb->poll; @@ -1789,27 +1790,23 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb) return 0; } -static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, - bool compat) +static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, + struct iocb __user *user_iocb, bool compat) { struct aio_kiocb *req; - struct iocb iocb; ssize_t ret; - if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb)))) - return -EFAULT; - /* enforce forwards compatibility on users */ - if (unlikely(iocb.aio_reserved2)) { + if (unlikely(iocb->aio_reserved2)) { pr_debug("EINVAL: reserve field set\n"); return -EINVAL; } /* prevent overflows */ if (unlikely( - (iocb.aio_buf != (unsigned long)iocb.aio_buf) || - (iocb.aio_nbytes != (size_t)iocb.aio_nbytes) || - ((ssize_t)iocb.aio_nbytes < 0) + (iocb->aio_buf != (unsigned long)iocb->aio_buf) || + (iocb->aio_nbytes != (size_t)iocb->aio_nbytes) || + ((ssize_t)iocb->aio_nbytes < 0) )) { pr_debug("EINVAL: overflow check\n"); return -EINVAL; @@ -1823,14 +1820,14 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, if (unlikely(!req)) goto out_put_reqs_available; - if (iocb.aio_flags & IOCB_FLAG_RESFD) { + if (iocb->aio_flags & IOCB_FLAG_RESFD) { /* * If the IOCB_FLAG_RESFD flag of aio_flags is set, get an * instance of the file* now. The file descriptor must be * an eventfd() fd, and will be signaled for each completed * event using the eventfd_signal() function. 
*/ - req->ki_eventfd = eventfd_ctx_fdget((int) iocb.aio_resfd); + req->ki_eventfd = eventfd_ctx_fdget((int) iocb->aio_resfd); if (IS_ERR(req->ki_eventfd)) { ret = PTR_ERR(req->ki_eventfd); req->ki_eventfd = NULL; @@ -1845,32 +1842,32 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, } req->ki_user_iocb = user_iocb; - req->ki_user_data = iocb.aio_data; + req->ki_user_data = iocb->aio_data; - switch (iocb.aio_lio_opcode) { + switch (iocb->aio_lio_opcode) { case IOCB_CMD_PREAD: - ret = aio_read(&req->rw, &iocb, false, compat); + ret = aio_read(&req->rw, iocb, false, compat); break; case IOCB_CMD_PWRITE: - ret = aio_write(&req->rw, &iocb, false, compat); + ret = aio_write(&req->rw, iocb, false, compat); break; case IOCB_CMD_PREADV: - ret = aio_read(&req->rw, &iocb, true, compat); + ret = aio_read(&req->rw, iocb, true, compat); break; case IOCB_CMD_PWRITEV: - ret = aio_write(&req->rw, &iocb, true, compat); + ret = aio_write(&req->rw, iocb, true, compat); break; case IOCB_CMD_FSYNC: - ret = aio_fsync(&req->fsync, &iocb, false); + ret = aio_fsync(&req->fsync, iocb, false); break; case IOCB_CMD_FDSYNC: - ret = aio_fsync(&req->fsync, &iocb, true); + ret = aio_fsync(&req->fsync, iocb, true); break; case IOCB_CMD_POLL: - ret = aio_poll(req, &iocb); + ret = aio_poll(req, iocb); break; default: - pr_debug("invalid aio operation %d\n", iocb.aio_lio_opcode); + pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode); ret = -EINVAL; break; } @@ -1892,6 +1889,17 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, return ret; } +static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, + bool compat) +{ + struct iocb iocb; + + if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb)))) + return -EFAULT; + + return __io_submit_one(ctx, &iocb, user_iocb, compat); +} + /* sys_io_submit: * Queue the nr iocbs pointed to by iocbpp for processing. Returns * the number of iocbs queued. 
May return -EINVAL if the aio_context

From patchwork Tue Dec 11 00:15:34 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722831
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 12/27] aio: abstract out io_event filler helper
Date: Mon, 10 Dec 2018 17:15:34 -0700
Message-Id: <20181211001549.30085-13-axboe@kernel.dk>
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>
References: <20181211001549.30085-1-axboe@kernel.dk>

Signed-off-by: Jens Axboe
---
 fs/aio.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 06c8bcc72496..173f1f79dc8f 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1063,6 +1063,15 @@ static inline void iocb_put(struct aio_kiocb *iocb)
 	}
 }
 
+static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb,
+			   long res, long res2)
+{
+	ev->obj = (u64)(unsigned long)iocb->ki_user_iocb;
+	ev->data = iocb->ki_user_data;
+	ev->res = res;
+	ev->res2 = res2;
+}
+
 /* aio_complete
  *	Called when the io request on the given iocb is complete.
 */
@@ -1090,10 +1099,7 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
 	ev_page = kmap_atomic(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
 	event = ev_page + pos % AIO_EVENTS_PER_PAGE;
 
-	event->obj = (u64)(unsigned long)iocb->ki_user_iocb;
-	event->data = iocb->ki_user_data;
-	event->res = res;
-	event->res2 = res2;
+	aio_fill_event(event, iocb, res, res2);
 
 	kunmap_atomic(ev_page);
 	flush_dcache_page(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);

From patchwork Tue Dec 11 00:15:35 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722835
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 13/27] aio: add io_setup2() system call
Date: Mon, 10 Dec 2018 17:15:35 -0700
Message-Id: <20181211001549.30085-14-axboe@kernel.dk>
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>
References: <20181211001549.30085-1-axboe@kernel.dk>

This is just like io_setup(), except it adds a flags argument to let the
caller control/define some of the io_context behavior. Outside of the
flags, we add an iocb array and two user pointers for future use.

Signed-off-by: Jens Axboe
---
 Documentation/sysctl/fs.txt            |  8 +--
 arch/x86/entry/syscalls/syscall_64.tbl |  1 +
 fs/aio.c                               | 80 ++++++++++++++++++--------
 include/linux/syscalls.h               |  3 +
 include/uapi/asm-generic/unistd.h      |  4 +-
 kernel/sys_ni.c                        |  1 +
 6 files changed, 67 insertions(+), 30 deletions(-)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 819caf8ca05f..5e484eb7a25f 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -47,10 +47,10 @@ Currently, these files are in /proc/sys/fs:
 aio-nr & aio-max-nr:
 
 aio-nr is the running total of the number of events specified on the
-io_setup system call for all currently active aio contexts. If aio-nr
-reaches aio-max-nr then io_setup will fail with EAGAIN. Note that
-raising aio-max-nr does not result in the pre-allocation or re-sizing
-of any kernel data structures.
+io_setup/io_setup2 system call for all currently active aio contexts.
+If aio-nr reaches aio-max-nr then io_setup will fail with EAGAIN. +Note that raising aio-max-nr does not result in the pre-allocation or +re-sizing of any kernel data structures. ============================================================== diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index f0b1709a5ffb..67c357225fb0 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -343,6 +343,7 @@ 332 common statx __x64_sys_statx 333 common io_pgetevents __x64_sys_io_pgetevents 334 common rseq __x64_sys_rseq +335 common io_setup2 __x64_sys_io_setup2 # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/fs/aio.c b/fs/aio.c index 173f1f79dc8f..0bad70eab553 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -100,6 +100,8 @@ struct kioctx { unsigned long user_id; + unsigned int flags; + struct __percpu kioctx_cpu *cpu; /* @@ -686,10 +688,8 @@ static void aio_nr_sub(unsigned nr) spin_unlock(&aio_nr_lock); } -/* ioctx_alloc - * Allocates and initializes an ioctx. Returns an ERR_PTR if it failed. - */ -static struct kioctx *ioctx_alloc(unsigned nr_events) +static struct kioctx *io_setup_flags(unsigned long ctxid, + unsigned int nr_events, unsigned int flags) { struct mm_struct *mm = current->mm; struct kioctx *ctx; @@ -701,6 +701,12 @@ static struct kioctx *ioctx_alloc(unsigned nr_events) */ unsigned int max_reqs = nr_events; + if (unlikely(ctxid || nr_events == 0)) { + pr_debug("EINVAL: ctx %lu nr_events %u\n", + ctxid, nr_events); + return ERR_PTR(-EINVAL); + } + /* * We keep track of the number of available ringbuffer slots, to prevent * overflow (reqs_available), and we also use percpu counters for this. 
@@ -726,6 +732,7 @@ static struct kioctx *ioctx_alloc(unsigned nr_events) if (!ctx) return ERR_PTR(-ENOMEM); + ctx->flags = flags; ctx->max_reqs = max_reqs; spin_lock_init(&ctx->ctx_lock); @@ -1281,6 +1288,45 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr, return ret; } +/* sys_io_setup2: + * Like sys_io_setup(), except that it takes a set of flags + * (IOCTX_FLAG_*), and some pointers to user structures: + * + * *iocbs - pointer to array of struct iocb, for when + * IOCTX_FLAG_USERIOCB is set in flags. + * + * *user1 - reserved for future use + * + * *user2 - reserved for future use. + */ +SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, struct iocb __user *, + iocbs, void __user *, user1, void __user *, user2, + aio_context_t __user *, ctxp) +{ + struct kioctx *ioctx; + unsigned long ctx; + long ret; + + if (flags || user1 || user2) + return -EINVAL; + + ret = get_user(ctx, ctxp); + if (unlikely(ret)) + goto out; + + ioctx = io_setup_flags(ctx, nr_events, flags); + ret = PTR_ERR(ioctx); + if (IS_ERR(ioctx)) + goto out; + + ret = put_user(ioctx->user_id, ctxp); + if (ret) + kill_ioctx(current->mm, ioctx, NULL); + percpu_ref_put(&ioctx->users); +out: + return ret; +} + /* sys_io_setup: * Create an aio_context capable of receiving at least nr_events. 
* ctxp must not point to an aio_context that already exists, and @@ -1296,7 +1342,7 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr, */ SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctxp) { - struct kioctx *ioctx = NULL; + struct kioctx *ioctx; unsigned long ctx; long ret; @@ -1304,14 +1350,7 @@ SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctxp) if (unlikely(ret)) goto out; - ret = -EINVAL; - if (unlikely(ctx || nr_events == 0)) { - pr_debug("EINVAL: ctx %lu nr_events %u\n", - ctx, nr_events); - goto out; - } - - ioctx = ioctx_alloc(nr_events); + ioctx = io_setup_flags(ctx, nr_events, 0); ret = PTR_ERR(ioctx); if (!IS_ERR(ioctx)) { ret = put_user(ioctx->user_id, ctxp); @@ -1327,7 +1366,7 @@ SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctxp) #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE2(io_setup, unsigned, nr_events, u32 __user *, ctx32p) { - struct kioctx *ioctx = NULL; + struct kioctx *ioctx; unsigned long ctx; long ret; @@ -1335,23 +1374,14 @@ COMPAT_SYSCALL_DEFINE2(io_setup, unsigned, nr_events, u32 __user *, ctx32p) if (unlikely(ret)) goto out; - ret = -EINVAL; - if (unlikely(ctx || nr_events == 0)) { - pr_debug("EINVAL: ctx %lu nr_events %u\n", - ctx, nr_events); - goto out; - } - - ioctx = ioctx_alloc(nr_events); + ioctx = io_setup_flags(ctx, nr_events, 0); ret = PTR_ERR(ioctx); if (!IS_ERR(ioctx)) { - /* truncating is ok because it's a user address */ - ret = put_user((u32)ioctx->user_id, ctx32p); + ret = put_user(ioctx->user_id, ctx32p); if (ret) kill_ioctx(current->mm, ioctx, NULL); percpu_ref_put(&ioctx->users); } - out: return ret; } diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 2ac3d13a915b..a20a663d583f 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -287,6 +287,9 @@ static inline void addr_limit_user_check(void) */ #ifndef CONFIG_ARCH_HAS_SYSCALL_WRAPPER asmlinkage long sys_io_setup(unsigned nr_reqs, 
 						aio_context_t __user *ctx);
+asmlinkage long sys_io_setup2(unsigned, unsigned, struct iocb __user *,
+				void __user *, void __user *,
+				aio_context_t __user *);
 asmlinkage long sys_io_destroy(aio_context_t ctx);
 asmlinkage long sys_io_submit(aio_context_t, long,
 				struct iocb __user * __user *);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index c7f3321fbe43..1bbaa4c59f20 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -738,9 +738,11 @@ __SYSCALL(__NR_statx, sys_statx)
 __SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents)
 #define __NR_rseq 293
 __SYSCALL(__NR_rseq, sys_rseq)
+#define __NR_io_setup2 294
+__SYSCALL(__NR_io_setup2, sys_io_setup2)
 
 #undef __NR_syscalls
-#define __NR_syscalls 294
+#define __NR_syscalls 295
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index df556175be50..17c8b4393669 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -37,6 +37,7 @@ asmlinkage long sys_ni_syscall(void)
  */
 
 COND_SYSCALL(io_setup);
+COND_SYSCALL(io_setup2);
 COND_SYSCALL_COMPAT(io_setup);
 COND_SYSCALL(io_destroy);
 COND_SYSCALL(io_submit);

From patchwork Tue Dec 11 00:15:36 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722839
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 14/27] aio: add support for having user mapped iocbs
Date: Mon, 10 Dec 2018 17:15:36 -0700
Message-Id: <20181211001549.30085-15-axboe@kernel.dk>
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>
References: <20181211001549.30085-1-axboe@kernel.dk>

For io_submit(), we have to first copy each pointer to an iocb, then
copy the iocb itself. The latter is 64 bytes in size, and that's a lot
of copying for a single IO.

Add support for setting IOCTX_FLAG_USERIOCB through the new io_setup2()
system call, which allows the iocbs to reside in userspace. If this flag
is used, then io_submit() doesn't take pointers to iocbs anymore, it
takes an index value into the array of iocbs instead. Similarly, for
io_getevents(), the iocb ->obj will be the index, not the pointer to the
iocb. See the change made to fio to support this feature; it's pretty
trivial to adapt to.
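The index-based completion model described above means the application can no longer recover its per-IO state from an iocb pointer in the event. A minimal sketch of the app-side lookup table this implies follows; every name in it (`struct my_io`, `io_table`, `NR_REQS`) is hypothetical, since the patch only defines the kernel side:

```c
/*
 * Sketch of application-side bookkeeping for IOCTX_FLAG_USERIOCB.
 * With user-mapped iocbs, io_submit() takes an index into the iocb
 * array instead of a pointer, and io_getevents() reports that same
 * index back in the event's obj field, so private per-IO state is
 * located by index, not by pointer.
 */
#include <stddef.h>

#define NR_REQS 128

struct my_io {
	int in_flight;		/* slot currently owns a submitted iocb */
	void *priv;		/* application payload for this request */
};

static struct my_io io_table[NR_REQS];

/* Claim a free slot; the returned index doubles as the iocb index. */
static int my_io_get(void)
{
	int i;

	for (i = 0; i < NR_REQS; i++) {
		if (!io_table[i].in_flight) {
			io_table[i].in_flight = 1;
			return i;
		}
	}
	return -1;
}

/* At io_getevents() time, event->obj is the index, not an iocb pointer. */
static struct my_io *my_io_from_event(unsigned long obj)
{
	if (obj >= NR_REQS)
		return NULL;
	return &io_table[obj];
}
```

This mirrors the table fio grew in the referenced commit: the slot index is handed to io_submit() in place of the iocb pointer, and my_io_from_event() resolves a completion back to the owning request.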
For applications, like fio, that previously embedded the iocb inside a application private structure, some sort of lookup table/structure is needed to find the private IO structure from the index at io_getevents() time. http://git.kernel.dk/cgit/fio/commit/?id=3c3168e91329c83880c91e5abc28b9d6b940fd95 Signed-off-by: Jens Axboe --- fs/aio.c | 126 +++++++++++++++++++++++++++++++---- include/uapi/linux/aio_abi.h | 2 + 2 files changed, 116 insertions(+), 12 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 0bad70eab553..4e8c471a2598 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -92,6 +92,11 @@ struct ctx_rq_wait { atomic_t count; }; +struct aio_mapped_range { + struct page **pages; + long nr_pages; +}; + struct kioctx { struct percpu_ref users; atomic_t dead; @@ -127,6 +132,8 @@ struct kioctx { struct page **ring_pages; long nr_pages; + struct aio_mapped_range iocb_range; + struct rcu_work free_rwork; /* see free_ioctx() */ /* @@ -222,6 +229,11 @@ static struct vfsmount *aio_mnt; static const struct file_operations aio_ring_fops; static const struct address_space_operations aio_ctx_aops; +static const unsigned int iocb_page_shift = + ilog2(PAGE_SIZE / sizeof(struct iocb)); + +static void aio_useriocb_unmap(struct kioctx *); + static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages) { struct file *file; @@ -578,6 +590,7 @@ static void free_ioctx(struct work_struct *work) free_rwork); pr_debug("freeing %p\n", ctx); + aio_useriocb_unmap(ctx); aio_free_ring(ctx); free_percpu(ctx->cpu); percpu_ref_exit(&ctx->reqs); @@ -1288,6 +1301,70 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr, return ret; } +static struct iocb *aio_iocb_from_index(struct kioctx *ctx, int index) +{ + struct iocb *iocb; + + iocb = page_address(ctx->iocb_range.pages[index >> iocb_page_shift]); + index &= ((1 << iocb_page_shift) - 1); + return iocb + index; +} + +static void aio_unmap_range(struct aio_mapped_range *range) +{ + int i; + + if (!range->nr_pages) + return; + + 
for (i = 0; i < range->nr_pages; i++) + put_page(range->pages[i]); + + kfree(range->pages); + range->pages = NULL; + range->nr_pages = 0; +} + +static int aio_map_range(struct aio_mapped_range *range, void __user *uaddr, + size_t size, int gup_flags) +{ + int nr_pages, ret; + + if ((unsigned long) uaddr & ~PAGE_MASK) + return -EINVAL; + + nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; + + range->pages = kzalloc(nr_pages * sizeof(struct page *), GFP_KERNEL); + if (!range->pages) + return -ENOMEM; + + down_write(¤t->mm->mmap_sem); + ret = get_user_pages((unsigned long) uaddr, nr_pages, gup_flags, + range->pages, NULL); + up_write(¤t->mm->mmap_sem); + + if (ret < nr_pages) { + kfree(range->pages); + return -ENOMEM; + } + + range->nr_pages = nr_pages; + return 0; +} + +static void aio_useriocb_unmap(struct kioctx *ctx) +{ + aio_unmap_range(&ctx->iocb_range); +} + +static int aio_useriocb_map(struct kioctx *ctx, struct iocb __user *iocbs) +{ + size_t size = sizeof(struct iocb) * ctx->max_reqs; + + return aio_map_range(&ctx->iocb_range, iocbs, size, 0); +} + /* sys_io_setup2: * Like sys_io_setup(), except that it takes a set of flags * (IOCTX_FLAG_*), and some pointers to user structures: @@ -1307,7 +1384,9 @@ SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, struct iocb __user *, unsigned long ctx; long ret; - if (flags || user1 || user2) + if (user1 || user2) + return -EINVAL; + if (flags & ~IOCTX_FLAG_USERIOCB) return -EINVAL; ret = get_user(ctx, ctxp); @@ -1319,9 +1398,17 @@ SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, struct iocb __user *, if (IS_ERR(ioctx)) goto out; + if (flags & IOCTX_FLAG_USERIOCB) { + ret = aio_useriocb_map(ioctx, iocbs); + if (ret) + goto err; + } + ret = put_user(ioctx->user_id, ctxp); - if (ret) + if (ret) { +err: kill_ioctx(current->mm, ioctx, NULL); + } percpu_ref_put(&ioctx->users); out: return ret; @@ -1871,10 +1958,13 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, } } - ret = 
put_user(KIOCB_KEY, &user_iocb->aio_key);
-	if (unlikely(ret)) {
-		pr_debug("EFAULT: aio_key\n");
-		goto out_put_req;
+	/* Don't support cancel on user mapped iocbs */
+	if (!(ctx->flags & IOCTX_FLAG_USERIOCB)) {
+		ret = put_user(KIOCB_KEY, &user_iocb->aio_key);
+		if (unlikely(ret)) {
+			pr_debug("EFAULT: aio_key\n");
+			goto out_put_req;
+		}
 	}
 
 	req->ki_user_iocb = user_iocb;
@@ -1928,12 +2018,22 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 			 bool compat)
 {
-	struct iocb iocb;
+	struct iocb iocb, *iocbp;
 
-	if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb))))
-		return -EFAULT;
+	if (ctx->flags & IOCTX_FLAG_USERIOCB) {
+		unsigned long iocb_index = (unsigned long) user_iocb;
 
-	return __io_submit_one(ctx, &iocb, user_iocb, compat);
+		if (iocb_index >= ctx->max_reqs)
+			return -EINVAL;
+
+		iocbp = aio_iocb_from_index(ctx, iocb_index);
+	} else {
+		if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb))))
+			return -EFAULT;
+		iocbp = &iocb;
+	}
+
+	return __io_submit_one(ctx, iocbp, user_iocb, compat);
 }
 
 /* sys_io_submit:
@@ -2077,6 +2177,9 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
 	if (unlikely(!ctx))
 		return -EINVAL;
 
+	if (ctx->flags & IOCTX_FLAG_USERIOCB)
+		goto err;
+
 	spin_lock_irq(&ctx->ctx_lock);
 	kiocb = lookup_kiocb(ctx, iocb);
 	if (kiocb) {
@@ -2093,9 +2196,8 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
 	 */
 		ret = -EINPROGRESS;
 	}
-
+err:
 	percpu_ref_put(&ctx->users);
-
 	return ret;
 }
 
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index 8387e0af0f76..814e6606c413 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -106,6 +106,8 @@ struct iocb {
 	__u32	aio_resfd;
 }; /* 64 bytes */
 
+#define IOCTX_FLAG_USERIOCB	(1 << 0)	/* iocbs are user mapped */
+
 #undef IFBIG
 #undef IFLITTLE

From patchwork Tue Dec 11 00:15:37 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722843
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 15/27] aio: support for IO polling
Date: Mon, 10 Dec 2018 17:15:37 -0700
Message-Id: <20181211001549.30085-16-axboe@kernel.dk>
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>
References: <20181211001549.30085-1-axboe@kernel.dk>

Add polled variants of PREAD/PREADV and PWRITE/PWRITEV.
These act like their non-polled counterparts, except we expect to poll for completion of them. The polling happens at io_getevent() time, and works just like non-polled IO. To setup an io_context for polled IO, the application must call io_setup2() with IOCTX_FLAG_IOPOLL as one of the flags. It is illegal to mix and match polled and non-polled IO on an io_context. Polled IO doesn't support the user mapped completion ring. Events must be reaped through the io_getevents() system call. For non-irq driven poll devices, there's no way to support completion reaping from userspace by just looking at the ring. The application itself is the one that pulls completion entries. Signed-off-by: Jens Axboe --- fs/aio.c | 396 +++++++++++++++++++++++++++++++---- include/uapi/linux/aio_abi.h | 3 + 2 files changed, 363 insertions(+), 36 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 4e8c471a2598..c4dbd5e1c350 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -153,6 +153,18 @@ struct kioctx { atomic_t reqs_available; } ____cacheline_aligned_in_smp; + /* iopoll submission state */ + struct { + spinlock_t poll_lock; + struct list_head poll_submitted; + } ____cacheline_aligned_in_smp; + + /* iopoll completion state */ + struct { + struct list_head poll_completing; + struct mutex getevents_lock; + } ____cacheline_aligned_in_smp; + struct { spinlock_t ctx_lock; struct list_head active_reqs; /* used for cancellation */ @@ -205,14 +217,27 @@ struct aio_kiocb { __u64 ki_user_data; /* user's data for completion */ struct list_head ki_list; /* the aio core uses this - * for cancellation */ + * for cancellation, or for + * polled IO */ + + unsigned long ki_flags; +#define KIOCB_F_POLL_COMPLETED 0 /* polled IO has completed */ +#define KIOCB_F_POLL_EAGAIN 1 /* polled submission got EAGAIN */ + refcount_t ki_refcnt; - /* - * If the aio_resfd field of the userspace iocb is not zero, - * this is the underlying eventfd context to deliver events to. 
- */ - struct eventfd_ctx *ki_eventfd; + union { + /* + * If the aio_resfd field of the userspace iocb is not zero, + * this is the underlying eventfd context to deliver events to. + */ + struct eventfd_ctx *ki_eventfd; + + /* + * For polled IO, stash completion info here + */ + struct io_event ki_ev; + }; }; /*------ sysctl variables----*/ @@ -233,6 +258,7 @@ static const unsigned int iocb_page_shift = ilog2(PAGE_SIZE / sizeof(struct iocb)); static void aio_useriocb_unmap(struct kioctx *); +static void aio_iopoll_reap_events(struct kioctx *); static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages) { @@ -471,11 +497,15 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) int i; struct file *file; - /* Compensate for the ring buffer's head/tail overlap entry */ - nr_events += 2; /* 1 is required, 2 for good luck */ - + /* + * Compensate for the ring buffer's head/tail overlap entry. + * IO polling doesn't require any io event entries + */ size = sizeof(struct aio_ring); - size += sizeof(struct io_event) * nr_events; + if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) { + nr_events += 2; /* 1 is required, 2 for good luck */ + size += sizeof(struct io_event) * nr_events; + } nr_pages = PFN_UP(size); if (nr_pages < 0) @@ -758,6 +788,11 @@ static struct kioctx *io_setup_flags(unsigned long ctxid, INIT_LIST_HEAD(&ctx->active_reqs); + spin_lock_init(&ctx->poll_lock); + INIT_LIST_HEAD(&ctx->poll_submitted); + INIT_LIST_HEAD(&ctx->poll_completing); + mutex_init(&ctx->getevents_lock); + if (percpu_ref_init(&ctx->users, free_ioctx_users, 0, GFP_KERNEL)) goto err; @@ -829,11 +864,15 @@ static int kill_ioctx(struct mm_struct *mm, struct kioctx *ctx, { struct kioctx_table *table; + mutex_lock(&ctx->getevents_lock); spin_lock(&mm->ioctx_lock); if (atomic_xchg(&ctx->dead, 1)) { spin_unlock(&mm->ioctx_lock); + mutex_unlock(&ctx->getevents_lock); return -EINVAL; } + aio_iopoll_reap_events(ctx); + mutex_unlock(&ctx->getevents_lock); table = 
rcu_dereference_raw(mm->ioctx_table); WARN_ON(ctx != rcu_access_pointer(table->table[ctx->id])); @@ -1042,6 +1081,7 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx) percpu_ref_get(&ctx->reqs); req->ki_ctx = ctx; INIT_LIST_HEAD(&req->ki_list); + req->ki_flags = 0; refcount_set(&req->ki_refcnt, 0); req->ki_eventfd = NULL; return req; @@ -1083,6 +1123,15 @@ static inline void iocb_put(struct aio_kiocb *iocb) } } +static void iocb_put_many(struct kioctx *ctx, void **iocbs, int *nr) +{ + if (*nr) { + percpu_ref_put_many(&ctx->reqs, *nr); + kmem_cache_free_bulk(kiocb_cachep, *nr, iocbs); + *nr = 0; + } +} + static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb, long res, long res2) { @@ -1272,6 +1321,185 @@ static bool aio_read_events(struct kioctx *ctx, long min_nr, long nr, return ret < 0 || *i >= min_nr; } +#define AIO_IOPOLL_BATCH 8 + +/* + * Process completed iocb iopoll entries, copying the result to userspace. + */ +static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs, + unsigned int *nr_events, long max) +{ + void *iocbs[AIO_IOPOLL_BATCH]; + struct aio_kiocb *iocb, *n; + int to_free = 0, ret = 0; + + /* Shouldn't happen... */ + if (*nr_events >= max) + return 0; + + list_for_each_entry_safe(iocb, n, &ctx->poll_completing, ki_list) { + if (*nr_events == max) + break; + if (!test_bit(KIOCB_F_POLL_COMPLETED, &iocb->ki_flags)) + continue; + if (to_free == AIO_IOPOLL_BATCH) + iocb_put_many(ctx, iocbs, &to_free); + + list_del(&iocb->ki_list); + iocbs[to_free++] = iocb; + + fput(iocb->rw.ki_filp); + + if (evs && copy_to_user(evs + *nr_events, &iocb->ki_ev, + sizeof(iocb->ki_ev))) { + ret = -EFAULT; + break; + } + (*nr_events)++; + } + + if (to_free) + iocb_put_many(ctx, iocbs, &to_free); + + return ret; +} + +/* + * Poll for a mininum of 'min' events, and a maximum of 'max'. 
Note that if + * min == 0 we consider that a non-spinning poll check - we'll still enter + * the driver poll loop, but only as a non-spinning completion check. + */ +static int aio_iopoll_getevents(struct kioctx *ctx, + struct io_event __user *event, + unsigned int *nr_events, long min, long max) +{ + struct aio_kiocb *iocb; + int to_poll, polled, ret; + + /* + * Check if we already have done events that satisfy what we need + */ + if (!list_empty(&ctx->poll_completing)) { + ret = aio_iopoll_reap(ctx, event, nr_events, max); + if (ret < 0) + return ret; + if ((min && *nr_events >= min) || *nr_events >= max) + return 0; + } + + /* + * Take in a new working set from the submitted list, if possible. + */ + if (!list_empty_careful(&ctx->poll_submitted)) { + spin_lock(&ctx->poll_lock); + list_splice_init(&ctx->poll_submitted, &ctx->poll_completing); + spin_unlock(&ctx->poll_lock); + } + + if (list_empty(&ctx->poll_completing)) + return 0; + + /* + * Check again now that we have a new batch. + */ + ret = aio_iopoll_reap(ctx, event, nr_events, max); + if (ret < 0) + return ret; + if ((min && *nr_events >= min) || *nr_events >= max) + return 0; + + /* + * Find up to 'max' worth of events to poll for, including the + * events we already successfully polled + */ + polled = to_poll = 0; + list_for_each_entry(iocb, &ctx->poll_completing, ki_list) { + /* + * Poll for needed events with spin == true, anything after + * that we just check if we have more, up to max. 
+ */ + bool spin = !polled || *nr_events < min; + struct kiocb *kiocb = &iocb->rw; + + if (test_bit(KIOCB_F_POLL_COMPLETED, &iocb->ki_flags)) + break; + if (++to_poll + *nr_events > max) + break; + + ret = kiocb->ki_filp->f_op->iopoll(kiocb, spin); + if (ret < 0) + return ret; + + polled += ret; + if (polled + *nr_events >= max) + break; + } + + ret = aio_iopoll_reap(ctx, event, nr_events, max); + if (ret < 0) + return ret; + if (*nr_events >= min) + return 0; + return to_poll; +} + +/* + * We can't just wait for polled events to come to us, we have to actively + * find and complete them. + */ +static void aio_iopoll_reap_events(struct kioctx *ctx) +{ + if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) + return; + + while (!list_empty_careful(&ctx->poll_submitted) || + !list_empty(&ctx->poll_completing)) { + unsigned int nr_events = 0; + + aio_iopoll_getevents(ctx, NULL, &nr_events, 1, UINT_MAX); + } +} + +static int __aio_iopoll_check(struct kioctx *ctx, struct io_event __user *event, + unsigned int *nr_events, long min_nr, long max_nr) +{ + int ret = 0; + + while (!*nr_events || !need_resched()) { + int tmin = 0; + + if (*nr_events < min_nr) + tmin = min_nr - *nr_events; + + ret = aio_iopoll_getevents(ctx, event, nr_events, tmin, max_nr); + if (ret <= 0) + break; + ret = 0; + } + + return ret; +} + +static int aio_iopoll_check(struct kioctx *ctx, long min_nr, long nr, + struct io_event __user *event) +{ + unsigned int nr_events = 0; + int ret; + + /* Only allow one thread polling at a time */ + if (!mutex_trylock(&ctx->getevents_lock)) + return -EBUSY; + if (unlikely(atomic_read(&ctx->dead))) { + ret = -EINVAL; + goto err; + } + + ret = __aio_iopoll_check(ctx, event, &nr_events, min_nr, nr); +err: + mutex_unlock(&ctx->getevents_lock); + return nr_events ? 
nr_events : ret; +} + static long read_events(struct kioctx *ctx, long min_nr, long nr, struct io_event __user *event, ktime_t until) @@ -1386,7 +1614,7 @@ SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, struct iocb __user *, if (user1 || user2) return -EINVAL; - if (flags & ~IOCTX_FLAG_USERIOCB) + if (flags & ~(IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL)) return -EINVAL; ret = get_user(ctx, ctxp); @@ -1520,13 +1748,8 @@ static void aio_remove_iocb(struct aio_kiocb *iocb) spin_unlock_irqrestore(&ctx->ctx_lock, flags); } -static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) +static void kiocb_end_write(struct kiocb *kiocb) { - struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw); - - if (!list_empty_careful(&iocb->ki_list)) - aio_remove_iocb(iocb); - if (kiocb->ki_flags & IOCB_WRITE) { struct inode *inode = file_inode(kiocb->ki_filp); @@ -1538,19 +1761,48 @@ static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE); file_end_write(kiocb->ki_filp); } +} + +static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) +{ + struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw); + + if (!list_empty_careful(&iocb->ki_list)) + aio_remove_iocb(iocb); + + kiocb_end_write(kiocb); fput(kiocb->ki_filp); aio_complete(iocb, res, res2); } -static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) +static void aio_complete_rw_poll(struct kiocb *kiocb, long res, long res2) { + struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw); + + kiocb_end_write(kiocb); + + /* + * Handle EAGAIN from resource limits with polled IO inline, don't + * pass the event back to userspace. 
+ */ + if (unlikely(res == -EAGAIN)) + set_bit(KIOCB_F_POLL_EAGAIN, &iocb->ki_flags); + else { + aio_fill_event(&iocb->ki_ev, iocb, res, res2); + set_bit(KIOCB_F_POLL_COMPLETED, &iocb->ki_flags); + } +} + +static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb) +{ + struct kioctx *ctx = kiocb->ki_ctx; + struct kiocb *req = &kiocb->rw; int ret; req->ki_filp = fget(iocb->aio_fildes); if (unlikely(!req->ki_filp)) return -EBADF; - req->ki_complete = aio_complete_rw; req->ki_pos = iocb->aio_offset; req->ki_flags = iocb_flags(req->ki_filp); if (iocb->aio_flags & IOCB_FLAG_RESFD) @@ -1576,9 +1828,35 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) if (unlikely(ret)) goto out_fput; - req->ki_flags &= ~IOCB_HIPRI; /* no one is going to poll for this I/O */ - return 0; + if (iocb->aio_flags & IOCB_FLAG_HIPRI) { + /* shares space in the union, and is rather pointless.. */ + ret = -EINVAL; + if (iocb->aio_flags & IOCB_FLAG_RESFD) + goto out_fput; + + /* can't submit polled IO to a non-polled ctx */ + if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) + goto out_fput; + + ret = -EOPNOTSUPP; + if (!(req->ki_flags & IOCB_DIRECT) || + !req->ki_filp->f_op->iopoll) + goto out_fput; + + req->ki_flags |= IOCB_HIPRI; + req->ki_complete = aio_complete_rw_poll; + } else { + /* can't submit non-polled IO to a polled ctx */ + ret = -EINVAL; + if (ctx->flags & IOCTX_FLAG_IOPOLL) + goto out_fput; + /* no one is going to poll for this I/O */ + req->ki_flags &= ~IOCB_HIPRI; + req->ki_complete = aio_complete_rw; + } + + return 0; out_fput: fput(req->ki_filp); return ret; @@ -1623,15 +1901,40 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret) } } -static ssize_t aio_read(struct kiocb *req, const struct iocb *iocb, +/* + * After the iocb has been issued, it's safe to be found on the poll list. 
+ * Adding the kiocb to the list AFTER submission ensures that we don't + * find it from a io_getevents() thread before the issuer is done accessing + * the kiocb cookie. + */ +static void aio_iopoll_iocb_issued(struct aio_kiocb *kiocb) +{ + /* + * For fast devices, IO may have already completed. If it has, add + * it to the front so we find it first. We can't add to the poll_done + * list as that's unlocked from the completion side. + */ + const int front = test_bit(KIOCB_F_POLL_COMPLETED, &kiocb->ki_flags); + struct kioctx *ctx = kiocb->ki_ctx; + + spin_lock(&ctx->poll_lock); + if (front) + list_add(&kiocb->ki_list, &ctx->poll_submitted); + else + list_add_tail(&kiocb->ki_list, &ctx->poll_submitted); + spin_unlock(&ctx->poll_lock); +} + +static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, bool vectored, bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; + struct kiocb *req = &kiocb->rw; struct iov_iter iter; struct file *file; ssize_t ret; - ret = aio_prep_rw(req, iocb); + ret = aio_prep_rw(kiocb, iocb); if (ret) return ret; file = req->ki_filp; @@ -1656,15 +1959,16 @@ static ssize_t aio_read(struct kiocb *req, const struct iocb *iocb, return ret; } -static ssize_t aio_write(struct kiocb *req, const struct iocb *iocb, +static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb, bool vectored, bool compat) { struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; + struct kiocb *req = &kiocb->rw; struct iov_iter iter; struct file *file; ssize_t ret; - ret = aio_prep_rw(req, iocb); + ret = aio_prep_rw(kiocb, iocb); if (ret) return ret; file = req->ki_filp; @@ -1935,7 +2239,8 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, return -EINVAL; } - if (!get_reqs_available(ctx)) + /* Poll IO doesn't need ring reservations */ + if (!(ctx->flags & IOCTX_FLAG_IOPOLL) && !get_reqs_available(ctx)) return -EAGAIN; ret = -EAGAIN; @@ -1958,8 +2263,8 @@ static int __io_submit_one(struct 
kioctx *ctx, const struct iocb *iocb, } } - /* Don't support cancel on user mapped iocbs */ - if (!(ctx->flags & IOCTX_FLAG_USERIOCB)) { + /* Don't support cancel on user mapped iocbs or polled context */ + if (!(ctx->flags & (IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL))) { ret = put_user(KIOCB_KEY, &user_iocb->aio_key); if (unlikely(ret)) { pr_debug("EFAULT: aio_key\n"); @@ -1970,26 +2275,33 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, req->ki_user_iocb = user_iocb; req->ki_user_data = iocb->aio_data; + ret = -EINVAL; switch (iocb->aio_lio_opcode) { case IOCB_CMD_PREAD: - ret = aio_read(&req->rw, iocb, false, compat); + ret = aio_read(req, iocb, false, compat); break; case IOCB_CMD_PWRITE: - ret = aio_write(&req->rw, iocb, false, compat); + ret = aio_write(req, iocb, false, compat); break; case IOCB_CMD_PREADV: - ret = aio_read(&req->rw, iocb, true, compat); + ret = aio_read(req, iocb, true, compat); break; case IOCB_CMD_PWRITEV: - ret = aio_write(&req->rw, iocb, true, compat); + ret = aio_write(req, iocb, true, compat); break; case IOCB_CMD_FSYNC: + if (ctx->flags & IOCTX_FLAG_IOPOLL) + break; ret = aio_fsync(&req->fsync, iocb, false); break; case IOCB_CMD_FDSYNC: + if (ctx->flags & IOCTX_FLAG_IOPOLL) + break; ret = aio_fsync(&req->fsync, iocb, true); break; case IOCB_CMD_POLL: + if (ctx->flags & IOCTX_FLAG_IOPOLL) + break; ret = aio_poll(req, iocb); break; default: @@ -2005,13 +2317,21 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, */ if (ret) goto out_put_req; + if (ctx->flags & IOCTX_FLAG_IOPOLL) { + if (test_bit(KIOCB_F_POLL_EAGAIN, &req->ki_flags)) { + ret = -EAGAIN; + goto out_put_req; + } + aio_iopoll_iocb_issued(req); + } return 0; out_put_req: if (req->ki_eventfd) eventfd_ctx_put(req->ki_eventfd); iocb_put(req); out_put_reqs_available: - put_reqs_available(ctx, 1); + if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) + put_reqs_available(ctx, 1); return ret; } @@ -2177,7 +2497,7 @@ SYSCALL_DEFINE3(io_cancel, 
aio_context_t, ctx_id, struct iocb __user *, iocb, if (unlikely(!ctx)) return -EINVAL; - if (ctx->flags & IOCTX_FLAG_USERIOCB) + if (ctx->flags & (IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL)) goto err; spin_lock_irq(&ctx->ctx_lock); @@ -2212,8 +2532,12 @@ static long do_io_getevents(aio_context_t ctx_id, long ret = -EINVAL; if (likely(ioctx)) { - if (likely(min_nr <= nr && min_nr >= 0)) - ret = read_events(ioctx, min_nr, nr, events, until); + if (likely(min_nr <= nr && min_nr >= 0)) { + if (ioctx->flags & IOCTX_FLAG_IOPOLL) + ret = aio_iopoll_check(ioctx, min_nr, nr, events); + else + ret = read_events(ioctx, min_nr, nr, events, until); + } percpu_ref_put(&ioctx->users); } diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h index 814e6606c413..ea0b9a19f4df 100644 --- a/include/uapi/linux/aio_abi.h +++ b/include/uapi/linux/aio_abi.h @@ -52,9 +52,11 @@ enum { * is valid. * IOCB_FLAG_IOPRIO - Set if the "aio_reqprio" member of the "struct iocb" * is valid. + * IOCB_FLAG_HIPRI - Use IO completion polling */ #define IOCB_FLAG_RESFD (1 << 0) #define IOCB_FLAG_IOPRIO (1 << 1) +#define IOCB_FLAG_HIPRI (1 << 2) /* read() from /dev/aio returns these structures. 
*/ struct io_event { @@ -107,6 +109,7 @@ struct iocb { }; /* 64 bytes */ #define IOCTX_FLAG_USERIOCB (1 << 0) /* iocbs are user mapped */ +#define IOCTX_FLAG_IOPOLL (1 << 1) /* io_context is polled */ #undef IFBIG #undef IFLITTLE

From patchwork Tue Dec 11 00:15:38 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722847
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 16/27] aio: add submission side request cache
Date: Mon, 10 Dec 2018 17:15:38 -0700
Message-Id: <20181211001549.30085-17-axboe@kernel.dk>
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>
References: <20181211001549.30085-1-axboe@kernel.dk>

We have to add each submitted polled request to the io_context poll_submitted list, which means we have to grab the poll_lock. We already use the block plug to batch submissions if we're doing a batch of IO submissions, so extend that to cover the poll requests internally as well.
Signed-off-by: Jens Axboe --- fs/aio.c | 136 +++++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 113 insertions(+), 23 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index c4dbd5e1c350..2e8cde976cb4 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -240,6 +240,21 @@ struct aio_kiocb { }; }; +struct aio_submit_state { + struct kioctx *ctx; + + struct blk_plug plug; +#ifdef CONFIG_BLOCK + struct blk_plug_cb plug_cb; +#endif + + /* + * Polled iocbs that have been submitted, but not added to the ctx yet + */ + struct list_head req_list; + unsigned int req_count; +}; + /*------ sysctl variables----*/ static DEFINE_SPINLOCK(aio_nr_lock); unsigned long aio_nr; /* current system wide number of aio requests */ @@ -257,6 +272,15 @@ static const struct address_space_operations aio_ctx_aops; static const unsigned int iocb_page_shift = ilog2(PAGE_SIZE / sizeof(struct iocb)); +/* + * We rely on block level unplugs to flush pending requests, if we schedule + */ +#ifdef CONFIG_BLOCK +static const bool aio_use_state_req_list = true; +#else +static const bool aio_use_state_req_list = false; +#endif + static void aio_useriocb_unmap(struct kioctx *); static void aio_iopoll_reap_events(struct kioctx *); @@ -1901,13 +1925,28 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret) } } +/* + * Called either at the end of IO submission, or through a plug callback + * because we're going to schedule. Moves out local batch of requests to + * the ctx poll list, so they can be found for polling + reaping. + */ +static void aio_flush_state_reqs(struct kioctx *ctx, + struct aio_submit_state *state) +{ + spin_lock(&ctx->poll_lock); + list_splice_tail_init(&state->req_list, &ctx->poll_submitted); + spin_unlock(&ctx->poll_lock); + state->req_count = 0; +} + /* * After the iocb has been issued, it's safe to be found on the poll list. 
* Adding the kiocb to the list AFTER submission ensures that we don't * find it from a io_getevents() thread before the issuer is done accessing * the kiocb cookie. */ -static void aio_iopoll_iocb_issued(struct aio_kiocb *kiocb) +static void aio_iopoll_iocb_issued(struct aio_submit_state *state, + struct aio_kiocb *kiocb) { /* * For fast devices, IO may have already completed. If it has, add @@ -1917,12 +1956,21 @@ static void aio_iopoll_iocb_issued(struct aio_kiocb *kiocb) const int front = test_bit(KIOCB_F_POLL_COMPLETED, &kiocb->ki_flags); struct kioctx *ctx = kiocb->ki_ctx; - spin_lock(&ctx->poll_lock); - if (front) - list_add(&kiocb->ki_list, &ctx->poll_submitted); - else - list_add_tail(&kiocb->ki_list, &ctx->poll_submitted); - spin_unlock(&ctx->poll_lock); + if (!state || !aio_use_state_req_list) { + spin_lock(&ctx->poll_lock); + if (front) + list_add(&kiocb->ki_list, &ctx->poll_submitted); + else + list_add_tail(&kiocb->ki_list, &ctx->poll_submitted); + spin_unlock(&ctx->poll_lock); + } else { + if (front) + list_add(&kiocb->ki_list, &state->req_list); + else + list_add_tail(&kiocb->ki_list, &state->req_list); + if (++state->req_count >= AIO_IOPOLL_BATCH) + aio_flush_state_reqs(ctx, state); + } } static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb, @@ -2218,7 +2266,8 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb) } static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, - struct iocb __user *user_iocb, bool compat) + struct iocb __user *user_iocb, + struct aio_submit_state *state, bool compat) { struct aio_kiocb *req; ssize_t ret; @@ -2322,7 +2371,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, ret = -EAGAIN; goto out_put_req; } - aio_iopoll_iocb_issued(req); + aio_iopoll_iocb_issued(state, req); } return 0; out_put_req: @@ -2336,7 +2385,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, } static int io_submit_one(struct kioctx *ctx, 
struct iocb __user *user_iocb, - bool compat) + struct aio_submit_state *state, bool compat) { struct iocb iocb, *iocbp; @@ -2353,7 +2402,44 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, iocbp = &iocb; } - return __io_submit_one(ctx, iocbp, user_iocb, compat); + return __io_submit_one(ctx, iocbp, user_iocb, state, compat); +} + +#ifdef CONFIG_BLOCK +static void aio_state_unplug(struct blk_plug_cb *cb, bool from_schedule) +{ + struct aio_submit_state *state; + + state = container_of(cb, struct aio_submit_state, plug_cb); + if (!list_empty(&state->req_list)) + aio_flush_state_reqs(state->ctx, state); +} +#endif + +/* + * Batched submission is done, ensure local IO is flushed out. + */ +static void aio_submit_state_end(struct aio_submit_state *state) +{ + blk_finish_plug(&state->plug); + if (!list_empty(&state->req_list)) + aio_flush_state_reqs(state->ctx, state); +} + +/* + * Start submission side cache. + */ +static void aio_submit_state_start(struct aio_submit_state *state, + struct kioctx *ctx) +{ + state->ctx = ctx; + INIT_LIST_HEAD(&state->req_list); + state->req_count = 0; +#ifdef CONFIG_BLOCK + state->plug_cb.callback = aio_state_unplug; + blk_start_plug(&state->plug); + list_add(&state->plug_cb.list, &state->plug.cb_list); +#endif } /* sys_io_submit: @@ -2371,10 +2457,10 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, struct iocb __user * __user *, iocbpp) { + struct aio_submit_state state, *statep = NULL; struct kioctx *ctx; long ret = 0; int i = 0; - struct blk_plug plug; if (unlikely(nr < 0)) return -EINVAL; @@ -2388,8 +2474,10 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, if (nr > ctx->nr_events) nr = ctx->nr_events; - if (nr > AIO_PLUG_THRESHOLD) - blk_start_plug(&plug); + if (nr > AIO_PLUG_THRESHOLD) { + aio_submit_state_start(&state, ctx); + statep = &state; + } for (i = 0; i < nr; i++) { struct iocb __user 
*user_iocb; @@ -2398,12 +2486,12 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, break; } - ret = io_submit_one(ctx, user_iocb, false); + ret = io_submit_one(ctx, user_iocb, statep, false); if (ret) break; } - if (nr > AIO_PLUG_THRESHOLD) - blk_finish_plug(&plug); + if (statep) + aio_submit_state_end(statep); percpu_ref_put(&ctx->users); return i ? i : ret; @@ -2413,10 +2501,10 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr, COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id, int, nr, compat_uptr_t __user *, iocbpp) { + struct aio_submit_state state, *statep = NULL; struct kioctx *ctx; long ret = 0; int i = 0; - struct blk_plug plug; if (unlikely(nr < 0)) return -EINVAL; @@ -2430,8 +2518,10 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id, if (nr > ctx->nr_events) nr = ctx->nr_events; - if (nr > AIO_PLUG_THRESHOLD) - blk_start_plug(&plug); + if (nr > AIO_PLUG_THRESHOLD) { + aio_submit_state_start(&state, ctx); + statep = &state; + } for (i = 0; i < nr; i++) { compat_uptr_t user_iocb; @@ -2440,12 +2530,12 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id, break; } - ret = io_submit_one(ctx, compat_ptr(user_iocb), true); + ret = io_submit_one(ctx, compat_ptr(user_iocb), statep, true); if (ret) break; } - if (nr > AIO_PLUG_THRESHOLD) - blk_finish_plug(&plug); + if (statep) + aio_submit_state_end(statep); percpu_ref_put(&ctx->users); return i ? 
i : ret;

From patchwork Tue Dec 11 00:15:39 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722851
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 17/27] fs: add fget_many() and fput_many()
Date: Mon, 10 Dec 2018 17:15:39 -0700
Message-Id: <20181211001549.30085-18-axboe@kernel.dk>
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>
References: <20181211001549.30085-1-axboe@kernel.dk>

Some use cases repeatedly get and put references to the same file, but the only exposed interface is doing these one at a time. As each of these entails an atomic inc or dec on a shared structure, that cost can add up. Add fget_many(), which works just like fget(), except it takes an argument for how many references to get on the file. Ditto fput_many(), which can drop an arbitrary number of references to a file.
Signed-off-by: Jens Axboe
---
 fs/file.c            | 15 ++++++++++-----
 fs/file_table.c      | 10 ++++++++--
 include/linux/file.h |  2 ++
 include/linux/fs.h   |  3 ++-
 4 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 7ffd6e9d103d..ad9870edfd51 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -676,7 +676,7 @@ void do_close_on_exec(struct files_struct *files)
 	spin_unlock(&files->file_lock);
 }
 
-static struct file *__fget(unsigned int fd, fmode_t mask)
+static struct file *__fget(unsigned int fd, fmode_t mask, unsigned int refs)
 {
 	struct files_struct *files = current->files;
 	struct file *file;
@@ -691,7 +691,7 @@ static struct file *__fget(unsigned int fd, fmode_t mask)
 		 */
 		if (file->f_mode & mask)
 			file = NULL;
-		else if (!get_file_rcu(file))
+		else if (!get_file_rcu_many(file, refs))
 			goto loop;
 	}
 	rcu_read_unlock();
@@ -699,15 +699,20 @@ static struct file *__fget(unsigned int fd, fmode_t mask)
 	return file;
 }
 
+struct file *fget_many(unsigned int fd, unsigned int refs)
+{
+	return __fget(fd, FMODE_PATH, refs);
+}
+
 struct file *fget(unsigned int fd)
 {
-	return __fget(fd, FMODE_PATH);
+	return fget_many(fd, 1);
 }
 EXPORT_SYMBOL(fget);
 
 struct file *fget_raw(unsigned int fd)
 {
-	return __fget(fd, 0);
+	return __fget(fd, 0, 1);
 }
 EXPORT_SYMBOL(fget_raw);
 
@@ -738,7 +743,7 @@ static unsigned long __fget_light(unsigned int fd, fmode_t mask)
 			return 0;
 		return (unsigned long)file;
 	} else {
-		file = __fget(fd, mask);
+		file = __fget(fd, mask, 1);
 		if (!file)
 			return 0;
 		return FDPUT_FPUT | (unsigned long)file;
diff --git a/fs/file_table.c b/fs/file_table.c
index e49af4caf15d..6a3964df33e4 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -326,9 +326,9 @@ void flush_delayed_fput(void)
 
 static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
 
-void fput(struct file *file)
+void fput_many(struct file *file, unsigned int refs)
 {
-	if (atomic_long_dec_and_test(&file->f_count)) {
+	if (atomic_long_sub_and_test(refs, &file->f_count)) {
 		struct task_struct *task = current;
 
 		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
@@ -347,6 +347,12 @@ void fput(struct file *file)
 	}
 }
 
+void fput(struct file *file)
+{
+	fput_many(file, 1);
+}
+
+
 /*
  * synchronous analog of fput(); for kernel threads that might be needed
  * in some umount() (and thus can't use flush_delayed_fput() without
diff --git a/include/linux/file.h b/include/linux/file.h
index 6b2fb032416c..3fcddff56bc4 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -13,6 +13,7 @@
 struct file;
 
 extern void fput(struct file *);
+extern void fput_many(struct file *, unsigned int);
 
 struct file_operations;
 struct vfsmount;
@@ -44,6 +45,7 @@ static inline void fdput(struct fd fd)
 }
 
 extern struct file *fget(unsigned int fd);
+extern struct file *fget_many(unsigned int fd, unsigned int refs);
 extern struct file *fget_raw(unsigned int fd);
 extern unsigned long __fdget(unsigned int fd);
 extern unsigned long __fdget_raw(unsigned int fd);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6a5f71f8ae06..dc54a65c401a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -952,7 +952,8 @@ static inline struct file *get_file(struct file *f)
 	atomic_long_inc(&f->f_count);
 	return f;
 }
-#define get_file_rcu(x) atomic_long_inc_not_zero(&(x)->f_count)
+#define get_file_rcu_many(x, cnt) atomic_long_add_unless(&(x)->f_count, (cnt), 0)
+#define get_file_rcu(x)	get_file_rcu_many((x), 1)
 #define fput_atomic(x)	atomic_long_add_unless(&(x)->f_count, -1, 1)
 #define file_count(x)	atomic_long_read(&(x)->f_count)

From patchwork Tue Dec 11 00:15:40 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722855
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 18/27] aio: use fget/fput_many() for file references
Date: Mon, 10 Dec 2018 17:15:40 -0700
Message-Id: <20181211001549.30085-19-axboe@kernel.dk>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>
References: <20181211001549.30085-1-axboe@kernel.dk>

On the submission side, add file reference batching to the
aio_submit_state. We get as many references as the number of iocbs we
are submitting, and drop unused ones if we end up switching files. The
assumption here is that we're usually only dealing with one fd, and if
there are multiple, hopefully they are at least somewhat ordered. This
could trivially be extended to cover multiple fds, if needed.

On the completion side we do the same thing, except there it is done
locally in aio_iopoll_reap().
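The completion-side batching amounts to run-length coalescing of puts. A minimal standalone sketch, where `put_many()` is a stand-in counter rather than the kernel's `fput_many()`:

```c
#include <stddef.h>

/* Count how often put_many() fires, to show the batching effect. */
long put_calls;

void put_many(void *file, int refs)
{
	(void)file;
	(void)refs;
	put_calls++;
}

/* Drop one reference per completed entry in files[], but coalesce
 * adjacent entries for the same file into one put_many() call --
 * the same pattern the patch uses in aio_iopoll_reap(). */
void reap_batched(void **files, int nr)
{
	void *file = NULL;
	int file_count = 0;

	for (int i = 0; i < nr; i++) {
		if (!file) {
			file = files[i];
			file_count = 1;
		} else if (file == files[i]) {
			file_count++;
		} else {
			put_many(file, file_count);
			file = files[i];
			file_count = 1;
		}
	}
	if (file)
		put_many(file, file_count);
}
```

For five completions over two files laid out as AAABB, this issues two puts instead of five, which is exactly the "avoid dirtying the file usage count multiple times" effect the commit message describes.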
Signed-off-by: Jens Axboe
---
 fs/aio.c | 106 +++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 91 insertions(+), 15 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 2e8cde976cb4..6cbfe9905637 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -253,6 +253,15 @@ struct aio_submit_state {
 	 */
 	struct list_head req_list;
 	unsigned int req_count;
+
+	/*
+	 * File reference cache
+	 */
+	struct file *file;
+	unsigned int fd;
+	unsigned int has_refs;
+	unsigned int used_refs;
+	unsigned int ios_left;
 };
 
 /*------ sysctl variables----*/
@@ -1355,7 +1364,8 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs,
 {
 	void *iocbs[AIO_IOPOLL_BATCH];
 	struct aio_kiocb *iocb, *n;
-	int to_free = 0, ret = 0;
+	int file_count, to_free = 0, ret = 0;
+	struct file *file = NULL;
 
 	/* Shouldn't happen... */
 	if (*nr_events >= max)
@@ -1372,7 +1382,20 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs,
 
 		list_del(&iocb->ki_list);
 		iocbs[to_free++] = iocb;
 
-		fput(iocb->rw.ki_filp);
+		/*
+		 * Batched puts of the same file, to avoid dirtying the
+		 * file usage count multiple times, if avoidable.
+		 */
+		if (!file) {
+			file = iocb->rw.ki_filp;
+			file_count = 1;
+		} else if (file == iocb->rw.ki_filp) {
+			file_count++;
+		} else {
+			fput_many(file, file_count);
+			file = iocb->rw.ki_filp;
+			file_count = 1;
+		}
 
 		if (evs && copy_to_user(evs + *nr_events, &iocb->ki_ev,
 		    sizeof(iocb->ki_ev))) {
@@ -1382,6 +1405,9 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs,
 		(*nr_events)++;
 	}
 
+	if (file)
+		fput_many(file, file_count);
+
 	if (to_free)
 		iocb_put_many(ctx, iocbs, &to_free);
 
@@ -1818,13 +1844,58 @@ static void aio_complete_rw_poll(struct kiocb *kiocb, long res, long res2)
 	}
 }
 
-static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb)
+static void aio_file_put(struct aio_submit_state *state)
+{
+	if (state->file) {
+		int diff = state->has_refs - state->used_refs;
+
+		if (diff)
+			fput_many(state->file, diff);
+		state->file = NULL;
+	}
+}
+
+/*
+ * Get as many references to a file as we have IOs left in this submission,
+ * assuming most submissions are for one file, or at least that each file
+ * has more than one submission.
+ */
+static struct file *aio_file_get(struct aio_submit_state *state, int fd)
+{
+	if (!state)
+		return fget(fd);
+
+	if (!state->file) {
+get_file:
+		state->file = fget_many(fd, state->ios_left);
+		if (!state->file)
+			return NULL;
+
+		state->fd = fd;
+		state->has_refs = state->ios_left;
+		state->used_refs = 1;
+		state->ios_left--;
+		return state->file;
+	}
+
+	if (state->fd == fd) {
+		state->used_refs++;
+		state->ios_left--;
+		return state->file;
+	}
+
+	aio_file_put(state);
+	goto get_file;
+}
+
+static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb,
+		       struct aio_submit_state *state)
 {
 	struct kioctx *ctx = kiocb->ki_ctx;
 	struct kiocb *req = &kiocb->rw;
 	int ret;
 
-	req->ki_filp = fget(iocb->aio_fildes);
+	req->ki_filp = aio_file_get(state, iocb->aio_fildes);
 	if (unlikely(!req->ki_filp))
 		return -EBADF;
 	req->ki_pos = iocb->aio_offset;
@@ -1974,7 +2045,8 @@ static void aio_iopoll_iocb_issued(struct aio_submit_state *state,
 }
 
 static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb,
-			bool vectored, bool compat)
+			struct aio_submit_state *state, bool vectored,
+			bool compat)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct kiocb *req = &kiocb->rw;
@@ -1982,7 +2054,7 @@ static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb,
 	struct file *file;
 	ssize_t ret;
 
-	ret = aio_prep_rw(kiocb, iocb);
+	ret = aio_prep_rw(kiocb, iocb, state);
 	if (ret)
 		return ret;
 	file = req->ki_filp;
@@ -2008,7 +2080,8 @@
 }
 
 static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb,
-			 bool vectored, bool compat)
+			 struct aio_submit_state *state, bool vectored,
+			 bool compat)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct kiocb *req = &kiocb->rw;
@@ -2016,7 +2089,7 @@ static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb,
 	struct file *file;
 	ssize_t ret;
 
-	ret = aio_prep_rw(kiocb, iocb);
+	ret = aio_prep_rw(kiocb, iocb, state);
 	if (ret)
 		return ret;
 	file = req->ki_filp;
@@ -2327,16 +2400,16 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 
 	ret = -EINVAL;
 	switch (iocb->aio_lio_opcode) {
 	case IOCB_CMD_PREAD:
-		ret = aio_read(req, iocb, false, compat);
+		ret = aio_read(req, iocb, state, false, compat);
 		break;
 	case IOCB_CMD_PWRITE:
-		ret = aio_write(req, iocb, false, compat);
+		ret = aio_write(req, iocb, state, false, compat);
 		break;
 	case IOCB_CMD_PREADV:
-		ret = aio_read(req, iocb, true, compat);
+		ret = aio_read(req, iocb, state, true, compat);
 		break;
 	case IOCB_CMD_PWRITEV:
-		ret = aio_write(req, iocb, true, compat);
+		ret = aio_write(req, iocb, state, true, compat);
 		break;
 	case IOCB_CMD_FSYNC:
 		if (ctx->flags & IOCTX_FLAG_IOPOLL)
@@ -2424,17 +2497,20 @@ static void aio_submit_state_end(struct aio_submit_state *state)
 	blk_finish_plug(&state->plug);
 	if (!list_empty(&state->req_list))
 		aio_flush_state_reqs(state->ctx, state);
+	aio_file_put(state);
 }
 
 /*
 * Start submission side cache.
 */
 static void aio_submit_state_start(struct aio_submit_state *state,
-				   struct kioctx *ctx)
+				   struct kioctx *ctx, int max_ios)
 {
 	state->ctx = ctx;
 	INIT_LIST_HEAD(&state->req_list);
 	state->req_count = 0;
+	state->file = NULL;
+	state->ios_left = max_ios;
 #ifdef CONFIG_BLOCK
 	state->plug_cb.callback = aio_state_unplug;
 	blk_start_plug(&state->plug);
@@ -2475,7 +2551,7 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
 		nr = ctx->nr_events;
 
 	if (nr > AIO_PLUG_THRESHOLD) {
-		aio_submit_state_start(&state, ctx);
+		aio_submit_state_start(&state, ctx, nr);
 		statep = &state;
 	}
 	for (i = 0; i < nr; i++) {
@@ -2519,7 +2595,7 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
 		nr = ctx->nr_events;
 
 	if (nr > AIO_PLUG_THRESHOLD) {
-		aio_submit_state_start(&state, ctx);
+		aio_submit_state_start(&state, ctx, nr);
 		statep = &state;
 	}
 	for (i = 0; i < nr; i++) {

From patchwork Tue Dec 11 00:15:41 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722859
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 19/27] aio: split iocb init from allocation
Date: Mon, 10 Dec 2018 17:15:41 -0700
Message-Id: <20181211001549.30085-20-axboe@kernel.dk>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>
References: <20181211001549.30085-1-axboe@kernel.dk>

In preparation for having pre-allocated requests that we then just need
to initialize before use.

Signed-off-by: Jens Axboe
---
 fs/aio.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 6cbfe9905637..e9dbaedda7ae 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1099,6 +1099,16 @@ static bool get_reqs_available(struct kioctx *ctx)
 	return __get_reqs_available(ctx);
 }
 
+static void aio_iocb_init(struct kioctx *ctx, struct aio_kiocb *req)
+{
+	percpu_ref_get(&ctx->reqs);
+	req->ki_ctx = ctx;
+	INIT_LIST_HEAD(&req->ki_list);
+	req->ki_flags = 0;
+	refcount_set(&req->ki_refcnt, 0);
+	req->ki_eventfd = NULL;
+}
+
 /* aio_get_req
 *	Allocate a slot for an aio request.
 * Returns NULL if no requests are free.
 */
@@ -1111,12 +1121,7 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 	if (unlikely(!req))
 		return NULL;
 
-	percpu_ref_get(&ctx->reqs);
-	req->ki_ctx = ctx;
-	INIT_LIST_HEAD(&req->ki_list);
-	req->ki_flags = 0;
-	refcount_set(&req->ki_refcnt, 0);
-	req->ki_eventfd = NULL;
+	aio_iocb_init(ctx, req);
 
 	return req;
 }

From patchwork Tue Dec 11 00:15:42 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722863
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 20/27] aio: batch aio_kiocb allocation
Date: Mon, 10 Dec 2018 17:15:42 -0700
Message-Id: <20181211001549.30085-21-axboe@kernel.dk>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>
References: <20181211001549.30085-1-axboe@kernel.dk>

Similarly to how we use state->ios_left to know how many references to
get to a file, we can use it to allocate the aio_kiocb's we need in
bulk.

Signed-off-by: Jens Axboe
---
 fs/aio.c | 47 +++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 39 insertions(+), 8 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index e9dbaedda7ae..a385e7c06bfa 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -240,6 +240,8 @@ struct aio_kiocb {
 	};
 };
 
+#define AIO_IOPOLL_BATCH	8
+
 struct aio_submit_state {
 	struct kioctx *ctx;
 
@@ -254,6 +256,13 @@ struct aio_submit_state {
 	struct list_head req_list;
 	unsigned int req_count;
 
+	/*
+	 * aio_kiocb alloc cache
+	 */
+	void *iocbs[AIO_IOPOLL_BATCH];
+	unsigned int free_iocbs;
+	unsigned int cur_iocb;
+
 	/*
 	 * File reference cache
 	 */
@@ -1113,15 +1122,35 @@ static void aio_iocb_init(struct kioctx *ctx, struct aio_kiocb *req)
 * Allocate a slot for an aio request.
 * Returns NULL if no requests are free.
 */
-static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
+static struct aio_kiocb *aio_get_req(struct kioctx *ctx,
+				     struct aio_submit_state *state)
 {
 	struct aio_kiocb *req;
 
-	req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL);
-	if (unlikely(!req))
-		return NULL;
+	if (!state)
+		req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL);
+	else if (!state->free_iocbs) {
+		size_t size;
+
+		size = min_t(size_t, state->ios_left, ARRAY_SIZE(state->iocbs));
+		size = kmem_cache_alloc_bulk(kiocb_cachep, GFP_KERNEL, size,
+						state->iocbs);
+		if (size < 0)
+			return ERR_PTR(size);
+		else if (!size)
+			return ERR_PTR(-ENOMEM);
+		state->free_iocbs = size - 1;
+		state->cur_iocb = 1;
+		req = state->iocbs[0];
+	} else {
+		req = state->iocbs[state->cur_iocb];
+		state->free_iocbs--;
+		state->cur_iocb++;
+	}
+
+	if (req)
+		aio_iocb_init(ctx, req);
 
-	aio_iocb_init(ctx, req);
 	return req;
 }
 
@@ -1359,8 +1388,6 @@ static bool aio_read_events(struct kioctx *ctx, long min_nr, long nr,
 	return ret < 0 || *i >= min_nr;
 }
 
-#define AIO_IOPOLL_BATCH	8
-
 /*
 * Process completed iocb iopoll entries, copying the result to userspace.
 */
@@ -2371,7 +2398,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 		return -EAGAIN;
 
 	ret = -EAGAIN;
-	req = aio_get_req(ctx);
+	req = aio_get_req(ctx, state);
 	if (unlikely(!req))
 		goto out_put_reqs_available;
@@ -2503,6 +2530,9 @@ static void aio_submit_state_end(struct aio_submit_state *state)
 	if (!list_empty(&state->req_list))
 		aio_flush_state_reqs(state->ctx, state);
 	aio_file_put(state);
+	if (state->free_iocbs)
+		kmem_cache_free_bulk(kiocb_cachep, state->free_iocbs,
+					&state->iocbs[state->cur_iocb]);
 }
 
 /*
@@ -2514,6 +2544,7 @@ static void aio_submit_state_start(struct aio_submit_state *state,
 	state->ctx = ctx;
 	INIT_LIST_HEAD(&state->req_list);
 	state->req_count = 0;
+	state->free_iocbs = 0;
 	state->file = NULL;
 	state->ios_left = max_ios;
 #ifdef CONFIG_BLOCK

From patchwork Tue Dec 11 00:15:43 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722881
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 21/27] block: add BIO_HOLD_PAGES flag
Date: Mon, 10 Dec 2018 17:15:43 -0700
Message-Id: <20181211001549.30085-22-axboe@kernel.dk>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>
References: <20181211001549.30085-1-axboe@kernel.dk>

For user mapped IO, we do get_user_pages() upfront, and then do a
put_page() on each page at end_io time to release the page reference. In
preparation for having permanently mapped pages, add a BIO_HOLD_PAGES
flag that tells us not to release the pages; the caller will do that.
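The ownership transfer described here is just a bit in the bio's flags word that the completion path checks before dropping page references. A standalone sketch of that pattern, with a made-up `struct fake_bio` instead of the real `struct bio` (only the flag-test shape mirrors the patch):

```c
/* Bit number for the "caller keeps the pages" flag, mirroring the
 * BIO_HOLD_PAGES value the patch adds to include/linux/blk_types.h. */
#define BIO_HOLD_PAGES 12

struct fake_bio {
	unsigned int bi_flags;
	int pages_released;	/* stand-in for put_page() on each page */
};

int bio_flagged(const struct fake_bio *bio, unsigned int bit)
{
	return (bio->bi_flags >> bit) & 1;
}

void bio_set_flag(struct fake_bio *bio, unsigned int bit)
{
	bio->bi_flags |= 1U << bit;
}

/* Completion path: release page references only if we own them. */
void end_io(struct fake_bio *bio)
{
	if (!bio_flagged(bio, BIO_HOLD_PAGES))
		bio->pages_released = 1;
}
```

A submitter with permanently mapped pages would set the flag before issuing the bio and remain responsible for releasing the pages itself.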
Signed-off-by: Jens Axboe
---
 block/bio.c               | 6 ++++--
 include/linux/blk_types.h | 1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 036e3f0cc736..03dde1c03ae6 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1636,7 +1636,8 @@ static void bio_dirty_fn(struct work_struct *work)
 		next = bio->bi_private;
 
 		bio_set_pages_dirty(bio);
-		bio_release_pages(bio);
+		if (!bio_flagged(bio, BIO_HOLD_PAGES))
+			bio_release_pages(bio);
 		bio_put(bio);
 	}
 }
@@ -1652,7 +1653,8 @@ void bio_check_pages_dirty(struct bio *bio)
 			goto defer;
 	}
 
-	bio_release_pages(bio);
+	if (!bio_flagged(bio, BIO_HOLD_PAGES))
+		bio_release_pages(bio);
 	bio_put(bio);
 	return;
 defer:
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 921d734d6b5d..356a4c89b0d9 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -228,6 +228,7 @@ struct bio {
 #define BIO_TRACE_COMPLETION 10	/* bio_endio() should trace the final completion
 				 * of this bio. */
 #define BIO_QUEUE_ENTERED 11	/* can use blk_queue_enter_live() */
+#define BIO_HOLD_PAGES	12	/* don't put O_DIRECT pages */
 
 /* See BVEC_POOL_OFFSET below before adding new flags */

From patchwork Tue Dec 11 00:15:44 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722867
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 22/27] block: implement bio helper to add iter bvec pages to bio
Date: Mon, 10 Dec 2018 17:15:44 -0700
Message-Id: <20181211001549.30085-23-axboe@kernel.dk>
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>

For an ITER_BVEC, we can just iterate the iov and add the pages to the
bio directly.

Signed-off-by: Jens Axboe
---
 block/bio.c         | 27 +++++++++++++++++++++++++++
 include/linux/bio.h |  1 +
 2 files changed, 28 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 03dde1c03ae6..1da1391e8b1d 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -904,6 +904,33 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 }
 EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
 
+/**
+ * bio_iov_bvec_add_pages - add pages from an ITER_BVEC to a bio
+ * @bio: bio to add pages to
+ * @iter: iov iterator describing the region to be added
+ *
+ * Iterate pages in the @iter and add them to the bio. We flag the
+ * @bio with BIO_HOLD_PAGES, telling IO completion not to free them.
+ */
+int bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter)
+{
+	unsigned short orig_vcnt = bio->bi_vcnt;
+	const struct bio_vec *bv;
+
+	do {
+		size_t size;
+
+		bv = iter->bvec + iter->iov_offset;
+		size = bio_add_page(bio, bv->bv_page, bv->bv_len, bv->bv_offset);
+		if (size != bv->bv_len)
+			break;
+		iov_iter_advance(iter, size);
+	} while (iov_iter_count(iter) && !bio_full(bio));
+
+	bio_set_flag(bio, BIO_HOLD_PAGES);
+	return bio->bi_vcnt > orig_vcnt ? 0 : -EINVAL;
+}
+
 static void submit_bio_wait_endio(struct bio *bio)
 {
 	complete(bio->bi_private);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7380b094dcca..ca25ea890192 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -434,6 +434,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 void __bio_add_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off);
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
+int bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter);
 struct rq_map_data;
 extern struct bio *bio_map_user_iov(struct request_queue *,
 				    struct iov_iter *, gfp_t);

From patchwork Tue Dec 11 00:15:45 2018
X-Patchwork-Id: 10722871
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 23/27] fs: add support for mapping an ITER_BVEC for O_DIRECT
Date: Mon, 10 Dec 2018 17:15:45 -0700
Message-Id: <20181211001549.30085-24-axboe@kernel.dk>
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>

This adds support for sync/async O_DIRECT to make a bvec type iter for
bdev access, as well as iomap.
Signed-off-by: Jens Axboe
---
 fs/block_dev.c | 16 ++++++++++++----
 fs/iomap.c     | 10 +++++++---
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index b8f574615792..236c6abe649d 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -219,7 +219,10 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 	bio.bi_end_io = blkdev_bio_end_io_simple;
 	bio.bi_ioprio = iocb->ki_ioprio;
 
-	ret = bio_iov_iter_get_pages(&bio, iter);
+	if (iov_iter_is_bvec(iter))
+		ret = bio_iov_bvec_add_pages(&bio, iter);
+	else
+		ret = bio_iov_iter_get_pages(&bio, iter);
 	if (unlikely(ret))
 		goto out;
 	ret = bio.bi_iter.bi_size;
@@ -326,8 +329,9 @@ static void blkdev_bio_end_io(struct bio *bio)
 		struct bio_vec *bvec;
 		int i;
 
-		bio_for_each_segment_all(bvec, bio, i)
-			put_page(bvec->bv_page);
+		if (!bio_flagged(bio, BIO_HOLD_PAGES))
+			bio_for_each_segment_all(bvec, bio, i)
+				put_page(bvec->bv_page);
 		bio_put(bio);
 	}
 }
@@ -381,7 +385,11 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 		bio->bi_end_io = blkdev_bio_end_io;
 		bio->bi_ioprio = iocb->ki_ioprio;
 
-		ret = bio_iov_iter_get_pages(bio, iter);
+		if (iov_iter_is_bvec(iter))
+			ret = bio_iov_bvec_add_pages(bio, iter);
+		else
+			ret = bio_iov_iter_get_pages(bio, iter);
+
 		if (unlikely(ret)) {
 			bio->bi_status = BLK_STS_IOERR;
 			bio_endio(bio);
diff --git a/fs/iomap.c b/fs/iomap.c
index f3039989de73..2bb309a320a3 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1573,8 +1573,9 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 		struct bio_vec *bvec;
 		int i;
 
-		bio_for_each_segment_all(bvec, bio, i)
-			put_page(bvec->bv_page);
+		if (!bio_flagged(bio, BIO_HOLD_PAGES))
+			bio_for_each_segment_all(bvec, bio, i)
+				put_page(bvec->bv_page);
 		bio_put(bio);
 	}
 }
@@ -1673,7 +1674,10 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		bio->bi_private = dio;
 		bio->bi_end_io = iomap_dio_bio_end_io;
 
-		ret = bio_iov_iter_get_pages(bio, &iter);
+		if (iov_iter_is_bvec(&iter))
+			ret = bio_iov_bvec_add_pages(bio, &iter);
+		else
+			ret = bio_iov_iter_get_pages(bio, &iter);
 		if (unlikely(ret)) {
 			/*
 			 * We have to stop part way through an IO. We must fall

From patchwork Tue Dec 11 00:15:46 2018
X-Patchwork-Id: 10722879
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 24/27] aio: add support for pre-mapped user IO buffers
Date: Mon, 10 Dec 2018 17:15:46 -0700
Message-Id: <20181211001549.30085-25-axboe@kernel.dk>
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>

If we have fixed user buffers, we can map them into the kernel when we
set up the io_context. That avoids the need to do get_user_pages() for
each and every IO.

To utilize this feature, the application must set both
IOCTX_FLAG_USERIOCB, to provide iocb's in userspace, and then
IOCTX_FLAG_FIXEDBUFS. The latter tells aio that the iocbs that are
mapped already contain valid destination and sizes. These buffers can
then be mapped into the kernel for the lifetime of the io_context, as
opposed to just the duration of each single IO.

Only works with non-vectored read/write commands for now, not with
PREADV/PWRITEV.

A limit of 4M is imposed as the largest buffer we currently support.
There's nothing preventing us from going larger, but we need some cap,
and 4M seemed like it would definitely be big enough. RLIMIT_MEMLOCK
is used to cap the total amount of memory pinned.
See the fio change for how to utilize this feature:

http://git.kernel.dk/cgit/fio/commit/?id=2041bd343da1c1e955253f62374588718c64f0f3

Signed-off-by: Jens Axboe
---
 fs/aio.c                     | 193 ++++++++++++++++++++++++++++++++---
 include/uapi/linux/aio_abi.h |   1 +
 2 files changed, 177 insertions(+), 17 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index a385e7c06bfa..7bd5975a83e6 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -42,6 +42,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -97,6 +98,11 @@ struct aio_mapped_range {
 	long nr_pages;
 };
 
+struct aio_mapped_ubuf {
+	struct bio_vec *bvec;
+	unsigned int nr_bvecs;
+};
+
 struct kioctx {
 	struct percpu_ref users;
 	atomic_t dead;
@@ -132,6 +138,8 @@ struct kioctx {
 	struct page **ring_pages;
 	long nr_pages;
 
+	struct aio_mapped_ubuf *user_bufs;
+
 	struct aio_mapped_range iocb_range;
 
 	struct rcu_work free_rwork;	/* see free_ioctx() */
@@ -301,6 +309,7 @@ static const bool aio_use_state_req_list = false;
 
 static void aio_useriocb_unmap(struct kioctx *);
 static void aio_iopoll_reap_events(struct kioctx *);
+static void aio_iocb_buffer_unmap(struct kioctx *);
 
 static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages)
 {
@@ -662,6 +671,7 @@ static void free_ioctx(struct work_struct *work)
 					  free_rwork);
 	pr_debug("freeing %p\n", ctx);
 
+	aio_iocb_buffer_unmap(ctx);
 	aio_useriocb_unmap(ctx);
 	aio_free_ring(ctx);
 	free_percpu(ctx->cpu);
@@ -1675,6 +1685,122 @@ static int aio_useriocb_map(struct kioctx *ctx, struct iocb __user *iocbs)
 	return aio_map_range(&ctx->iocb_range, iocbs, size, 0);
 }
 
+static void aio_iocb_buffer_unmap(struct kioctx *ctx)
+{
+	int i, j;
+
+	if (!ctx->user_bufs)
+		return;
+
+	for (i = 0; i < ctx->max_reqs; i++) {
+		struct aio_mapped_ubuf *amu = &ctx->user_bufs[i];
+
+		for (j = 0; j < amu->nr_bvecs; j++)
+			put_page(amu->bvec[j].bv_page);
+
+		kfree(amu->bvec);
+		amu->nr_bvecs = 0;
+	}
+
+	kfree(ctx->user_bufs);
+	ctx->user_bufs = NULL;
+}
+
+static int aio_iocb_buffer_map(struct kioctx *ctx)
+{
+	unsigned long total_pages, page_limit;
+	struct page **pages = NULL;
+	int i, j, got_pages = 0;
+	struct iocb *iocb;
+	int ret = -EINVAL;
+
+	ctx->user_bufs = kzalloc(ctx->max_reqs * sizeof(struct aio_mapped_ubuf),
+					GFP_KERNEL);
+	if (!ctx->user_bufs)
+		return -ENOMEM;
+
+	/* Don't allow more pages than we can safely lock */
+	total_pages = 0;
+	page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+
+	for (i = 0; i < ctx->max_reqs; i++) {
+		struct aio_mapped_ubuf *amu = &ctx->user_bufs[i];
+		unsigned long off, start, end, ubuf;
+		int pret, nr_pages;
+		size_t size;
+
+		iocb = aio_iocb_from_index(ctx, i);
+
+		/*
+		 * Don't impose further limits on the size and buffer
+		 * constraints here, we'll -EINVAL later when IO is
+		 * submitted if they are wrong.
+		 */
+		ret = -EFAULT;
+		if (!iocb->aio_buf)
+			goto err;
+
+		/* arbitrary limit, but we need something */
+		if (iocb->aio_nbytes > SZ_4M)
+			goto err;
+
+		ubuf = iocb->aio_buf;
+		end = (ubuf + iocb->aio_nbytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		start = ubuf >> PAGE_SHIFT;
+		nr_pages = end - start;
+
+		ret = -ENOMEM;
+		if (total_pages + nr_pages > page_limit)
+			goto err;
+
+		if (!pages || nr_pages > got_pages) {
+			kfree(pages);
+			pages = kmalloc(nr_pages * sizeof(struct page *),
+					GFP_KERNEL);
+			if (!pages)
+				goto err;
+			got_pages = nr_pages;
+		}
+
+		amu->bvec = kmalloc(nr_pages * sizeof(struct bio_vec),
+					GFP_KERNEL);
+		if (!amu->bvec)
+			goto err;
+
+		down_write(&current->mm->mmap_sem);
+		pret = get_user_pages((unsigned long) iocb->aio_buf, nr_pages,
+					1, pages, NULL);
+		up_write(&current->mm->mmap_sem);
+
+		if (pret < nr_pages) {
+			if (pret < 0)
+				ret = pret;
+			goto err;
+		}
+
+		off = ubuf & ~PAGE_MASK;
+		size = iocb->aio_nbytes;
+		for (j = 0; j < nr_pages; j++) {
+			size_t vec_len;
+
+			vec_len = min_t(size_t, size, PAGE_SIZE - off);
+			amu->bvec[j].bv_page = pages[j];
+			amu->bvec[j].bv_len = vec_len;
+			amu->bvec[j].bv_offset = off;
+			off = 0;
+			size -= vec_len;
+		}
+		amu->nr_bvecs = nr_pages;
+		total_pages += nr_pages;
+	}
+	kfree(pages);
+	return 0;
+err:
+	kfree(pages);
+	aio_iocb_buffer_unmap(ctx);
+	return ret;
+}
+
 /* sys_io_setup2:
  *	Like sys_io_setup(), except that it takes a set of flags
  *	(IOCTX_FLAG_*), and some pointers to user structures:
@@ -1696,7 +1822,8 @@ SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, struct iocb __user *,
 	if (user1 || user2)
 		return -EINVAL;
-	if (flags & ~(IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL))
+	if (flags & ~(IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL |
+		      IOCTX_FLAG_FIXEDBUFS))
 		return -EINVAL;
 
 	ret = get_user(ctx, ctxp);
@@ -1712,6 +1839,15 @@ SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, struct iocb __user *,
 		ret = aio_useriocb_map(ioctx, iocbs);
 		if (ret)
 			goto err;
+		if (flags & IOCTX_FLAG_FIXEDBUFS) {
+			ret = aio_iocb_buffer_map(ioctx);
+			if (ret)
+				goto err;
+		}
+	} else if (flags & IOCTX_FLAG_FIXEDBUFS) {
+		/* can only support fixed bufs with user mapped iocbs */
+		ret = -EINVAL;
+		goto err;
 	}
 
 	ret = put_user(ioctx->user_id, ctxp);
@@ -1989,23 +2125,39 @@ static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb,
 	return ret;
 }
 
-static int aio_setup_rw(int rw, const struct iocb *iocb, struct iovec **iovec,
-		bool vectored, bool compat, struct iov_iter *iter)
+static int aio_setup_rw(int rw, struct aio_kiocb *kiocb,
+		const struct iocb *iocb, struct iovec **iovec, bool vectored,
+		bool compat, bool kaddr, struct iov_iter *iter)
 {
-	void __user *buf = (void __user *)(uintptr_t)iocb->aio_buf;
+	void __user *ubuf = (void __user *)(uintptr_t)iocb->aio_buf;
 	size_t len = iocb->aio_nbytes;
 
 	if (!vectored) {
-		ssize_t ret = import_single_range(rw, buf, len, *iovec, iter);
+		ssize_t ret;
+
+		if (!kaddr) {
+			ret = import_single_range(rw, ubuf, len, *iovec, iter);
+		} else {
+			long index = (long) kiocb->ki_user_iocb;
+			struct aio_mapped_ubuf *amu;
+
+			/* __io_submit_one() already validated the index */
+			amu = &kiocb->ki_ctx->user_bufs[index];
+			iov_iter_bvec(iter, rw, amu->bvec, amu->nr_bvecs, len);
+			ret = 0;
+		}
 		*iovec = NULL;
 		return ret;
 	}
+	if (kaddr)
+		return -EINVAL;
 #ifdef CONFIG_COMPAT
 	if (compat)
-		return compat_import_iovec(rw, buf, len, UIO_FASTIOV, iovec,
+		return compat_import_iovec(rw, ubuf, len, UIO_FASTIOV, iovec,
 				iter);
 #endif
-	return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter);
+	return import_iovec(rw, ubuf, len, UIO_FASTIOV, iovec, iter);
 }
 
 static inline void aio_rw_done(struct kiocb *req, ssize_t ret)
@@ -2078,7 +2230,7 @@ static void aio_iopoll_iocb_issued(struct aio_submit_state *state,
 
 static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb,
 			struct aio_submit_state *state, bool vectored,
-			bool compat)
+			bool compat, bool kaddr)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct kiocb *req = &kiocb->rw;
@@ -2098,9 +2250,11 @@ static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb,
 	if (unlikely(!file->f_op->read_iter))
 		goto out_fput;
 
-	ret = aio_setup_rw(READ, iocb, &iovec, vectored, compat, &iter);
+	ret = aio_setup_rw(READ, kiocb, iocb, &iovec, vectored, compat, kaddr,
+				&iter);
 	if (ret)
 		goto out_fput;
+
 	ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter));
 	if (!ret)
 		aio_rw_done(req, call_read_iter(file, req, &iter));
@@ -2113,7 +2267,7 @@ static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb,
 
 static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb,
 			 struct aio_submit_state *state, bool vectored,
-			 bool compat)
+			 bool compat, bool kaddr)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct kiocb *req = &kiocb->rw;
@@ -2133,7 +2287,8 @@ static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb,
 	if (unlikely(!file->f_op->write_iter))
 		goto out_fput;
 
-	ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, compat, &iter);
+	ret = aio_setup_rw(WRITE, kiocb, iocb, &iovec, vectored, compat, kaddr,
+				&iter);
 	if (ret)
 		goto out_fput;
 	ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter));
@@ -2372,7 +2527,8 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 
 static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 			   struct iocb __user *user_iocb,
-			   struct aio_submit_state *state, bool compat)
+			   struct aio_submit_state *state, bool compat,
+			   bool kaddr)
 {
 	struct aio_kiocb *req;
 	ssize_t ret;
@@ -2432,16 +2588,16 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 	ret = -EINVAL;
 	switch (iocb->aio_lio_opcode) {
 	case IOCB_CMD_PREAD:
-		ret = aio_read(req, iocb, state, false, compat);
+		ret = aio_read(req, iocb, state, false, compat, kaddr);
 		break;
 	case IOCB_CMD_PWRITE:
-		ret = aio_write(req, iocb, state, false, compat);
+		ret = aio_write(req, iocb, state, false, compat, kaddr);
 		break;
 	case IOCB_CMD_PREADV:
-		ret = aio_read(req, iocb, state, true, compat);
+		ret = aio_read(req, iocb, state, true, compat, kaddr);
 		break;
 	case IOCB_CMD_PWRITEV:
-		ret = aio_write(req, iocb, state, true, compat);
+		ret = aio_write(req, iocb, state, true, compat, kaddr);
 		break;
 	case IOCB_CMD_FSYNC:
 		if (ctx->flags & IOCTX_FLAG_IOPOLL)
@@ -2493,6 +2649,7 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 			 struct aio_submit_state *state, bool compat)
 {
 	struct iocb iocb, *iocbp;
+	bool kaddr;
 
 	if (ctx->flags & IOCTX_FLAG_USERIOCB) {
 		unsigned long iocb_index = (unsigned long) user_iocb;
@@ -2500,14 +2657,16 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 		if (iocb_index >= ctx->max_reqs)
 			return -EINVAL;
 
+		kaddr = (ctx->flags & IOCTX_FLAG_FIXEDBUFS) != 0;
 		iocbp = aio_iocb_from_index(ctx, iocb_index);
 	} else {
 		if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb))))
 			return -EFAULT;
+		kaddr = false;
 		iocbp = &iocb;
 	}
 
-	return __io_submit_one(ctx, iocbp, user_iocb, state, compat);
+	return __io_submit_one(ctx, iocbp, user_iocb, state, compat, kaddr);
 }
 
 #ifdef CONFIG_BLOCK
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index ea0b9a19f4df..05d72cf86bd3 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -110,6 +110,7 @@ struct iocb {
 
 #define IOCTX_FLAG_USERIOCB	(1 << 0)	/* iocbs are user mapped */
 #define IOCTX_FLAG_IOPOLL	(1 << 1)	/* io_context is polled */
+#define IOCTX_FLAG_FIXEDBUFS	(1 << 2)	/* IO buffers are fixed */
 
 #undef IFBIG
 #undef IFLITTLE

From patchwork Tue Dec 11 00:15:47 2018
X-Patchwork-Id: 10722875
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 25/27] aio: split old ring complete out from aio_complete()
Date: Mon, 10 Dec 2018 17:15:47 -0700
Message-Id: <20181211001549.30085-26-axboe@kernel.dk>
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>

Signed-off-by: Jens Axboe
---
 fs/aio.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 7bd5975a83e6..a69e6228fe62 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1218,12 +1218,9 @@ static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb,
 	ev->res2 = res2;
 }
 
-/* aio_complete
- *	Called when the io request on the given iocb is complete.
- */
-static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
+static void aio_ring_complete(struct kioctx *ctx, struct aio_kiocb *iocb,
+			      long res, long res2)
 {
-	struct kioctx *ctx = iocb->ki_ctx;
 	struct aio_ring *ring;
 	struct io_event *ev_page, *event;
 	unsigned tail, pos, head;
@@ -1273,6 +1270,16 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
 	spin_unlock_irqrestore(&ctx->completion_lock, flags);
 
 	pr_debug("added to ring %p at [%u]\n", iocb, tail);
+}
+
+/* aio_complete
+ *	Called when the io request on the given iocb is complete.
+ */
+static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
+{
+	struct kioctx *ctx = iocb->ki_ctx;
+
+	aio_ring_complete(ctx, iocb, res, res2);
 
 	/*
 	 * Check if the user asked us to deliver the result through an

From patchwork Tue Dec 11 00:15:48 2018
X-Patchwork-Id: 10722885
[66.29.188.166]) by smtp.gmail.com with ESMTPSA id u8sm16872856pfl.16.2018.12.10.16.16.45 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 10 Dec 2018 16:16:46 -0800 (PST) From: Jens Axboe To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe Subject: [PATCH 26/27] aio: add support for submission/completion rings Date: Mon, 10 Dec 2018 17:15:48 -0700 Message-Id: <20181211001549.30085-27-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk> References: <20181211001549.30085-1-axboe@kernel.dk> Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Experimental support for submitting and completing IO through rings shared between the application and kernel. The submission rings are struct iocb, like we would submit through io_submit(), and the completion rings are struct io_event, like we would pass in (and copy back) from io_getevents(). A new system call is added for this, io_ring_enter(). This system call submits IO that is queued in the SQ ring, and/or completes IO and stores the results in the CQ ring. This could be augmented with a kernel thread that does the submission and polling, then the application would never have to enter the kernel to do IO. 
Sample application:

http://git.kernel.dk/cgit/fio/plain/t/aio-ring.c

Signed-off-by: Jens Axboe
---
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 fs/aio.c                               | 456 +++++++++++++++++++++++--
 include/linux/syscalls.h               |   4 +-
 include/uapi/linux/aio_abi.h           |  26 ++
 kernel/sys_ni.c                        |   1 +
 5 files changed, 452 insertions(+), 36 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 67c357225fb0..55a26700a637 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -344,6 +344,7 @@
 333	common	io_pgetevents		__x64_sys_io_pgetevents
 334	common	rseq			__x64_sys_rseq
 335	common	io_setup2		__x64_sys_io_setup2
+336	common	io_ring_enter		__x64_sys_io_ring_enter
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/aio.c b/fs/aio.c
index a69e6228fe62..f4f39a7f8f94 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -142,6 +142,11 @@ struct kioctx {
 
 	struct aio_mapped_range	iocb_range;
 
+	/* if used, completion and submission rings */
+	struct aio_mapped_range	sq_ring;
+	struct aio_mapped_range	cq_ring;
+	int			cq_ring_overflow;
+
 	struct rcu_work		free_rwork;	/* see free_ioctx() */
 
 	/*
@@ -297,6 +302,8 @@ static const struct address_space_operations aio_ctx_aops;
 
 static const unsigned int iocb_page_shift =
 				ilog2(PAGE_SIZE / sizeof(struct iocb));
+static const unsigned int event_page_shift =
+				ilog2(PAGE_SIZE / sizeof(struct io_event));
 
 /*
  * We rely on block level unplugs to flush pending requests, if we schedule
@@ -307,6 +314,7 @@ static const bool aio_use_state_req_list = true;
 static const bool aio_use_state_req_list = false;
 #endif
 
+static void aio_scqring_unmap(struct kioctx *);
 static void aio_useriocb_unmap(struct kioctx *);
 static void aio_iopoll_reap_events(struct kioctx *);
 static void aio_iocb_buffer_unmap(struct kioctx *);
@@ -539,6 +547,12 @@ static const struct address_space_operations aio_ctx_aops = {
 #endif
 };
 
+/* Polled IO or SQ/CQ rings don't use the old ring */
+static bool aio_ctx_old_ring(struct kioctx *ctx)
+{
+	return !(ctx->flags & (IOCTX_FLAG_IOPOLL | IOCTX_FLAG_SCQRING));
+}
+
 static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 {
 	struct aio_ring *ring;
@@ -553,7 +567,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 	 * IO polling doesn't require any io event entries
 	 */
 	size = sizeof(struct aio_ring);
-	if (!(ctx->flags & IOCTX_FLAG_IOPOLL)) {
+	if (aio_ctx_old_ring(ctx)) {
 		nr_events += 2;	/* 1 is required, 2 for good luck */
 		size += sizeof(struct io_event) * nr_events;
 	}
@@ -640,6 +654,17 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 	return 0;
 }
 
+/*
+ * Don't support cancel on anything that isn't regular aio
+ */
+static bool aio_ctx_supports_cancel(struct kioctx *ctx)
+{
+	int noflags = IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL |
+			IOCTX_FLAG_SCQRING;
+
+	return (ctx->flags & noflags) == 0;
+}
+
 #define AIO_EVENTS_PER_PAGE	(PAGE_SIZE / sizeof(struct io_event))
 #define AIO_EVENTS_FIRST_PAGE	((PAGE_SIZE - sizeof(struct aio_ring)) / sizeof(struct io_event))
 #define AIO_EVENTS_OFFSET	(AIO_EVENTS_PER_PAGE - AIO_EVENTS_FIRST_PAGE)
@@ -650,6 +675,8 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
 	struct kioctx *ctx = req->ki_ctx;
 	unsigned long flags;
 
+	if (WARN_ON_ONCE(!aio_ctx_supports_cancel(ctx)))
+		return;
 	if (WARN_ON_ONCE(!list_empty(&req->ki_list)))
 		return;
 
@@ -673,6 +700,7 @@ static void free_ioctx(struct work_struct *work)
 
 	aio_iocb_buffer_unmap(ctx);
 	aio_useriocb_unmap(ctx);
+	aio_scqring_unmap(ctx);
 	aio_free_ring(ctx);
 	free_percpu(ctx->cpu);
 	percpu_ref_exit(&ctx->reqs);
@@ -1218,6 +1246,47 @@ static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb,
 	ev->res2 = res2;
 }
 
+static struct io_event *__aio_get_cqring_ev(struct aio_io_event_ring *ring,
+					    struct aio_mapped_range *range,
+					    unsigned *next_tail)
+{
+	struct io_event *ev;
+	unsigned tail;
+
+	smp_rmb();
+	tail = READ_ONCE(ring->tail);
+	*next_tail = tail + 1;
+	if (*next_tail == ring->nr_events)
+		*next_tail = 0;
+	if (*next_tail == READ_ONCE(ring->head))
+		return NULL;
+
+	/* io_event array starts offset one into the mapped range */
+	tail++;
+	ev = page_address(range->pages[tail >> event_page_shift]);
+	tail &= ((1 << event_page_shift) - 1);
+	return ev + tail;
+}
+
+static void aio_commit_cqring(struct kioctx *ctx, unsigned next_tail)
+{
+	struct aio_io_event_ring *ring;
+
+	ring = page_address(ctx->cq_ring.pages[0]);
+	if (next_tail != ring->tail) {
+		ring->tail = next_tail;
+		smp_wmb();
+	}
+}
+
+static struct io_event *aio_peek_cqring(struct kioctx *ctx, unsigned *ntail)
+{
+	struct aio_io_event_ring *ring;
+
+	ring = page_address(ctx->cq_ring.pages[0]);
+	return __aio_get_cqring_ev(ring, &ctx->cq_ring, ntail);
+}
+
 static void aio_ring_complete(struct kioctx *ctx, struct aio_kiocb *iocb,
 			      long res, long res2)
 {
@@ -1279,7 +1348,36 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
 {
 	struct kioctx *ctx = iocb->ki_ctx;
 
-	aio_ring_complete(ctx, iocb, res, res2);
+	if (ctx->flags & IOCTX_FLAG_SCQRING) {
+		unsigned long flags;
+		struct io_event *ev;
+		unsigned int tail;
+
+		/*
+		 * If we can't get a cq entry, userspace overflowed the
+		 * submission (by quite a lot). Flag it as an overflow
+		 * condition, and next io_ring_enter(2) call will return
+		 * -EOVERFLOW.
+		 */
+		spin_lock_irqsave(&ctx->completion_lock, flags);
+		ev = aio_peek_cqring(ctx, &tail);
+		if (ev) {
+			aio_fill_event(ev, iocb, res, res2);
+			aio_commit_cqring(ctx, tail);
+		} else
+			ctx->cq_ring_overflow = 1;
+		spin_unlock_irqrestore(&ctx->completion_lock, flags);
+	} else {
+		aio_ring_complete(ctx, iocb, res, res2);
+
+		/*
+		 * We have to order our ring_info tail store above and test
+		 * of the wait list below outside the wait lock. This is
+		 * like in wake_up_bit() where clearing a bit has to be
+		 * ordered with the unlocked test.
+		 */
+		smp_mb();
+	}
 
 	/*
 	 * Check if the user asked us to deliver the result through an
@@ -1291,14 +1389,6 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
 		eventfd_ctx_put(iocb->ki_eventfd);
 	}
 
-	/*
-	 * We have to order our ring_info tail store above and test
-	 * of the wait list below outside the wait lock. This is
-	 * like in wake_up_bit() where clearing a bit has to be
-	 * ordered with the unlocked test.
-	 */
-	smp_mb();
-
 	if (waitqueue_active(&ctx->wait))
 		wake_up(&ctx->wait);
 	iocb_put(iocb);
@@ -1421,6 +1511,9 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs,
 		return 0;
 
 	list_for_each_entry_safe(iocb, n, &ctx->poll_completing, ki_list) {
+		struct io_event *ev = NULL;
+		unsigned int next_tail;
+
 		if (*nr_events == max)
 			break;
 		if (!test_bit(KIOCB_F_POLL_COMPLETED, &iocb->ki_flags))
@@ -1428,6 +1521,14 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs,
 		if (to_free == AIO_IOPOLL_BATCH)
 			iocb_put_many(ctx, iocbs, &to_free);
 
+		/* Will only happen if the application over-commits */
+		ret = -EAGAIN;
+		if (ctx->flags & IOCTX_FLAG_SCQRING) {
+			ev = aio_peek_cqring(ctx, &next_tail);
+			if (!ev)
+				break;
+		}
+
 		list_del(&iocb->ki_list);
 		iocbs[to_free++] = iocb;
 
@@ -1446,8 +1547,11 @@ static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs,
 			file_count = 1;
 		}
 
-		if (evs && copy_to_user(evs + *nr_events, &iocb->ki_ev,
-		    sizeof(iocb->ki_ev))) {
+		if (ev) {
+			memcpy(ev, &iocb->ki_ev, sizeof(*ev));
+			aio_commit_cqring(ctx, next_tail);
+		} else if (evs && copy_to_user(evs + *nr_events, &iocb->ki_ev,
+				sizeof(iocb->ki_ev))) {
 			ret = -EFAULT;
 			break;
 		}
@@ -1628,15 +1732,42 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr,
 	return ret;
 }
 
-static struct iocb *aio_iocb_from_index(struct kioctx *ctx, int index)
+static struct iocb *__aio_sqring_from_index(struct aio_iocb_ring *ring,
+					    struct aio_mapped_range *range,
+					    int index)
 {
 	struct iocb *iocb;
 
-	iocb = page_address(ctx->iocb_range.pages[index >> iocb_page_shift]);
+	/* iocb array starts offset one into the mapped range */
+	index++;
+	iocb = page_address(range->pages[index >> iocb_page_shift]);
 	index &= ((1 << iocb_page_shift) - 1);
 	return iocb + index;
 }
 
+static struct iocb *aio_sqring_from_index(struct kioctx *ctx, int index)
+{
+	struct aio_iocb_ring *ring;
+
+	ring = page_address(ctx->sq_ring.pages[0]);
+	return __aio_sqring_from_index(ring, &ctx->sq_ring, index);
+}
+
+static struct iocb *aio_iocb_from_index(struct kioctx *ctx, int index)
+{
+	struct iocb *iocb;
+
+	if (ctx->flags & IOCTX_FLAG_SCQRING) {
+		iocb = aio_sqring_from_index(ctx, index);
+	} else {
+		iocb = page_address(ctx->iocb_range.pages[index >> iocb_page_shift]);
+		index &= ((1 << iocb_page_shift) - 1);
+		iocb += index;
+	}
+
+	return iocb;
+}
+
 static void aio_unmap_range(struct aio_mapped_range *range)
 {
 	int i;
@@ -1692,6 +1823,52 @@ static int aio_useriocb_map(struct kioctx *ctx, struct iocb __user *iocbs)
 	return aio_map_range(&ctx->iocb_range, iocbs, size, 0);
 }
 
+static void aio_scqring_unmap(struct kioctx *ctx)
+{
+	aio_unmap_range(&ctx->sq_ring);
+	aio_unmap_range(&ctx->cq_ring);
+}
+
+static int aio_scqring_map(struct kioctx *ctx,
+			   struct aio_iocb_ring __user *sq_ring,
+			   struct aio_io_event_ring __user *cq_ring)
+{
+	struct aio_iocb_ring *ksq_ring;
+	struct aio_io_event_ring *kcq_ring;
+	int ret, sq_ring_size, cq_ring_size;
+	size_t size;
+
+	/*
+	 * The CQ ring size is QD + 1, so we don't have to track full condition
+	 * for head == tail. The SQ ring we make twice that in size, to make
+	 * room for having more inflight than the QD.
+	 */
+	sq_ring_size = ctx->max_reqs;
+	cq_ring_size = 2 * ctx->max_reqs;
+
+	size = sq_ring_size * sizeof(struct iocb);
+	ret = aio_map_range(&ctx->sq_ring, sq_ring,
+			    sq_ring_size * sizeof(struct iocb), 0);
+	if (ret)
+		return ret;
+
+	ret = aio_map_range(&ctx->cq_ring, cq_ring,
+			    cq_ring_size * sizeof(struct io_event), FOLL_WRITE);
+	if (ret) {
+		aio_unmap_range(&ctx->sq_ring);
+		return ret;
+	}
+
+	ksq_ring = page_address(ctx->sq_ring.pages[0]);
+	ksq_ring->nr_events = sq_ring_size;
+	ksq_ring->head = ksq_ring->tail = 0;
+
+	kcq_ring = page_address(ctx->cq_ring.pages[0]);
+	kcq_ring->nr_events = cq_ring_size;
+	kcq_ring->head = kcq_ring->tail = 0;
+	return 0;
+}
+
 static void aio_iocb_buffer_unmap(struct kioctx *ctx)
 {
 	int i, j;
@@ -1815,22 +1992,22 @@ static int aio_iocb_buffer_map(struct kioctx *ctx)
  * *iocbs - pointer to array of struct iocb, for when
  *	IOCTX_FLAG_USERIOCB is set in flags.
  *
- * *user1 - reserved for future use
+ * *sq_ring - pointer to the userspace SQ ring, if used.
  *
- * *user2 - reserved for future use.
+ * *cq_ring - pointer to the userspace CQ ring, if used.
  */
-SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, struct iocb __user *,
-		iocbs, void __user *, user1, void __user *, user2,
+SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags,
+		struct iocb __user *, iocbs,
+		struct aio_iocb_ring __user *, sq_ring,
+		struct aio_io_event_ring __user *, cq_ring,
 		aio_context_t __user *, ctxp)
 {
 	struct kioctx *ioctx;
 	unsigned long ctx;
 	long ret;
 
-	if (user1 || user2)
-		return -EINVAL;
 	if (flags & ~(IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL |
-		      IOCTX_FLAG_FIXEDBUFS))
+		      IOCTX_FLAG_FIXEDBUFS | IOCTX_FLAG_SCQRING))
 		return -EINVAL;
 
 	ret = get_user(ctx, ctxp);
@@ -1843,18 +2020,26 @@ SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags, struct iocb __user *,
 		goto out;
 
 	if (flags & IOCTX_FLAG_USERIOCB) {
+		ret = -EINVAL;
+		if (flags & IOCTX_FLAG_SCQRING)
+			goto err;
+
 		ret = aio_useriocb_map(ioctx, iocbs);
 		if (ret)
 			goto err;
-		if (flags & IOCTX_FLAG_FIXEDBUFS) {
-			ret = aio_iocb_buffer_map(ioctx);
-			if (ret)
-				goto err;
-		}
-	} else if (flags & IOCTX_FLAG_FIXEDBUFS) {
-		/* can only support fixed bufs with user mapped iocbs */
+	}
+	if (flags & IOCTX_FLAG_SCQRING) {
+		ret = aio_scqring_map(ioctx, sq_ring, cq_ring);
+		if (ret)
+			goto err;
+	}
+	if (flags & IOCTX_FLAG_FIXEDBUFS) {
 		ret = -EINVAL;
-		goto err;
+		if (!(flags & (IOCTX_FLAG_USERIOCB | IOCTX_FLAG_SCQRING)))
+			goto err;
+		ret = aio_iocb_buffer_map(ioctx);
+		if (ret)
+			goto err;
 	}
 
 	ret = put_user(ioctx->user_id, ctxp);
@@ -2556,8 +2741,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 		return -EINVAL;
 	}
 
-	/* Poll IO doesn't need ring reservations */
-	if (!(ctx->flags & IOCTX_FLAG_IOPOLL) && !get_reqs_available(ctx))
+	if (aio_ctx_old_ring(ctx) && !get_reqs_available(ctx))
 		return -EAGAIN;
 
 	ret = -EAGAIN;
@@ -2581,7 +2765,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 	}
 
 	/* Don't support cancel on user mapped iocbs or polled context */
-	if (!(ctx->flags & (IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL))) {
+	if (aio_ctx_supports_cancel(ctx)) {
 		ret = put_user(KIOCB_KEY, &user_iocb->aio_key);
 		if (unlikely(ret)) {
 			pr_debug("EFAULT: aio_key\n");
@@ -2647,7 +2831,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 		eventfd_ctx_put(req->ki_eventfd);
 	iocb_put(req);
 out_put_reqs_available:
-	if (!(ctx->flags & IOCTX_FLAG_IOPOLL))
+	if (aio_ctx_old_ring(ctx))
 		put_reqs_available(ctx, 1);
 	return ret;
 }
@@ -2720,6 +2904,201 @@ static void aio_submit_state_start(struct aio_submit_state *state,
 #endif
 }
 
+static struct iocb *__aio_get_sqring(struct aio_iocb_ring *ring,
+				     struct aio_mapped_range *range,
+				     unsigned *next_head)
+{
+	unsigned head;
+
+	smp_rmb();
+	head = READ_ONCE(ring->head);
+	if (head == READ_ONCE(ring->tail))
+		return NULL;
+
+	*next_head = head + 1;
+	if (*next_head == ring->nr_events)
+		*next_head = 0;
+
+	return __aio_sqring_from_index(ring, range, head);
+}
+
+static void aio_commit_sqring(struct kioctx *ctx, unsigned next_head)
+{
+	struct aio_iocb_ring *ring;
+
+	ring = page_address(ctx->sq_ring.pages[0]);
+	if (ring->head != next_head) {
+		ring->head = next_head;
+		smp_wmb();
+	}
+}
+
+static const struct iocb *aio_peek_sqring(struct kioctx *ctx, unsigned *nhead)
+{
+	struct aio_iocb_ring *ring;
+
+	ring = page_address(ctx->sq_ring.pages[0]);
+	return __aio_get_sqring(ring, &ctx->sq_ring, nhead);
+}
+
+static int aio_ring_submit(struct kioctx *ctx, unsigned int to_submit)
+{
+	bool kaddr = (ctx->flags & IOCTX_FLAG_FIXEDBUFS) != 0;
+	struct aio_submit_state state, *statep = NULL;
+	int i, ret = 0, submit = 0;
+
+	if (to_submit > AIO_PLUG_THRESHOLD) {
+		aio_submit_state_start(&state, ctx, to_submit);
+		statep = &state;
+	}
+
+	for (i = 0; i < to_submit; i++) {
+		const struct iocb *iocb;
+		unsigned int next_head;
+
+		iocb = aio_peek_sqring(ctx, &next_head);
+		if (!iocb)
+			break;
+
+		ret = __io_submit_one(ctx, iocb, NULL, NULL, false, kaddr);
+		if (ret)
+			break;
+
+		submit++;
+		aio_commit_sqring(ctx, next_head);
+	}
+
+	if (statep)
+		aio_submit_state_end(statep);
+
+	return submit ? submit : ret;
+}
+
+/*
+ * Wait until events become available, if we don't already have some. The
+ * application must reap them itself, as they reside on the shared cq ring.
+ */
+static int aio_cqring_wait(struct kioctx *ctx, int min_events)
+{
+	struct aio_io_event_ring *ring;
+	DEFINE_WAIT(wait);
+	int ret = 0;
+
+	ring = page_address(ctx->cq_ring.pages[0]);
+	smp_rmb();
+	if (ring->head != ring->tail)
+		return 0;
+
+	do {
+		prepare_to_wait(&ctx->wait, &wait, TASK_INTERRUPTIBLE);
+
+		ret = 0;
+		smp_rmb();
+		if (ring->head != ring->tail)
+			break;
+		if (!min_events)
+			break;
+
+		schedule();
+
+		ret = -EINVAL;
+		if (atomic_read(&ctx->dead))
+			break;
+		ret = -EINTR;
+		if (signal_pending(current))
+			break;
+	} while (1);
+
+	finish_wait(&ctx->wait, &wait);
+	return ret;
+}
+
+static int __io_ring_enter(struct kioctx *ctx, unsigned int to_submit,
+			   unsigned int min_complete, unsigned int flags)
+{
+	int ret = 0;
+
+	if (flags & IORING_FLAG_SUBMIT) {
+		ret = aio_ring_submit(ctx, to_submit);
+		if (ret < 0)
+			return ret;
+	}
+	if (flags & IORING_FLAG_GETEVENTS) {
+		unsigned int nr_events = 0;
+		int get_ret;
+
+		if (!ret && to_submit)
+			min_complete = 0;
+
+		if (ctx->flags & IOCTX_FLAG_IOPOLL)
+			get_ret = __aio_iopoll_check(ctx, NULL, &nr_events,
+						     min_complete, -1U);
+		else
+			get_ret = aio_cqring_wait(ctx, min_complete);
+
+		if (get_ret < 0 && !ret)
+			ret = get_ret;
+	}
+
+	return ret;
+}
+
+/* sys_io_ring_enter:
+ *	Alternative way to both submit and complete IO, instead of using
+ *	io_submit(2) and io_getevents(2). Requires the use of the SQ/CQ
+ *	ring interface, hence the io_context must be setup with
+ *	io_setup2() and IOCTX_FLAG_SCQRING must be specified (and the
+ *	sq_ring/cq_ring passed in).
+ *
+ *	Returns the number of IOs submitted, if IORING_FLAG_SUBMIT
+ *	is used, otherwise returns 0 for IORING_FLAG_GETEVENTS success,
+ *	but not the number of events, as those will have to be found
+ *	by the application by reading the CQ ring anyway.
+ *
+ *	Apart from that, the error returns are much like io_submit()
+ *	and io_getevents(), since a lot of the same error conditions
+ *	are shared.
+ */
+SYSCALL_DEFINE4(io_ring_enter, aio_context_t, ctx_id, u32, to_submit,
+		u32, min_complete, u32, flags)
+{
+	struct kioctx *ctx;
+	long ret;
+
+	BUILD_BUG_ON(sizeof(struct aio_iocb_ring) != sizeof(struct iocb));
+	BUILD_BUG_ON(sizeof(struct aio_io_event_ring) !=
+			sizeof(struct io_event));
+
+	ctx = lookup_ioctx(ctx_id);
+	if (!ctx) {
+		pr_debug("EINVAL: invalid context id\n");
+		return -EINVAL;
+	}
+
+	ret = -EBUSY;
+	if (!mutex_trylock(&ctx->getevents_lock))
+		goto err;
+
+	ret = -EOVERFLOW;
+	if (ctx->cq_ring_overflow) {
+		ctx->cq_ring_overflow = 0;
+		goto err_unlock;
+	}
+
+	ret = -EINVAL;
+	if (unlikely(atomic_read(&ctx->dead)))
+		goto err_unlock;
+
+	if (ctx->flags & IOCTX_FLAG_SCQRING)
+		ret = __io_ring_enter(ctx, to_submit, min_complete, flags);
+
+err_unlock:
+	mutex_unlock(&ctx->getevents_lock);
+err:
+	percpu_ref_put(&ctx->users);
+	return ret;
+}
+
 /* sys_io_submit:
  *	Queue the nr iocbs pointed to by iocbpp for processing.  Returns
  *	the number of iocbs queued.  May return -EINVAL if the aio_context
@@ -2749,6 +3128,10 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
 		return -EINVAL;
 	}
 
+	/* SCQRING must use io_ring_enter() */
+	if (ctx->flags & IOCTX_FLAG_SCQRING)
+		return -EINVAL;
+
 	if (nr > ctx->nr_events)
 		nr = ctx->nr_events;
@@ -2865,7 +3248,7 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
 	if (unlikely(!ctx))
 		return -EINVAL;
 
-	if (ctx->flags & (IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL))
+	if (!aio_ctx_supports_cancel(ctx))
 		goto err;
 
 	spin_lock_irq(&ctx->ctx_lock);
@@ -2900,7 +3283,10 @@ static long do_io_getevents(aio_context_t ctx_id,
 	long ret = -EINVAL;
 
 	if (likely(ioctx)) {
-		if (likely(min_nr <= nr && min_nr >= 0)) {
+		/* SCQRING must use io_ring_enter() */
+		if (ioctx->flags & IOCTX_FLAG_SCQRING)
+			ret = -EINVAL;
+		else if (min_nr <= nr && min_nr >= 0) {
 			if (ioctx->flags & IOCTX_FLAG_IOPOLL)
 				ret = aio_iopoll_check(ioctx, min_nr, nr, events);
 			else
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a20a663d583f..576725d00020 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -288,8 +288,10 @@ static inline void addr_limit_user_check(void)
 #ifndef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
 asmlinkage long sys_io_setup(unsigned nr_reqs, aio_context_t __user *ctx);
 asmlinkage long sys_io_setup2(unsigned, unsigned, struct iocb __user *,
-				void __user *, void __user *,
+				struct aio_iocb_ring __user *,
+				struct aio_io_event_ring __user *,
 				aio_context_t __user *);
+asmlinkage long sys_io_ring_enter(aio_context_t, unsigned, unsigned, unsigned);
 asmlinkage long sys_io_destroy(aio_context_t ctx);
 asmlinkage long sys_io_submit(aio_context_t, long,
 			struct iocb __user * __user *);
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index 05d72cf86bd3..9fb7d0ec868f 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -111,6 +111,32 @@ struct iocb {
 #define IOCTX_FLAG_USERIOCB	(1 << 0)	/* iocbs are user mapped */
 #define IOCTX_FLAG_IOPOLL	(1 << 1)	/* io_context is polled */
 #define IOCTX_FLAG_FIXEDBUFS	(1 << 2)	/* IO buffers are fixed */
+#define IOCTX_FLAG_SCQRING	(1 << 3)	/* Use SQ/CQ rings */
+
+struct aio_iocb_ring {
+	union {
+		struct {
+			u32 head, tail;
+			u32 nr_events;
+		};
+		struct iocb pad_iocb;
+	};
+	struct iocb iocbs[0];
+};
+
+struct aio_io_event_ring {
+	union {
+		struct {
+			u32 head, tail;
+			u32 nr_events;
+		};
+		struct io_event pad_event;
+	};
+	struct io_event events[0];
+};
+
+#define IORING_FLAG_SUBMIT	(1 << 0)
+#define IORING_FLAG_GETEVENTS	(1 << 1)
 
 #undef IFBIG
 #undef IFLITTLE
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 17c8b4393669..a32b7ea93838 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -38,6 +38,7 @@ asmlinkage long sys_ni_syscall(void)
 
 COND_SYSCALL(io_setup);
 COND_SYSCALL(io_setup2);
+COND_SYSCALL(io_ring_enter);
 COND_SYSCALL_COMPAT(io_setup);
 COND_SYSCALL(io_destroy);
 COND_SYSCALL(io_submit);

From patchwork Tue Dec 11 00:15:49 2018
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10722891
From: Jens Axboe
To: linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-aio@kvack.org
Cc: hch@lst.de, jmoyer@redhat.com, clm@fb.com, Jens Axboe
Subject: [PATCH 27/27] aio: support kernel side submission for aio with SCQRING
Date: Mon, 10 Dec 2018 17:15:49 -0700
Message-Id: <20181211001549.30085-28-axboe@kernel.dk>
In-Reply-To: <20181211001549.30085-1-axboe@kernel.dk>
References: <20181211001549.30085-1-axboe@kernel.dk>
X-Mailing-List: linux-block@vger.kernel.org

Add support for backing the io_context with either a thread, or a
workqueue, and letting those handle the submission for us. This can be
used to reduce overhead for submission, or to always make submission
async. The latter is particularly useful for buffered aio, which is
now fully async with this feature.

For polled IO, we could have the kernel side thread hammer on the SQ
ring and submit when it finds IO. This would mean that an application
would NEVER have to enter the kernel to do IO! Didn't add this yet,
but it would be trivial to add.

If an application sets IOCTX_FLAG_SQTHREAD, the io_context gets a
single thread backing. If used with buffered IO, this will limit the
device queue depth to 1, but it will be async, IOs will simply be
serialized.

Or an application can set IOCTX_FLAG_SQWQ, in which case the io_context
gets a work queue backing. The concurrency level is the minimum of
twice the available CPUs, or the queue depth specified for the context.
For this mode, we attempt to do buffered reads inline, in case they
are cached.
So we should only punt to a workqueue if we would have to block to get
our data.

Tested with polling, no polling, fixedbufs, no fixedbufs, buffered,
O_DIRECT.

See the sample application for how to use it:

http://git.kernel.dk/cgit/fio/plain/t/aio-ring.c

Signed-off-by: Jens Axboe
---
 fs/aio.c                     | 416 ++++++++++++++++++++++++++++++++---
 include/uapi/linux/aio_abi.h |   3 +
 2 files changed, 389 insertions(+), 30 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index f4f39a7f8f94..44284b1f4ec9 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -43,6 +44,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -103,6 +105,14 @@ struct aio_mapped_ubuf {
 	unsigned int nr_bvecs;
 };
 
+struct aio_sq_offload {
+	struct task_struct *thread;	/* if using a thread */
+	struct workqueue_struct *wq;	/* wq offload */
+	struct mm_struct *mm;
+	struct files_struct *files;
+	wait_queue_head_t wait;
+};
+
 struct kioctx {
 	struct percpu_ref	users;
 	atomic_t		dead;
@@ -146,6 +156,10 @@ struct kioctx {
 	struct aio_mapped_range	sq_ring;
 	struct aio_mapped_range	cq_ring;
 	int			cq_ring_overflow;
+	int			submit_eagain;
+
+	/* sq ring submitter thread, if used */
+	struct aio_sq_offload	sq_offload;
 
 	struct rcu_work		free_rwork;	/* see free_ioctx() */
 
@@ -236,6 +250,7 @@ struct aio_kiocb {
 	unsigned long		ki_flags;
 #define KIOCB_F_POLL_COMPLETED	0	/* polled IO has completed */
 #define KIOCB_F_POLL_EAGAIN	1	/* polled submission got EAGAIN */
+#define KIOCB_F_FORCE_NONBLOCK	2	/* inline submission attempt */
 
 	refcount_t		ki_refcnt;
 
@@ -1354,19 +1369,31 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
 		unsigned int tail;
 
 		/*
-		 * If we can't get a cq entry, userspace overflowed the
-		 * submission (by quite a lot). Flag it as an overflow
-		 * condition, and next io_ring_enter(2) call will return
-		 * -EOVERFLOW.
+		 * Catch EAGAIN early if we've forced a nonblock attempt, as
+		 * we don't want to pass that back down to userspace through
+		 * the CQ ring. Just mark the ctx as such, so the caller will
+		 * see it and punt to workqueue. This is just for buffered
+		 * aio reads.
 		 */
-		spin_lock_irqsave(&ctx->completion_lock, flags);
-		ev = aio_peek_cqring(ctx, &tail);
-		if (ev) {
-			aio_fill_event(ev, iocb, res, res2);
-			aio_commit_cqring(ctx, tail);
-		} else
-			ctx->cq_ring_overflow = 1;
-		spin_unlock_irqrestore(&ctx->completion_lock, flags);
+		if (res == -EAGAIN &&
+		    test_bit(KIOCB_F_FORCE_NONBLOCK, &iocb->ki_flags)) {
+			ctx->submit_eagain = 1;
+		} else {
+			/*
+			 * If we can't get a cq entry, userspace overflowed the
+			 * submission (by quite a lot). Flag it as an overflow
+			 * condition, and next io_ring_enter(2) call will return
+			 * -EOVERFLOW.
+			 */
+			spin_lock_irqsave(&ctx->completion_lock, flags);
+			ev = aio_peek_cqring(ctx, &tail);
+			if (ev) {
+				aio_fill_event(ev, iocb, res, res2);
+				aio_commit_cqring(ctx, tail);
+			} else
+				ctx->cq_ring_overflow = 1;
+			spin_unlock_irqrestore(&ctx->completion_lock, flags);
+		}
 	} else {
 		aio_ring_complete(ctx, iocb, res, res2);
 
@@ -1768,6 +1795,63 @@ static struct iocb *aio_iocb_from_index(struct kioctx *ctx, int index)
 	return iocb;
 }
 
+static int aio_sq_thread(void *);
+
+static int aio_sq_thread_start(struct kioctx *ctx, struct aio_iocb_ring *ring)
+{
+	struct aio_sq_offload *aso = &ctx->sq_offload;
+	int ret;
+
+	memset(aso, 0, sizeof(*aso));
+	init_waitqueue_head(&aso->wait);
+
+	if (!(ctx->flags & IOCTX_FLAG_FIXEDBUFS))
+		aso->mm = current->mm;
+
+	ret = -EBADF;
+	aso->files = get_files_struct(current);
+	if (!aso->files)
+		goto err;
+
+	if (ctx->flags & IOCTX_FLAG_SQTHREAD) {
+		char name[32];
+
+		snprintf(name, sizeof(name), "aio-sq-%lu/%d", ctx->user_id,
+				ring->sq_thread_cpu);
+		aso->thread = kthread_create_on_cpu(aio_sq_thread, ctx,
+							ring->sq_thread_cpu, name);
+		if (IS_ERR(aso->thread)) {
+			ret = PTR_ERR(aso->thread);
+			aso->thread = NULL;
+			goto err;
+		}
+		wake_up_process(aso->thread);
+	} else if (ctx->flags & IOCTX_FLAG_SQWQ) {
+		int concurrency;
+
+		/* Do QD, or 2 * CPUS, whatever is smallest */
+		concurrency = min(ring->nr_events - 1, 2 * num_online_cpus());
+		aso->wq = alloc_workqueue("aio-sq-%lu",
+						WQ_UNBOUND | WQ_FREEZABLE | WQ_SYSFS,
+						concurrency,
+						ctx->user_id);
+		if (!aso->wq) {
+			ret = -ENOMEM;
+			goto err;
+		}
+	}
+
+	return 0;
+err:
+	if (aso->files) {
+		put_files_struct(aso->files);
+		aso->files = NULL;
+	}
+	if (aso->mm)
+		aso->mm = NULL;
+	return ret;
+}
+
 static void aio_unmap_range(struct aio_mapped_range *range)
 {
 	int i;
@@ -1825,6 +1909,19 @@ static int aio_useriocb_map(struct kioctx *ctx, struct iocb __user *iocbs)
 
 static void aio_scqring_unmap(struct kioctx *ctx)
 {
+	struct aio_sq_offload *aso = &ctx->sq_offload;
+
+	if (aso->thread) {
+		kthread_stop(aso->thread);
+		aso->thread = NULL;
+	} else if (aso->wq) {
+		destroy_workqueue(aso->wq);
+		aso->wq = NULL;
+	}
+	if (aso->files) {
+		put_files_struct(aso->files);
+		aso->files = NULL;
+	}
 	aio_unmap_range(&ctx->sq_ring);
 	aio_unmap_range(&ctx->cq_ring);
 }
@@ -1854,10 +1951,8 @@ static int aio_scqring_map(struct kioctx *ctx,
 
 	ret = aio_map_range(&ctx->cq_ring, cq_ring,
 			    cq_ring_size * sizeof(struct io_event), FOLL_WRITE);
-	if (ret) {
-		aio_unmap_range(&ctx->sq_ring);
-		return ret;
-	}
+	if (ret)
+		goto err;
 
 	ksq_ring = page_address(ctx->sq_ring.pages[0]);
 	ksq_ring->nr_events = sq_ring_size;
@@ -1866,7 +1961,16 @@ static int aio_scqring_map(struct kioctx *ctx,
 	kcq_ring = page_address(ctx->cq_ring.pages[0]);
 	kcq_ring->nr_events = cq_ring_size;
 	kcq_ring->head = kcq_ring->tail = 0;
-	return 0;
+
+	if (ctx->flags & (IOCTX_FLAG_SQTHREAD | IOCTX_FLAG_SQWQ))
+		ret = aio_sq_thread_start(ctx, ksq_ring);
+
+err:
+	if (ret) {
+		aio_unmap_range(&ctx->sq_ring);
+		aio_unmap_range(&ctx->cq_ring);
+	}
+	return ret;
 }
 
 static void aio_iocb_buffer_unmap(struct kioctx *ctx)
@@ -2007,7 +2111,8 @@ SYSCALL_DEFINE6(io_setup2, u32, nr_events, u32, flags,
 	long ret;
 
 	if (flags & ~(IOCTX_FLAG_USERIOCB | IOCTX_FLAG_IOPOLL |
-		      IOCTX_FLAG_FIXEDBUFS | IOCTX_FLAG_SCQRING))
+		      IOCTX_FLAG_FIXEDBUFS | IOCTX_FLAG_SCQRING |
+		      IOCTX_FLAG_SQTHREAD | IOCTX_FLAG_SQWQ))
 		return -EINVAL;
 
 	ret = get_user(ctx, ctxp);
@@ -2249,7 +2354,7 @@ static struct file *aio_file_get(struct aio_submit_state *state, int fd)
 }
 
 static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb,
-		       struct aio_submit_state *state)
+		       struct aio_submit_state *state, bool force_nonblock)
 {
 	struct kioctx *ctx = kiocb->ki_ctx;
 	struct kiocb *req = &kiocb->rw;
@@ -2282,6 +2387,10 @@ static int aio_prep_rw(struct aio_kiocb *kiocb, const struct iocb *iocb,
 	ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
 	if (unlikely(ret))
 		goto out_fput;
+	if (force_nonblock) {
+		req->ki_flags |= IOCB_NOWAIT;
+		set_bit(KIOCB_F_FORCE_NONBLOCK, &kiocb->ki_flags);
+	}
 
 	if (iocb->aio_flags & IOCB_FLAG_HIPRI) {
 		/* shares space in the union, and is rather pointless.. */
@@ -2422,7 +2531,7 @@ static void aio_iopoll_iocb_issued(struct aio_submit_state *state,
 
 static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb,
 			struct aio_submit_state *state, bool vectored,
-			bool compat, bool kaddr)
+			bool compat, bool kaddr, bool force_nonblock)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct kiocb *req = &kiocb->rw;
@@ -2430,7 +2539,7 @@ static ssize_t aio_read(struct aio_kiocb *kiocb, const struct iocb *iocb,
 	struct file *file;
 	ssize_t ret;
 
-	ret = aio_prep_rw(kiocb, iocb, state);
+	ret = aio_prep_rw(kiocb, iocb, state, force_nonblock);
 	if (ret)
 		return ret;
 	file = req->ki_filp;
@@ -2467,7 +2576,7 @@ static ssize_t aio_write(struct aio_kiocb *kiocb, const struct iocb *iocb,
 	struct file *file;
 	ssize_t ret;
 
-	ret = aio_prep_rw(kiocb, iocb, state);
+	ret = aio_prep_rw(kiocb, iocb, state, false);
 	if (ret)
 		return ret;
 	file = req->ki_filp;
@@ -2720,7 +2829,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 
 static int __io_submit_one(struct
kioctx *ctx, const struct iocb *iocb, struct iocb __user *user_iocb, struct aio_submit_state *state, bool compat, - bool kaddr) + bool kaddr, bool force_nonblock) { struct aio_kiocb *req; ssize_t ret; @@ -2779,13 +2888,15 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb, ret = -EINVAL; switch (iocb->aio_lio_opcode) { case IOCB_CMD_PREAD: - ret = aio_read(req, iocb, state, false, compat, kaddr); + ret = aio_read(req, iocb, state, false, compat, kaddr, + force_nonblock); break; case IOCB_CMD_PWRITE: ret = aio_write(req, iocb, state, false, compat, kaddr); break; case IOCB_CMD_PREADV: - ret = aio_read(req, iocb, state, true, compat, kaddr); + ret = aio_read(req, iocb, state, true, compat, kaddr, + force_nonblock); break; case IOCB_CMD_PWRITEV: ret = aio_write(req, iocb, state, true, compat, kaddr); @@ -2857,7 +2968,8 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, iocbp = &iocb; } - return __io_submit_one(ctx, iocbp, user_iocb, state, compat, kaddr); + return __io_submit_one(ctx, iocbp, user_iocb, state, compat, kaddr, + false); } #ifdef CONFIG_BLOCK @@ -2960,7 +3072,8 @@ static int aio_ring_submit(struct kioctx *ctx, unsigned int to_submit) if (!iocb) break; - ret = __io_submit_one(ctx, iocb, NULL, NULL, false, kaddr); + ret = __io_submit_one(ctx, iocb, NULL, NULL, false, kaddr, + false); if (ret) break; @@ -3013,15 +3126,258 @@ static int aio_cqring_wait(struct kioctx *ctx, int min_events) return ret; } +static void aio_fill_cq_error(struct kioctx *ctx, const struct iocb *iocb, + long ret) +{ + struct io_event *ev; + unsigned tail; + + /* + * Only really need the lock for non-polled IO, but this is an error + * so not worth checking. Just lock it so we know kernel access to + * the CQ ring is serialized. 
+ */ + spin_lock_irq(&ctx->completion_lock); + ev = aio_peek_cqring(ctx, &tail); + ev->obj = iocb->aio_data; + ev->data = 0; + ev->res = ret; + ev->res2 = 0; + aio_commit_cqring(ctx, tail); + spin_unlock_irq(&ctx->completion_lock); + + /* + * for thread offload, app could already be sleeping in io_ring_enter() + * before we get to flag the error. wake them up, if needed. + */ + if (ctx->flags & (IOCTX_FLAG_SQTHREAD | IOCTX_FLAG_SQWQ)) + if (waitqueue_active(&ctx->wait)) + wake_up(&ctx->wait); +} + +struct aio_io_work { + struct work_struct work; + struct kioctx *ctx; + struct iocb iocb; +}; + +/* + * sq thread only supports O_DIRECT or FIXEDBUFS IO + */ +static int aio_sq_thread(void *data) +{ + const struct iocb *iocbs[AIO_IOPOLL_BATCH]; + struct aio_submit_state state; + struct kioctx *ctx = data; + struct aio_sq_offload *aso = &ctx->sq_offload; + struct mm_struct *cur_mm = NULL; + struct files_struct *old_files; + mm_segment_t old_fs; + DEFINE_WAIT(wait); + + old_files = current->files; + current->files = aso->files; + + old_fs = get_fs(); + set_fs(USER_DS); + + while (!kthread_should_stop()) { + struct aio_submit_state *statep = NULL; + const struct iocb *iocb; + bool mm_fault = false; + unsigned int nhead; + int ret, i, j; + + iocb = aio_peek_sqring(ctx, &nhead); + if (!iocb) { + prepare_to_wait(&aso->wait, &wait, TASK_INTERRUPTIBLE); + iocb = aio_peek_sqring(ctx, &nhead); + if (!iocb) { + /* + * Drop cur_mm before scheduler. We can't hold + * it for long periods, and it would also + * introduce a deadlock with kill_ioctx(). 
+ */ + if (cur_mm) { + unuse_mm(cur_mm); + mmput(cur_mm); + cur_mm = NULL; + } + schedule(); + } + finish_wait(&aso->wait, &wait); + if (!iocb) + continue; + } + + /* If ->mm is set, we're not doing FIXEDBUFS */ + if (aso->mm) { + mm_fault = !mmget_not_zero(aso->mm); + if (!mm_fault) { + use_mm(aso->mm); + cur_mm = aso->mm; + } + } + + i = 0; + do { + if (i == ARRAY_SIZE(iocbs)) + break; + iocbs[i++] = iocb; + aio_commit_sqring(ctx, nhead); + } while ((iocb = aio_peek_sqring(ctx, &nhead)) != NULL); + + if (i > AIO_PLUG_THRESHOLD) { + aio_submit_state_start(&state, ctx, i); + statep = &state; + } + + for (j = 0; j < i; j++) { + if (unlikely(mm_fault)) + ret = -EFAULT; + else + ret = __io_submit_one(ctx, iocbs[j], NULL, NULL, + false, !cur_mm, false); + if (!ret) + continue; + + aio_fill_cq_error(ctx, iocbs[j], ret); + } + + if (statep) + aio_submit_state_end(&state); + } + current->files = old_files; + set_fs(old_fs); + if (cur_mm) { + unuse_mm(cur_mm); + mmput(cur_mm); + } + return 0; +} + +static void aio_sq_wq_submit_work(struct work_struct *work) +{ + struct aio_io_work *aiw = container_of(work, struct aio_io_work, work); + struct kioctx *ctx = aiw->ctx; + struct aio_sq_offload *aso = &ctx->sq_offload; + mm_segment_t old_fs = get_fs(); + struct files_struct *old_files; + int ret; + + old_files = current->files; + current->files = aso->files; + + if (aso->mm) { + if (!mmget_not_zero(aso->mm)) { + ret = -EFAULT; + goto err; + } + use_mm(aso->mm); + } + + set_fs(USER_DS); + + ret = __io_submit_one(ctx, &aiw->iocb, NULL, NULL, false, !aso->mm, + false); + + set_fs(old_fs); + if (aso->mm) { + unuse_mm(aso->mm); + mmput(aso->mm); + } + +err: + if (ret) + aio_fill_cq_error(ctx, &aiw->iocb, ret); + current->files = old_files; + kfree(aiw); +} + +/* + * If this is a read, try a cached inline read first. If the IO is in the + * page cache, we can satisfy it without blocking and without having to + * punt to a threaded execution. 
This is much faster, particularly for + * lower queue depth IO, and it's always a lot more efficient. + */ +static int aio_sq_try_inline(struct kioctx *ctx, struct aio_io_work *aiw) +{ + struct aio_sq_offload *aso = &ctx->sq_offload; + int ret; + + if (aiw->iocb.aio_lio_opcode != IOCB_CMD_PREAD && + aiw->iocb.aio_lio_opcode != IOCB_CMD_PREADV) + return -EAGAIN; + + ret = __io_submit_one(ctx, &aiw->iocb, NULL, NULL, false, !aso->mm, + true); + + if (ret == -EAGAIN || ctx->submit_eagain) { + ctx->submit_eagain = 0; + return -EAGAIN; + } + + /* + * We're done - even if this was an error, return 0. The error will + * be in the CQ ring for the application. + */ + kfree(aiw); + return 0; +} + +static int aio_sq_wq_submit(struct kioctx *ctx, unsigned int to_submit) +{ + struct aio_io_work *work; + const struct iocb *iocb; + unsigned nhead; + int ret, queued; + + ret = queued = 0; + while ((iocb = aio_peek_sqring(ctx, &nhead)) != NULL) { + work = kmalloc(sizeof(*work), GFP_KERNEL); + if (!work) { + ret = -ENOMEM; + break; + } + memcpy(&work->iocb, iocb, sizeof(*iocb)); + aio_commit_sqring(ctx, nhead); + ret = aio_sq_try_inline(ctx, work); + if (ret == -EAGAIN) { + INIT_WORK(&work->work, aio_sq_wq_submit_work); + work->ctx = ctx; + queue_work(ctx->sq_offload.wq, &work->work); + ret = 0; + } + queued++; + if (queued == to_submit) + break; + } + + return queued ? 
queued : ret; +} + static int __io_ring_enter(struct kioctx *ctx, unsigned int to_submit, unsigned int min_complete, unsigned int flags) { int ret = 0; if (flags & IORING_FLAG_SUBMIT) { - ret = aio_ring_submit(ctx, to_submit); - if (ret < 0) - return ret; + /* + * Three options here: + * 1) We have an sq thread, just wake it up to do submissions + * 2) We have an sq wq, queue a work item for each iocb + * 3) Submit directly + */ + if (to_submit && (ctx->flags & IOCTX_FLAG_SQTHREAD)) { + wake_up(&ctx->sq_offload.wait); + ret = to_submit; + } else if (to_submit && (ctx->flags & IOCTX_FLAG_SQWQ)) { + ret = aio_sq_wq_submit(ctx, to_submit); + } else { + ret = aio_ring_submit(ctx, to_submit); + if (ret < 0) + return ret; + } } if (flags & IORING_FLAG_GETEVENTS) { unsigned int nr_events = 0; diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h index 9fb7d0ec868f..500b37feeaa8 100644 --- a/include/uapi/linux/aio_abi.h +++ b/include/uapi/linux/aio_abi.h @@ -112,12 +112,15 @@ struct iocb { #define IOCTX_FLAG_IOPOLL (1 << 1) /* io_context is polled */ #define IOCTX_FLAG_FIXEDBUFS (1 << 2) /* IO buffers are fixed */ #define IOCTX_FLAG_SCQRING (1 << 3) /* Use SQ/CQ rings */ +#define IOCTX_FLAG_SQTHREAD (1 << 4) /* Use SQ thread */ +#define IOCTX_FLAG_SQWQ (1 << 5) /* Use SQ workqueue */ struct aio_iocb_ring { union { struct { u32 head, tail; u32 nr_events; + u32 sq_thread_cpu; }; struct iocb pad_iocb; };
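
A note on the SQ ring convention the patch relies on: the kernel consumes entries at `head` (aio_peek_sqring/aio_commit_sqring), userspace produces at `tail`, and peek is separated from commit so aio_sq_thread can batch. A minimal single-threaded model of that protocol, with hypothetical names (`sq_ring`, `sq_peek`, `sq_commit`, `sq_submit` are illustrative stand-ins, not the kernel's, and no memory barriers are modeled):

```c
/*
 * Hypothetical, simplified model of the shared SQ ring head/tail
 * protocol from the patch.  The kernel consumes at 'head', userspace
 * produces at 'tail'.  Single-threaded sketch only: the real code
 * needs barriers/acquire-release around head and tail updates.
 */
#define SQ_ENTRIES 8

struct sq_ring {
	unsigned head;		/* next entry the kernel will consume */
	unsigned tail;		/* next slot userspace will fill */
	unsigned nr_events;	/* ring size */
	int iocbs[SQ_ENTRIES];	/* stand-in for struct iocb slots */
};

/*
 * Mirrors aio_peek_sqring(): return the entry at head without
 * consuming it, or -1 if the ring is empty.  *nhead receives the
 * head value to pass to sq_commit() once the entry is taken.
 */
static int sq_peek(struct sq_ring *ring, unsigned *nhead)
{
	if (ring->head == ring->tail)
		return -1;
	*nhead = ring->head + 1;
	return ring->iocbs[ring->head % ring->nr_events];
}

/* Mirrors aio_commit_sqring(): consume the previously peeked entry. */
static void sq_commit(struct sq_ring *ring, unsigned nhead)
{
	ring->head = nhead;
}

/* Userspace side: queue one entry, then publish it by bumping tail. */
static int sq_submit(struct sq_ring *ring, int iocb)
{
	if (ring->tail - ring->head == ring->nr_events)
		return -1;	/* ring full */
	ring->iocbs[ring->tail % ring->nr_events] = iocb;
	ring->tail++;
	return 0;
}
```

The peek/commit split is what lets the sq thread's `do { ... } while (aio_peek_sqring(...))` loop gather up to AIO_IOPOLL_BATCH iocbs before sleeping again.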
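
The IOCTX_FLAG_SQWQ path in aio_sq_wq_submit() is a "try inline first, punt on EAGAIN" dispatch: aio_sq_try_inline() submits with force_nonblock set, and only if that would block is the iocb handed to the workqueue. A hedged standalone sketch of that control flow (names and stubs are hypothetical; submit_fn stands in for __io_submit_one() with force_nonblock, queue_fn for queue_work()):

```c
/*
 * Hypothetical sketch of the "try inline, punt to a worker on EAGAIN"
 * dispatch used by aio_sq_wq_submit() in the patch.
 */
#define EAGAIN_ERR 11

static int dispatch_one(int (*submit_fn)(int), void (*queue_fn)(int), int iocb)
{
	int ret = submit_fn(iocb);

	if (ret == -EAGAIN_ERR) {
		/*
		 * Couldn't complete without blocking: hand the iocb to
		 * the offload worker and report success to the caller.
		 * The eventual result lands in the CQ ring.
		 */
		queue_fn(iocb);
		return 0;
	}
	/* Inline attempt finished; errors go to the CQ ring too. */
	return ret;
}

/* Stub submitters and worker for demonstration. */
static int inline_ok(int iocb)     { (void)iocb; return 0; }
static int inline_blocks(int iocb) { (void)iocb; return -EAGAIN_ERR; }
static int punted;
static void punt(int iocb)         { (void)iocb; punted++; }
```

Either way the syscall returns "queued", which is why aio_sq_try_inline() reports 0 even when the inline submission failed: the error is already flagged via aio_fill_cq_error() for the application to see in the CQ ring.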