From patchwork Thu Aug 22 03:35:55 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 13772578
From: Pavel Begunkov
To: io-uring@vger.kernel.org
Cc: Jens Axboe, asml.silence@gmail.com, Conrad Meyer,
 linux-block@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 5/7] block: implement async discard as io_uring cmd
Date: Thu, 22 Aug 2024 04:35:55 +0100
X-Mailer: git-send-email 2.45.2
Sender: owner-linux-mm@kvack.org
Precedence: bulk

io_uring allows implementing custom file-specific operations via the
fops->uring_cmd callback. Use it to wire up asynchronous discard
commands.
Normally, it first attempts a non-blocking issue, and if that fails we
retry from a blocking context by returning -EAGAIN to core io_uring.

Note that, unlike ioctl(BLKDISCARD), which has stronger guarantees
against races, we only make a best-effort attempt to invalidate the page
cache; it can race with any reads and writes and leave the page cache
stale. These are the same kinds of races we allow for direct writes.

Suggested-by: Conrad Meyer
Signed-off-by: Pavel Begunkov
---
 block/blk.h             |   1 +
 block/fops.c            |   2 +
 block/ioctl.c           | 101 ++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/fs.h |   2 +
 4 files changed, 106 insertions(+)

diff --git a/block/blk.h b/block/blk.h
index e180863f918b..5178c5ba6852 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -571,6 +571,7 @@ blk_mode_t file_to_blk_mode(struct file *file);
 int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
 		loff_t lstart, loff_t lend);
 long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
+int blkdev_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
 long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
 
 extern const struct address_space_operations def_blk_aops;
diff --git a/block/fops.c b/block/fops.c
index 9825c1713a49..8154b10b5abf 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include
 #include "blk.h"
 
 static inline struct inode *bdev_file_inode(struct file *file)
@@ -873,6 +874,7 @@ const struct file_operations def_blk_fops = {
 	.splice_read	= filemap_splice_read,
 	.splice_write	= iter_file_splice_write,
 	.fallocate	= blkdev_fallocate,
+	.uring_cmd	= blkdev_uring_cmd,
 	.fop_flags	= FOP_BUFFER_RASYNC,
 };
 
diff --git a/block/ioctl.c b/block/ioctl.c
index 8df0bc8002f5..a9aaa7cb7f73 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -11,6 +11,8 @@
 #include
 #include
 #include
+#include
+#include
 #include "blk.h"
 
 static int blkpg_do_ioctl(struct block_device *bdev,
@@ -745,3 +747,102 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 	return ret;
 }
 #endif
+
+struct blk_cmd {
+	blk_status_t status;
+	bool nowait;
+};
+
+static void blk_cmd_complete(struct io_uring_cmd *cmd, unsigned int issue_flags)
+{
+	struct blk_cmd *bc = io_uring_cmd_to_pdu(cmd, struct blk_cmd);
+	int res = blk_status_to_errno(bc->status);
+
+	if (res == -EAGAIN && bc->nowait)
+		io_uring_cmd_issue_blocking(cmd);
+	else
+		io_uring_cmd_done(cmd, res, 0, issue_flags);
+}
+
+static void bio_cmd_end(struct bio *bio)
+{
+	struct io_uring_cmd *cmd = bio->bi_private;
+	struct blk_cmd *bc = io_uring_cmd_to_pdu(cmd, struct blk_cmd);
+
+	if (unlikely(bio->bi_status) && !bc->status)
+		bc->status = bio->bi_status;
+
+	io_uring_cmd_do_in_task_lazy(cmd, blk_cmd_complete);
+	bio_put(bio);
+}
+
+static int blkdev_cmd_discard(struct io_uring_cmd *cmd,
+			      struct block_device *bdev,
+			      uint64_t start, uint64_t len, bool nowait)
+{
+	sector_t sector = start >> SECTOR_SHIFT;
+	sector_t nr_sects = len >> SECTOR_SHIFT;
+	struct bio *prev = NULL, *bio;
+	int err;
+
+	if (!bdev_max_discard_sectors(bdev))
+		return -EOPNOTSUPP;
+
+	err = blk_validate_write(bdev, file_to_blk_mode(cmd->file), start, len);
+	if (err)
+		return err;
+	err = filemap_invalidate_pages(bdev->bd_mapping, start,
+					start + len - 1, nowait);
+	if (err)
+		return err;
+
+	while ((bio = blk_alloc_discard_bio(bdev, &sector, &nr_sects,
+					    GFP_KERNEL))) {
+		if (nowait) {
+			/*
+			 * Don't allow multi-bio non-blocking submissions as
+			 * subsequent bios may fail but we won't get direct
+			 * feedback about that. Normally, the caller should
+			 * retry from a blocking context.
+			 */
+			if (unlikely(nr_sects)) {
+				bio_put(bio);
+				return -EAGAIN;
+			}
+			bio->bi_opf |= REQ_NOWAIT;
+		}
+		prev = bio_chain_and_submit(prev, bio);
+	}
+	if (!prev)
+		return -EFAULT;
+
+	prev->bi_private = cmd;
+	prev->bi_end_io = bio_cmd_end;
+	submit_bio(prev);
+	return -EIOCBQUEUED;
+}
+
+int blkdev_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
+{
+	struct block_device *bdev = I_BDEV(cmd->file->f_mapping->host);
+	struct blk_cmd *bc = io_uring_cmd_to_pdu(cmd, struct blk_cmd);
+	const struct io_uring_sqe *sqe = cmd->sqe;
+	u32 cmd_op = cmd->cmd_op;
+	uint64_t start, len;
+
+	if (unlikely(sqe->ioprio || sqe->__pad1 || sqe->len ||
+		     sqe->rw_flags || sqe->file_index))
+		return -EINVAL;
+
+	bc->status = BLK_STS_OK;
+	bc->nowait = issue_flags & IO_URING_F_NONBLOCK;
+
+	start = READ_ONCE(sqe->addr);
+	len = READ_ONCE(sqe->addr3);
+
+	switch (cmd_op) {
+	case BLOCK_URING_CMD_DISCARD:
+		return blkdev_cmd_discard(cmd, bdev, start, len, bc->nowait);
+	}
+	return -EINVAL;
+}
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 753971770733..0016e38ed33c 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -208,6 +208,8 @@ struct fsxattr {
  * (see uapi/linux/blkzoned.h)
  */
 
+#define BLOCK_URING_CMD_DISCARD			0
+
 #define BMAP_IOCTL 1		/* obsolete - kept for compatibility */
 #define FIBMAP	   _IO(0x00,1)	/* bmap access */
 #define FIGETBSZ   _IO(0x00,2)	/* get the block size used for bmap */
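
Illustration only, not part of the patch: a small userspace model of the
"try non-blocking first, retry from a blocking context on -EAGAIN" flow
described in the commit message. It is a sketch under stated assumptions.
model_discard_issue(), model_issue_with_retry() and the max_bio_sects
parameter are hypothetical names; max_bio_sects stands in for whatever
per-bio limit the block layer applies when blk_alloc_discard_bio() splits
a range. Write validation, page cache invalidation and the asynchronous
-EAGAIN path through blk_cmd_complete() are left out.

```c
/* Userspace model only; none of this is kernel code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, as in the kernel */

/*
 * Hypothetical model of the submission loop in blkdev_cmd_discard().
 * Returns the number of bios that would be submitted, or a negative
 * errno. max_bio_sects is an assumed per-bio sector limit.
 */
static int model_discard_issue(uint64_t len, uint64_t max_bio_sects,
			       bool nowait)
{
	uint64_t nr_sects = len >> SECTOR_SHIFT;
	int bios = 0;

	/* Mirrors the "device cannot discard" check. */
	if (!max_bio_sects)
		return -EOPNOTSUPP;

	while (nr_sects) {
		uint64_t chunk = nr_sects < max_bio_sects ?
				 nr_sects : max_bio_sects;

		assert(chunk > 0);
		nr_sects -= chunk;
		/*
		 * A non-blocking issue must fit into a single bio;
		 * otherwise ask the caller to retry blocking.
		 */
		if (nowait && nr_sects)
			return -EAGAIN;
		bios++;
	}
	/* Mirrors the "!prev" path: nothing was submitted. */
	return bios ? bios : -EFAULT;
}

/*
 * Hypothetical model of the caller: attempt a non-blocking issue and
 * fall back to a blocking one when told to retry.
 */
static int model_issue_with_retry(uint64_t len, uint64_t max_bio_sects)
{
	int ret = model_discard_issue(len, max_bio_sects, true);

	if (ret == -EAGAIN)
		ret = model_discard_issue(len, max_bio_sects, false);
	return ret;
}
```

With an assumed limit of 8 sectors per bio, a 4096-byte discard fits into
one bio and succeeds on the non-blocking attempt, whereas an 8192-byte
discard gets -EAGAIN on the first pass and completes with two bios on the
blocking retry.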