From patchwork Thu Oct 17 16:09:32 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13840326
From: Keith Busch
CC: Nitesh Shetty, Hannes Reinecke, Keith Busch
Subject: [PATCHv8 1/6] block, fs: restore kiocb based write hint processing
Date: Thu, 17 Oct 2024 09:09:32 -0700
Message-ID: <20241017160937.2283225-2-kbusch@meta.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20241017160937.2283225-1-kbusch@meta.com>
References: <20241017160937.2283225-1-kbusch@meta.com>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org
X-Proofpoint-Virus-Version: vendor=baseguard
 engine=ICAP:2.0.293,Aquarius:18.0.1051,Hydra:6.0.680,FMLib:17.12.62.30
 definitions=2024-10-05_03,2024-10-04_01,2024-09-30_01

From: Kanchan Joshi

struct kiocb has a 2-byte hole that developed after commit 41d36a9f3e53
("fs: remove kiocb.ki_hint"). But write hints returned with commit
449813515d3e ("block, fs: Restore the per-bio/request data lifetime
fields"). This patch uses the leftover space in kiocb to carve out a
2-byte field, ki_write_hint. Restore the code that operates on kiocb to
use ki_write_hint instead of the inode hint value.

This does not change any behavior, but is needed to enable per-io hints.

Signed-off-by: Kanchan Joshi
Signed-off-by: Nitesh Shetty
Reviewed-by: Hannes Reinecke
Signed-off-by: Keith Busch
---
 block/fops.c         | 6 +++---
 fs/aio.c             | 1 +
 fs/cachefiles/io.c   | 1 +
 fs/direct-io.c       | 2 +-
 fs/iomap/direct-io.c | 2 +-
 include/linux/fs.h   | 8 ++++++++
 io_uring/rw.c        | 1 +
 7 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index e696ae53bf1e0..85b9b97d372c8 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -74,7 +74,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 		bio_init(&bio, bdev, vecs, nr_pages, dio_bio_write_op(iocb));
 	}
 	bio.bi_iter.bi_sector = pos >> SECTOR_SHIFT;
-	bio.bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
+	bio.bi_write_hint = iocb->ki_write_hint;
 	bio.bi_ioprio = iocb->ki_ioprio;
 	if (iocb->ki_flags & IOCB_ATOMIC)
 		bio.bi_opf |= REQ_ATOMIC;
@@ -203,7 +203,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,

 	for (;;) {
 		bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
-		bio->bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
+		bio->bi_write_hint = iocb->ki_write_hint;
 		bio->bi_private = dio;
 		bio->bi_end_io = blkdev_bio_end_io;
 		bio->bi_ioprio = iocb->ki_ioprio;
@@ -319,7 +319,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 	dio->flags = 0;
 	dio->iocb = iocb;
 	bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
-	bio->bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
+	bio->bi_write_hint = iocb->ki_write_hint;
 	bio->bi_end_io = blkdev_bio_end_io_async;
 	bio->bi_ioprio = iocb->ki_ioprio;

diff --git a/fs/aio.c b/fs/aio.c
index e8920178b50f7..db618817e670d 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1517,6 +1517,7 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb, int rw_type)
 	req->ki_flags = req->ki_filp->f_iocb_flags | IOCB_AIO_RW;
 	if (iocb->aio_flags & IOCB_FLAG_RESFD)
 		req->ki_flags |= IOCB_EVENTFD;
+	req->ki_write_hint = file_write_hint(req->ki_filp);
 	if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
 		/*
 		 * If the IOCB_FLAG_IOPRIO flag of aio_flags is set, then
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 6a821a959b59e..c3db102ae64e2 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -309,6 +309,7 @@ int __cachefiles_write(struct cachefiles_object *object,
 	ki->iocb.ki_pos = start_pos;
 	ki->iocb.ki_flags = IOCB_DIRECT | IOCB_WRITE;
 	ki->iocb.ki_ioprio = get_current_ioprio();
+	ki->iocb.ki_write_hint = file_write_hint(file);
 	ki->object = object;
 	ki->start = start_pos;
 	ki->len = len;
diff --git a/fs/direct-io.c b/fs/direct-io.c
index bbd05f1a21453..73629e26becbe 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -409,7 +409,7 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 		bio->bi_end_io = dio_bio_end_io;
 	if (dio->is_pinned)
 		bio_set_flag(bio, BIO_PAGE_PINNED);
-	bio->bi_write_hint = file_inode(dio->iocb->ki_filp)->i_write_hint;
+	bio->bi_write_hint = dio->iocb->ki_write_hint;

 	sdio->bio = bio;
 	sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index f637aa0706a31..fff43f121ee65 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -397,7 +397,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
 					  GFP_KERNEL);
 		bio->bi_iter.bi_sector = iomap_sector(iomap, pos);
-		bio->bi_write_hint = inode->i_write_hint;
+		bio->bi_write_hint = dio->iocb->ki_write_hint;
 		bio->bi_ioprio = dio->iocb->ki_ioprio;
 		bio->bi_private = dio;
 		bio->bi_end_io = iomap_dio_bio_end_io;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3559446279c15..04e875a37f604 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -370,6 +370,7 @@ struct kiocb {
 	void *private;
 	int ki_flags;
 	u16 ki_ioprio; /* See linux/ioprio.h */
+	u16 ki_write_hint;
 	union {
 		/*
 		 * Only used for async buffered reads, where it denotes the
@@ -2337,12 +2338,18 @@ static inline bool HAS_UNMAPPED_ID(struct mnt_idmap *idmap,
 	       !vfsgid_valid(i_gid_into_vfsgid(idmap, inode));
 }

+static inline enum rw_hint file_write_hint(struct file *filp)
+{
+	return file_inode(filp)->i_write_hint;
+}
+
 static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 {
 	*kiocb = (struct kiocb) {
 		.ki_filp = filp,
 		.ki_flags = filp->f_iocb_flags,
 		.ki_ioprio = get_current_ioprio(),
+		.ki_write_hint = file_write_hint(filp),
 	};
 }

@@ -2353,6 +2360,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
 		.ki_filp = filp,
 		.ki_flags = kiocb_src->ki_flags,
 		.ki_ioprio = kiocb_src->ki_ioprio,
+		.ki_write_hint = kiocb_src->ki_write_hint,
 		.ki_pos = kiocb_src->ki_pos,
 	};
 }
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 80ae3c2ebb70c..ffd637ca0bd17 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -1027,6 +1027,7 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(ret))
 		return ret;
 	req->cqe.res = iov_iter_count(&io->iter);
+	rw->kiocb.ki_write_hint = file_write_hint(rw->kiocb.ki_filp);

 	if (force_nonblock) {
 		/* If the file doesn't support async, just async punt */

From patchwork Thu Oct 17 16:09:33 2024
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13840313
From: Keith Busch
CC: Keith Busch
Subject: [PATCHv8 2/6] block: use generic u16 for write hints
Date: Thu, 17 Oct 2024 09:09:33 -0700
Message-ID: <20241017160937.2283225-3-kbusch@meta.com>
In-Reply-To: <20241017160937.2283225-1-kbusch@meta.com>
References: <20241017160937.2283225-1-kbusch@meta.com>
X-Mailing-List: io-uring@vger.kernel.org

From: Keith Busch

This is still backwards compatible with lifetime hints. It just doesn't
constrain the hints to that definition.

Signed-off-by: Keith Busch
Reviewed-by: Hannes Reinecke
---
 include/linux/blk-mq.h    | 3 +--
 include/linux/blk_types.h | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 59e9adf815a49..bf007a4081d9b 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -8,7 +8,6 @@
 #include
 #include
 #include
-#include

 struct blk_mq_tags;
 struct blk_flush_queue;
@@ -156,7 +155,7 @@ struct request {
 	struct blk_crypto_keyslot *crypt_keyslot;
 #endif

-	enum rw_hint write_hint;
+	unsigned short write_hint;
 	unsigned short ioprio;

 	enum mq_rq_state state;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index dce7615c35e7e..56b7fb961e0c7 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -219,7 +219,7 @@ struct bio {
 	 */
 	unsigned short bi_flags; /* BIO_* below */
 	unsigned short bi_ioprio;
-	enum rw_hint bi_write_hint;
+	unsigned short bi_write_hint;
 	blk_status_t bi_status;
 	atomic_t __bi_remaining;

From patchwork Thu Oct 17 16:09:34 2024
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13840314
From: Keith Busch
CC: Keith Busch
Subject: [PATCHv8 3/6] block: introduce max_write_hints queue limit
Date: Thu, 17 Oct 2024 09:09:34 -0700
Message-ID: <20241017160937.2283225-4-kbusch@meta.com>
In-Reply-To: <20241017160937.2283225-1-kbusch@meta.com>
References: <20241017160937.2283225-1-kbusch@meta.com>
X-Mailing-List: io-uring@vger.kernel.org

From: Keith Busch

Drivers with hardware that supports write hints need a way to export how
many are available so applications can generically query this.

Signed-off-by: Keith Busch
Reviewed-by: Hannes Reinecke
---
 Documentation/ABI/stable/sysfs-block |  7 +++++++
 block/blk-settings.c                 |  3 +++
 block/blk-sysfs.c                    |  3 +++
 block/fops.c                         |  2 ++
 include/linux/blkdev.h               | 12 ++++++++++++
 5 files changed, 27 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index 8353611107154..f2db2cabb8e75 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -506,6 +506,13 @@ Description:
 		[RO] Maximum size in bytes of a single element in a DMA
 		scatter/gather list.
+What:		/sys/block//queue/max_write_hints
+Date:		October 2024
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Maximum number of write hints supported, 0 if not
+		supported. If supported, valid values are 1 through
+		max_write_hints, inclusive.

 What:		/sys/block//queue/max_segments
 Date:		March 2010
diff --git a/block/blk-settings.c b/block/blk-settings.c
index a446654ddee5e..921fb4d334fa4 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -43,6 +43,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;

 	/* Inherit limits from component devices */
+	lim->max_write_hints = USHRT_MAX;
 	lim->max_segments = USHRT_MAX;
 	lim->max_discard_segments = USHRT_MAX;
 	lim->max_hw_sectors = UINT_MAX;
@@ -544,6 +545,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->max_segment_size = min_not_zero(t->max_segment_size,
 					   b->max_segment_size);

+	t->max_write_hints = min(t->max_write_hints, b->max_write_hints);
+
 	alignment = queue_limit_alignment_offset(b, start);

 	/* Bottom device has different alignment. Check that it is
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 741b95dfdbf6f..85f48ca461049 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -104,6 +104,7 @@ QUEUE_SYSFS_LIMIT_SHOW(max_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_discard_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_integrity_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_segment_size)
+QUEUE_SYSFS_LIMIT_SHOW(max_write_hints)
 QUEUE_SYSFS_LIMIT_SHOW(logical_block_size)
 QUEUE_SYSFS_LIMIT_SHOW(physical_block_size)
 QUEUE_SYSFS_LIMIT_SHOW(chunk_sectors)
@@ -457,6 +458,7 @@ QUEUE_RO_ENTRY(queue_max_hw_sectors, "max_hw_sectors_kb");
 QUEUE_RO_ENTRY(queue_max_segments, "max_segments");
 QUEUE_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
 QUEUE_RO_ENTRY(queue_max_segment_size, "max_segment_size");
+QUEUE_RO_ENTRY(queue_max_write_hints, "max_write_hints");
 QUEUE_RW_LOAD_MODULE_ENTRY(elv_iosched, "scheduler");

 QUEUE_RO_ENTRY(queue_logical_block_size, "logical_block_size");
@@ -591,6 +593,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_max_discard_segments_entry.attr,
 	&queue_max_integrity_segments_entry.attr,
 	&queue_max_segment_size_entry.attr,
+	&queue_max_write_hints_entry.attr,
 	&queue_hw_sector_size_entry.attr,
 	&queue_logical_block_size_entry.attr,
 	&queue_physical_block_size_entry.attr,
diff --git a/block/fops.c b/block/fops.c
index 85b9b97d372c8..d0b16d3975fd6 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -376,6 +376,8 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)

 	if (blkdev_dio_invalid(bdev, iocb->ki_pos, iter, is_atomic))
 		return -EINVAL;
+	if (iocb->ki_write_hint > bdev_max_write_hints(bdev))
+		return -EINVAL;

 	nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
 	if (likely(nr_pages <= BIO_MAX_VECS)) {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6b78a68e0bd9c..01aba0ffeff6e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -393,6 +393,8 @@ struct queue_limits {
 	unsigned short max_integrity_segments;
 	unsigned short max_discard_segments;

+	unsigned short max_write_hints;
+
 	unsigned int max_open_zones;
 	unsigned int max_active_zones;

@@ -1183,6 +1185,11 @@ static inline unsigned short queue_max_segments(const struct request_queue *q)
 	return q->limits.max_segments;
 }

+static inline unsigned short queue_max_write_hints(struct request_queue *q)
+{
+	return q->limits.max_write_hints;
+}
+
 static inline unsigned short queue_max_discard_segments(const struct request_queue *q)
 {
 	return q->limits.max_discard_segments;
@@ -1230,6 +1237,11 @@ static inline unsigned int bdev_max_segments(struct block_device *bdev)
 	return queue_max_segments(bdev_get_queue(bdev));
 }

+static inline unsigned short bdev_max_write_hints(struct block_device *bdev)
+{
+	return queue_max_write_hints(bdev_get_queue(bdev));
+}
+
 static inline unsigned queue_logical_block_size(const struct request_queue *q)
 {
 	return q->limits.logical_block_size;

From patchwork Thu Oct 17 16:09:35 2024
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13840316
From: Keith Busch
CC: Keith Busch
Subject: [PATCHv8 4/6] fs: introduce per-io hint support flag
Date: Thu, 17 Oct 2024 09:09:35 -0700
Message-ID: <20241017160937.2283225-5-kbusch@meta.com>
In-Reply-To: <20241017160937.2283225-1-kbusch@meta.com>
References: <20241017160937.2283225-1-kbusch@meta.com>
X-Mailing-List: io-uring@vger.kernel.org

From: Keith Busch

A block device may support write hints on a per-io basis. The raw block
file operations can effectively use these, but real filesystems are not
ready to make use of this. Provide a file_operations flag to indicate
support, and set it for the block file operations.

Signed-off-by: Keith Busch
Reviewed-by: Hannes Reinecke
---
 block/fops.c       | 2 +-
 include/linux/fs.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/fops.c b/block/fops.c
index d0b16d3975fd6..15a63e26161ea 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -869,7 +869,7 @@ const struct file_operations def_blk_fops = {
 	.splice_write = iter_file_splice_write,
 	.fallocate = blkdev_fallocate,
 	.uring_cmd = blkdev_uring_cmd,
-	.fop_flags = FOP_BUFFER_RASYNC,
+	.fop_flags = FOP_BUFFER_RASYNC | FOP_PER_IO_HINTS,
 };

 static __init int blkdev_init(void)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 04e875a37f604..026dc9801dc20 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2117,6 +2117,8 @@ struct file_operations {
 #define FOP_HUGE_PAGES ((__force fop_flags_t)(1 << 4))
 /* Treat loff_t as unsigned (e.g., /dev/mem) */
 #define FOP_UNSIGNED_OFFSET ((__force fop_flags_t)(1 << 5))
+/* File system can handle per-io hints */
+#define FOP_PER_IO_HINTS ((__force fop_flags_t)(1 << 6))

 /* Wrap a directory iterator that needs exclusive inode access */
 int wrap_directory_iterator(struct file *, struct dir_context *,

From patchwork Thu Oct 17 16:09:36 2024
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13840315
From: Keith Busch
CC: Nitesh Shetty, Keith Busch
Subject: [PATCHv8 5/6] io_uring: enable per-io hinting capability
Date: Thu, 17 Oct 2024 09:09:36 -0700
Message-ID: <20241017160937.2283225-6-kbusch@meta.com>
In-Reply-To: <20241017160937.2283225-1-kbusch@meta.com>
References: <20241017160937.2283225-1-kbusch@meta.com>
X-Mailing-List: io-uring@vger.kernel.org

From: Kanchan Joshi

With the F_SET_RW_HINT fcntl, a user can set a hint on the file inode,
and all subsequent writes on the file pass that hint value down. This
can be limiting for a block device, as all the writes can be tagged with
only one lifetime hint value. Concurrent writes (with different hint
values) are hard to manage. Per-IO hinting solves that problem.

Allow userspace to pass additional metadata in the SQE.

	__u16 write_hint;

This accepts all hint values that the file allows.

The write handlers (io_prep_rw, io_write) send the hint value to the
lower layer using kiocb.
This is good for supporting direct IO, but not when kiocb is not available (e.g., buffered IO). When per-io hints are not passed, the per-inode hint values are set in the kiocb (as before). Otherwise, per-io hints take precedence over per-inode hints. Signed-off-by: Kanchan Joshi Signed-off-by: Nitesh Shetty Signed-off-by: Keith Busch Reviewed-by: Hannes Reinecke --- include/uapi/linux/io_uring.h | 4 ++++ io_uring/rw.c | 11 +++++++++-- 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 86cb385fe0b53..bd9acc0053318 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -92,6 +92,10 @@ struct io_uring_sqe { __u16 addr_len; __u16 __pad3[1]; }; + struct { + __u16 write_hint; + __u16 __pad4[1]; + }; }; union { struct { diff --git a/io_uring/rw.c b/io_uring/rw.c index ffd637ca0bd17..9a6d3ba76af4f 100644 --- a/io_uring/rw.c +++ b/io_uring/rw.c @@ -279,7 +279,11 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe, rw->kiocb.ki_ioprio = get_current_ioprio(); } rw->kiocb.dio_complete = NULL; - + if (ddir == ITER_SOURCE && + req->file->f_op->fop_flags & FOP_PER_IO_HINTS) + rw->kiocb.ki_write_hint = READ_ONCE(sqe->write_hint); + else + rw->kiocb.ki_write_hint = WRITE_LIFE_NOT_SET; rw->addr = READ_ONCE(sqe->addr); rw->len = READ_ONCE(sqe->len); rw->flags = READ_ONCE(sqe->rw_flags); @@ -1027,7 +1031,10 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags) if (unlikely(ret)) return ret; req->cqe.res = iov_iter_count(&io->iter); - rw->kiocb.ki_write_hint = file_write_hint(rw->kiocb.ki_filp); + + /* Use per-file hint only if per-io hint is not set. 
*/ + if (rw->kiocb.ki_write_hint == WRITE_LIFE_NOT_SET) + rw->kiocb.ki_write_hint = file_write_hint(rw->kiocb.ki_filp); if (force_nonblock) { /* If the file doesn't support async, just async punt */ From patchwork Thu Oct 17 16:09:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Keith Busch X-Patchwork-Id: 13840327 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 21F9A1DEFE3 for ; Thu, 17 Oct 2024 16:16:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.145.42 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729181770; cv=none; b=m70nMuWJE4pliltz/EDoQ2tgs/7PE1yMoHXVFTVazwSHJqljR3zqWRULleRjs8QgLyyMFoqH7yr1Se6oCrlHZ8LkyV40rsr3faNtrGQLzPHvrCn/JAgaw/A245QhCZiAT83SdD1Q7sGiQcY9A1pQ7K3X7AzwWeWMkkvBP/q1HRQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729181770; c=relaxed/simple; bh=OcGVtyrzhVu2x88GuEyv6IQ88OnuyHPqZaMqYeLXeVc=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=CUjCo31/6cqhschiA6Waiq9YOyX9HD5HlHPUhohZT1hCM/jsryph21znj/ODNvmJNJAb9Av6HwbsGFAbtzGEP1qzS+b9GZPLEZf9YxwdZvrsGYHg9lDK7DAtkdlNB+yaG7Np/laZ1fiu3RYkzFvXOyhOeHuwvsvfCxlg1epWqrw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=cn+CituJ; arc=none smtp.client-ip=67.231.145.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=meta.com header.i=@meta.com header.b="cn+CituJ" Received: from pps.filterd (m0044012.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 49HCh1El007042 for ; Thu, 17 Oct 2024 09:16:04 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2021-q4; bh=gfjX4gdVpoRZdOrqZPFH61tH5y9pgFNLY8DXmPMq2Fc=; b=cn+CituJl8mz SgWMJ/89fxaA4HYSuRkGCgOD/MxuxZUlwI16NzoLaiX45g0txojmnNDawO6ljnpo LgsXLyNV6uzU7QaG13o3kww32jCFeJbxKduJW/gN33DUtP+xGa4F+NlW9BtGDQlR 5PcD5xjXxZzlE77fVjyHYOmL1VJ/tjomFirTG1g0AL+1mXaJ8Qk6aPCXpBFhX4Mi oQThtQo0MXm+0Ffly8pgQSlZiyIkCUC2kpKtbBBY79lMCGYo+OJy+1EAF0iQ60Nh RJArAaGlhPKQvsxMuszZZv7WQCaKEYTXVjfVilxy0rqFR19lHBK6tC4STIlQU2m7 r0E7WnvJcw== Received: from mail.thefacebook.com ([163.114.134.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 42ar0mn5e4-6 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Thu, 17 Oct 2024 09:16:04 -0700 (PDT) Received: from twshared10900.35.frc1.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c08b:78::2ac9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.1544.11; Thu, 17 Oct 2024 16:16:00 +0000 Received: by devbig638.nha1.facebook.com (Postfix, from userid 544533) id 99136143A4B18; Thu, 17 Oct 2024 09:10:19 -0700 (PDT) From: Keith Busch To: , , , , CC: , , , Hui Qi , Nitesh Shetty , Hannes Reinecke , Keith Busch Subject: [PATCHv8 6/6] nvme: enable FDP support Date: Thu, 17 Oct 2024 09:09:37 -0700 Message-ID: <20241017160937.2283225-7-kbusch@meta.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241017160937.2283225-1-kbusch@meta.com> References: <20241017160937.2283225-1-kbusch@meta.com> Precedence: bulk X-Mailing-List: io-uring@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: 
YO81mqgEz7yRhL1JmMSkQJaZcoPDQ9Ji X-Proofpoint-GUID: YO81mqgEz7yRhL1JmMSkQJaZcoPDQ9Ji X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1051,Hydra:6.0.680,FMLib:17.12.62.30 definitions=2024-10-05_02,2024-10-04_01,2024-09-30_01 From: Kanchan Joshi Flexible Data Placement (FDP), as ratified in TP 4146a, allows the host to control the placement of logical blocks so as to reduce the SSD WAF. Userspace can send the write hint information using io_uring or fcntl. Fetch the placement-identifiers if the device supports FDP. The incoming write-hint is mapped to a placement-identifier, which in turn is set in the DSPEC field of the write command. Signed-off-by: Kanchan Joshi Signed-off-by: Hui Qi Signed-off-by: Nitesh Shetty Nacked-by: Christoph Hellwig Reviewed-by: Hannes Reinecke Signed-off-by: Keith Busch --- drivers/nvme/host/core.c | 82 ++++++++++++++++++++++++++++++++++++++++ drivers/nvme/host/nvme.h | 5 +++ include/linux/nvme.h | 19 ++++++++++ 3 files changed, 106 insertions(+) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 43d73d31c66f3..02a36032c835f 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -44,6 +44,20 @@ struct nvme_ns_info { bool is_removed; }; +struct nvme_fdp_ruh_status_desc { + u16 pid; + u16 ruhid; + u32 earutr; + u64 ruamw; + u8 rsvd16[16]; +}; + +struct nvme_fdp_ruh_status { + u8 rsvd0[14]; + __le16 nruhsd; + struct nvme_fdp_ruh_status_desc ruhsd[]; +}; + unsigned int admin_timeout = 60; module_param(admin_timeout, uint, 0644); MODULE_PARM_DESC(admin_timeout, "timeout in seconds for admin commands"); @@ -657,6 +671,7 @@ static void nvme_free_ns_head(struct kref *ref) ida_free(&head->subsys->ns_ida, head->instance); cleanup_srcu_struct(&head->srcu); nvme_put_subsystem(head->subsys); + kfree(head->plids); kfree(head); } @@ -974,6 +989,13 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns, if (req->cmd_flags & REQ_RAHEAD) dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH; + if 
(req->write_hint != WRITE_LIFE_NOT_SET && ns->head->nr_plids) { + u16 hint = min(req->write_hint, ns->head->nr_plids); + + dsmgmt |= ns->head->plids[hint - 1] << 16; + control |= NVME_RW_DTYPE_DPLCMT; + } + if (req->cmd_flags & REQ_ATOMIC && !nvme_valid_atomic_write(req)) return BLK_STS_INVAL; @@ -2114,6 +2136,52 @@ static int nvme_update_ns_info_generic(struct nvme_ns *ns, return ret; } +static int nvme_fetch_fdp_plids(struct nvme_ns *ns, u32 nsid) +{ + struct nvme_fdp_ruh_status_desc *ruhsd; + struct nvme_ns_head *head = ns->head; + struct nvme_fdp_ruh_status *ruhs; + struct nvme_command c = {}; + int size, ret, i; + + if (head->plids) + return 0; + + size = struct_size(ruhs, ruhsd, NVME_MAX_PLIDS); + ruhs = kzalloc(size, GFP_KERNEL); + if (!ruhs) + return -ENOMEM; + + c.imr.opcode = nvme_cmd_io_mgmt_recv; + c.imr.nsid = cpu_to_le32(nsid); + c.imr.mo = 0x1; + c.imr.numd = cpu_to_le32((size >> 2) - 1); + + ret = nvme_submit_sync_cmd(ns->queue, &c, ruhs, size); + if (ret) + goto out; + + i = le16_to_cpu(ruhs->nruhsd); + if (!i) + goto out; + + ns->head->nr_plids = min_t(u16, i, NVME_MAX_PLIDS); + head->plids = kcalloc(ns->head->nr_plids, sizeof(*head->plids), + GFP_KERNEL); + if (!head->plids) { + ret = -ENOMEM; + goto out; + } + + for (i = 0; i < ns->head->nr_plids; i++) { + ruhsd = &ruhs->ruhsd[i]; + head->plids[i] = le16_to_cpu(ruhsd->pid); + } +out: + kfree(ruhs); + return ret; +} + static int nvme_update_ns_info_block(struct nvme_ns *ns, struct nvme_ns_info *info) { @@ -2150,6 +2218,19 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns, goto out; } + if (ns->ctrl->ctratt & NVME_CTRL_ATTR_FDPS) { + ret = nvme_fetch_fdp_plids(ns, info->nsid); + if (ret) + dev_warn(ns->ctrl->device, + "FDP failure status:0x%x\n", ret); + if (ret < 0) + goto out; + } else { + ns->head->nr_plids = 0; + kfree(ns->head->plids); + ns->head->plids = NULL; + } + blk_mq_freeze_queue(ns->disk->queue); ns->head->lba_shift = id->lbaf[lbaf].ds; ns->head->nuse = le64_to_cpu(id->nuse); 
@@ -2180,6 +2261,7 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns, if (!nvme_init_integrity(ns->head, &lim, info)) capacity = 0; + lim.max_write_hints = ns->head->nr_plids; ret = queue_limits_commit_update(ns->disk->queue, &lim); if (ret) { blk_mq_unfreeze_queue(ns->disk->queue); diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 313a4f978a2cf..d6b516c0e502c 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -454,6 +454,8 @@ struct nvme_ns_ids { u8 csi; }; +#define NVME_MAX_PLIDS (NVME_CTRL_PAGE_SIZE / sizeof(u16)) + /* * Anchor structure for namespaces. There is one for each namespace in a * NVMe subsystem that any of our controllers can see, and the namespace @@ -490,6 +492,9 @@ struct nvme_ns_head { struct device cdev_device; struct gendisk *disk; + + u16 nr_plids; + u16 *plids; #ifdef CONFIG_NVME_MULTIPATH struct bio_list requeue_list; spinlock_t requeue_lock; diff --git a/include/linux/nvme.h b/include/linux/nvme.h index b58d9405d65e0..a954eaee5b0f3 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -275,6 +275,7 @@ enum nvme_ctrl_attr { NVME_CTRL_ATTR_HID_128_BIT = (1 << 0), NVME_CTRL_ATTR_TBKAS = (1 << 6), NVME_CTRL_ATTR_ELBAS = (1 << 15), + NVME_CTRL_ATTR_FDPS = (1 << 19), }; struct nvme_id_ctrl { @@ -843,6 +844,7 @@ enum nvme_opcode { nvme_cmd_resv_register = 0x0d, nvme_cmd_resv_report = 0x0e, nvme_cmd_resv_acquire = 0x11, + nvme_cmd_io_mgmt_recv = 0x12, nvme_cmd_resv_release = 0x15, nvme_cmd_zone_mgmt_send = 0x79, nvme_cmd_zone_mgmt_recv = 0x7a, @@ -864,6 +866,7 @@ enum nvme_opcode { nvme_opcode_name(nvme_cmd_resv_register), \ nvme_opcode_name(nvme_cmd_resv_report), \ nvme_opcode_name(nvme_cmd_resv_acquire), \ + nvme_opcode_name(nvme_cmd_io_mgmt_recv), \ nvme_opcode_name(nvme_cmd_resv_release), \ nvme_opcode_name(nvme_cmd_zone_mgmt_send), \ nvme_opcode_name(nvme_cmd_zone_mgmt_recv), \ @@ -1015,6 +1018,7 @@ enum { NVME_RW_PRINFO_PRCHK_GUARD = 1 << 12, NVME_RW_PRINFO_PRACT = 1 << 13, 
NVME_RW_DTYPE_STREAMS = 1 << 4, + NVME_RW_DTYPE_DPLCMT = 2 << 4, NVME_WZ_DEAC = 1 << 9, }; @@ -1102,6 +1106,20 @@ struct nvme_zone_mgmt_recv_cmd { __le32 cdw14[2]; }; +struct nvme_io_mgmt_recv_cmd { + __u8 opcode; + __u8 flags; + __u16 command_id; + __le32 nsid; + __le64 rsvd2[2]; + union nvme_data_ptr dptr; + __u8 mo; + __u8 rsvd11; + __u16 mos; + __le32 numd; + __le32 cdw12[4]; +}; + enum { NVME_ZRA_ZONE_REPORT = 0, NVME_ZRASF_ZONE_REPORT_ALL = 0, @@ -1822,6 +1840,7 @@ struct nvme_command { struct nvmf_auth_receive_command auth_receive; struct nvme_dbbuf dbbuf; struct nvme_directive_cmd directive; + struct nvme_io_mgmt_recv_cmd imr; }; };
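
Illustration (not part of the series): the hint flow described in patches 5/6 and 6/6 can be modelled in plain userspace C. The sketch below is a rough model under stated assumptions, not the kernel code. The helper names (effective_write_hint, hint_to_plid, dsmgmt_with_plid) and HINT_NOT_SET are hypothetical; HINT_NOT_SET stands in for WRITE_LIFE_NOT_SET, which is 0. It also assumes that a hint larger than the number of placement identifiers is meant to be clamped to the last entry of the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's WRITE_LIFE_NOT_SET, which has the value 0. */
#define HINT_NOT_SET 0u

/*
 * Hypothetical helper modelling the precedence rule from patch 5/6:
 * a per-IO hint, when set, wins over the per-inode hint.
 */
static inline uint16_t effective_write_hint(uint16_t per_io_hint,
					    uint16_t per_inode_hint)
{
	return per_io_hint != HINT_NOT_SET ? per_io_hint : per_inode_hint;
}

/*
 * Hypothetical helper modelling the lookup from patch 6/6: write hints
 * are 1-based indexes into the placement-identifier table.
 *
 * Returns 0 and stores the identifier in *plid, or -1 when no placement
 * applies (hint not set, or no table). Clamping an oversized hint to the
 * last entry is an assumption about the intended behaviour.
 */
static inline int hint_to_plid(uint16_t hint, const uint16_t *plids,
			       uint16_t nr_plids, uint16_t *plid)
{
	assert(plid != NULL);

	if (hint == HINT_NOT_SET || plids == NULL || nr_plids == 0)
		return -1;
	if (hint > nr_plids)
		hint = nr_plids;

	*plid = plids[hint - 1];
	return 0;
}

/* The placement identifier occupies the upper 16 bits (DSPEC) of dsmgmt. */
static inline uint32_t dsmgmt_with_plid(uint32_t dsmgmt, uint16_t plid)
{
	return dsmgmt | ((uint32_t)plid << 16);
}
```

With a table of three placement identifiers {7, 9, 11}, hint 2 selects 9 and hint 5 is clamped to the last entry, 11. A hint of 0 leaves the command without a placement directive.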